How should agent memory split across time scales?
Explores whether agent working memory should be organized by temporal scope—some components persisting across a conversation, others refreshed each turn. Understanding this distinction could reveal why some memory designs fail.
Most agent architectures describe their memory as one undifferentiated working buffer plus an external store. RAISE (arXiv:2401.02777) refines the working layer into four components, but the contribution that tends to get missed is not the four components themselves; it is the two granularities underneath them.
The four components are: system prompt (role identity, objectives, tool descriptions, few-shot anchors), context (conversation history plus task trajectory), scratchpad (background information, intermediate reasoning, observations from tool calls), and examples (query-response pairs retrieved for the current task to fill knowledge gaps).
The granularity split is the under-noticed structural claim. Conversation history and scratchpad are dialogue-level: they accumulate across the entire conversation and persist between turns. Examples and task trajectory are turn-level: they are recalled and replaced each turn based on the current query. Note that the split cuts through the context component, since conversation history is dialogue-level while task trajectory is turn-level. These four memory items (conversation history, scratchpad, examples, task trajectory) span a 2×2 design space: dialogue-vs-turn × continuous-accumulation-vs-retrieval-replacement.
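As a rough illustration, the placement of each memory item can be written down as a small lookup table. This is a sketch, not RAISE's implementation: the names `Placement`, `PLACEMENT`, and `by_scope` are hypothetical, and the update labels are taken from the wording this note uses for the per-turn update protocol.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    scope: str   # "dialogue" (persists across turns) or "turn" (rebuilt per query)
    update: str  # update label, using this note's wording for each step


# Hypothetical table: one row per memory item named in the note.
PLACEMENT = {
    "conversation_history": Placement("dialogue", "append"),
    "scratchpad": Placement("dialogue", "update"),
    "examples": Placement("turn", "replace"),
    "task_trajectory": Placement("turn", "append-within-turn"),
}


def by_scope(scope: str) -> list[str]:
    """Return the memory items that share a temporal scope, sorted by name."""
    return sorted(name for name, p in PLACEMENT.items() if p.scope == scope)
```

Calling `by_scope("dialogue")` lists the items that need a pruning story, and `by_scope("turn")` lists the items that need a refresh story.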
The granularity distinction matters because it predicts which failure mode each component is exposed to. Dialogue-level components grow monotonically and create context-length pressure, so they need pruning policies. Turn-level components risk recall failure if the retrieval index is stale or the retrieval signal is weak, so they need refresh policies. Treating all working memory as one buffer makes both problems invisible; RAISE makes them addressable as separate concerns.
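The two policy families can be sketched side by side. Everything here is an assumption made for illustration: the note does not specify a pruning rule or a refresh criterion, so the character budget, the `min_score` and `max_age_s` thresholds, and the function names are all hypothetical.

```python
def prune_history(history: list[str], max_chars: int) -> list[str]:
    """Dialogue-level pruning: keep the most recent messages that fit the budget.

    Assumption: budget is measured in characters; a real system would
    more likely count tokens or summarize instead of dropping.
    """
    kept: list[str] = []
    used = 0
    for msg in reversed(history):  # walk from newest to oldest
        if used + len(msg) > max_chars:
            break
        kept.append(msg)
        used += len(msg)
    kept.reverse()  # restore chronological order
    return kept


def needs_refresh(best_score: float, index_age_s: float,
                  min_score: float = 0.5, max_age_s: float = 3600.0) -> bool:
    """Turn-level refresh check: flag a weak retrieval signal or a stale index.

    Assumption: both thresholds are placeholders, not values from RAISE.
    """
    return best_score < min_score or index_age_s > max_age_s
```

The point of keeping these as two functions is that they act on different components and fire on different signals: one on size, the other on retrieval quality and index age.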
The update protocol shows the granularity in action. On each turn: (1) append the user query to conversation history (dialogue-level append), (2) recall top-k relevant examples from a separate example pool via vector retrieval (turn-level replace), (3) update current entity information in the scratchpad if applicable (dialogue-level update), (4) update agent trajectory and tool results in task memory during execution (turn-level append-within-turn). Different components, different update rules, different lifecycles — all coordinated by the controller.
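The four steps above can be sketched as a single per-turn function. This is a minimal sketch under stated assumptions, not the RAISE controller: `Memory`, `run_turn`, and `recall_examples` are hypothetical names, word overlap stands in for vector retrieval, and resetting the trajectory at the start of each turn is this note's reading of "turn-level".

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    history: list = field(default_factory=list)      # dialogue-level
    scratchpad: dict = field(default_factory=dict)   # dialogue-level
    examples: list = field(default_factory=list)     # turn-level
    trajectory: list = field(default_factory=list)   # turn-level


def overlap(a: str, b: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def recall_examples(query: str, pool: list, k: int) -> list:
    """Stand-in for vector retrieval: top-k (query, response) pairs by word overlap."""
    ranked = sorted(pool, key=lambda ex: overlap(query, ex[0]), reverse=True)
    return ranked[:k]


def run_turn(mem: Memory, query: str, pool: list,
             entities: dict, tool_results: list, k: int = 2) -> None:
    mem.history.append(query)                        # (1) dialogue-level append
    mem.examples = recall_examples(query, pool, k)   # (2) turn-level replace
    mem.scratchpad.update(entities)                  # (3) dialogue-level update
    mem.trajectory = []                              # assumed reset at turn start
    for result in tool_results:                      # (4) append within the turn
        mem.trajectory.append(result)
```

Running two turns against the same `Memory` shows the lifecycles diverging: `history` and `scratchpad` carry over, while `examples` and `trajectory` only reflect the latest query.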
The implication for agent design: the question "where does this go in memory?" decomposes into two sub-questions — what is its temporal scope, and what is its update policy? Architectures that conflate these end up with either bloated dialogue buffers (everything is dialogue-level append) or lossy turn-level memory (everything is replaced each turn).
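One way to keep the two sub-questions separate is to make each memory slot declare its scope and its update policy independently. The sketch below is illustrative only; `Scope`, `Policy`, and `Slot` are hypothetical names, and the two policies shown are a simplification of the update rules discussed in this note.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    DIALOGUE = "dialogue"  # survives the end of a turn
    TURN = "turn"          # cleared when the turn ends


class Policy(Enum):
    APPEND = "append"      # new items are added to existing ones
    REPLACE = "replace"    # new items overwrite existing ones


@dataclass
class Slot:
    scope: Scope
    policy: Policy
    items: list = field(default_factory=list)

    def write(self, new_items: list) -> None:
        """Apply the slot's update policy (answers: how is it updated?)."""
        if self.policy is Policy.REPLACE:
            self.items = list(new_items)
        else:
            self.items.extend(new_items)

    def end_turn(self) -> None:
        """Apply the slot's lifecycle (answers: what is its temporal scope?)."""
        if self.scope is Scope.TURN:
            self.items.clear()
```

With this separation, the two degenerate designs are easy to reproduce: a single `Slot(Scope.DIALOGUE, Policy.APPEND)` used for everything grows with every turn, while a single `Slot(Scope.TURN, Policy.REPLACE)` used for everything forgets facts that should have persisted.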
Inquiring lines that use this note as a source (45)
This note is a source for these synthesized inquiries.
- Can persistent memory and identity files alone create genuine agent socialization?
- How should GUI agents remember patterns across different software environments?
- Can environmental scaffolding replace internal memory scaling in agent design?
- Could a single agent system switch memory granularity between tasks?
- Why do CoALA and Letta disagree on what counts as working memory?
- How does spatial density in web UIs break workflow-level memory?
- Should agents update memory after every turn or batch process sessions?
- Why do different agent memory architectures make incompatible granularity claims?
- How should memory consolidation timing differ across multiple timescales?
- How do time gaps between conversations change what chatbots should remember?
- What accounts for performance drops in multi-turn agent interactions?
- Why does GUI agent memory need different abstraction levels?
- Do agents prefer raw experience over condensed summaries of past actions?
- How does AI's inability to sustain temporal attention limit its capacity for expert roles?
- How do biological brains organize computation across different cortical timescales?
- What makes memory trajectories topologically stable under persistent reuse?
- How do insert, forget, and merge operations maintain thought coherence over time?
- Can episodic memory of UI traces improve open-world agent adaptation?
- What makes a memory reachable in the right context?
- What distinguishes formation, evolution, and retrieval as separate memory dynamics?
- What interaction mechanisms let humans and agents defer work effectively?
- How do token, parametric, and latent memory forms coexist in single agents?
- Which memory components trigger context-length problems in agents?
- What update rules should govern dialogue-scoped versus turn-scoped memory?
- Can pruning policies alone solve working memory bloat in agents?
- How do agents decide which created code should persist versus disappear?
- How does workflow abstraction compare to state-indexed procedural memory for web agents?
- What is the right granularity level for agent memory to enable both reuse and composition?
- Can agent-controlled memory management outperform fixed consolidation schedules?
- Does workflow-level memory or state-action memory better capture reusable agent knowledge?
- Why do continuously consolidated agent memories eventually degrade below no-memory baseline?
- Can relationship dynamics between user and agent be tracked as distinct memory?
- How does memory folding enable agents to reconsider strategies mid-task?
- What happens when governance rules exist in memory but fail to surface during critical actions?
- How do the three-axis taxonomies of memory forms and functions differ?
- What distinguishes working memory from strategic memory in agent task execution?
- How do fast and slow timescales enable continual agent adaptation?
- What specific failure modes emerge when agents retrieve stale or contaminated memories?
- What properties of agent systems only become visible across multiple sessions?
- How does durable memory quality shape agent performance over time?
- How should memory systems split between short-term and long-term storage?
- Can the same compress-then-act pattern work for agent state memory?
- How do adaptive memory modules compare to feedback-based working memory for long context?
- What separates artifact recall from persistent memory commitment in agents?
- How should agents compress episodic interactions into working memory without accumulation?
Related concepts in this collection (4)
- How should agents decide what memories to keep?
  Agent memory management splits between agents autonomously recognizing important information versus programmatic triggers. Understanding this choice reveals why different memory architectures prioritize different information types.
  Letta's hot/cold path is about *who triggers updates* (agent vs system); RAISE's two granularities are about *what temporal scope is updated*; these are orthogonal design axes
- Can three axes replace the short-term long-term memory split?
  Does breaking agent memory into forms, functions, and dynamics provide a clearer framework than the traditional short-term/long-term distinction? This matters because current agent-memory literature lacks a unified vocabulary, making comparison between systems nearly impossible.
  RAISE's components occupy specific positions along the functions axis: scratchpad is working, examples are factual, conversation history is experiential
- Can a single model replace retrieval for long-term conversation memory?
  COMEDY proposes collapsing the standard retrieval pipeline into one unified model that generates, compresses, and responds. But does eliminating the retriever actually improve performance, or does compression lose critical information?
  COMEDY collapses these distinctions by merging everything into one compressive store; RAISE preserves them
- Can interleaving reasoning with real-world feedback prevent hallucination?
  Does grounding language model reasoning in external world observations rather than internal associations help prevent error propagation and false outputs? This explores whether breaking the static chain-of-thought pattern can catch and correct mistakes in real time.
  RAISE is a ReAct enhancement; the four-component memory is what makes the reasoning-action loop trackable across long dialogues
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Useful Memories Become Faulty When Continuously Updated by LLMs
- Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics
- Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents
- Rethinking Memory as Continuously Evolving Connectivity
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
- The AI Hippocampus: How Far are We From Human Memory?
Original note title
agent working memory decomposes into four components at two granularities — dialogue-level history and scratchpad versus turn-level examples and task trajectory