How do retrieved memories differ from decision-context passages for prediction?
This explores a distinction the corpus keeps circling: a passage retrieved as *memory* (a stored fact or past episode you pull in to recall something) plays a different role in prediction than a passage that supplies *decision context* (the in-flight sequence of states and actions a model reasons over to decide what to do next).
The two aren't the same kind of input, and they help prediction in different ways. The sharpest framing comes from an analysis of 5 million pretraining documents and what they actually do for a model. There, factual recall and reasoning draw on opposite kinds of material: answering a factual question relies on narrow, document-specific memorization (the passage that literally contains the answer), while reasoning generalizes from broad, transferable *procedural* knowledge spread across many unrelated sources Does procedural knowledge drive reasoning more than factual retrieval?. So a retrieved memory tends to be valuable for *what it says*, whereas decision context is valuable for *the pattern of moves it demonstrates*.
That second mode — context as a worked example of how to act — is exactly what the in-context learning work pins down. Isolated examples don't unlock sequential decision-making; the model needs full or partial *trajectories* from the same environment, because the prediction it's making is about the next action in a sequence, and only a trajectory carries that structure Why do trajectories matter more than individual examples for in-context learning?. A retrieved fact has no trajectory shape to it. A decision-context passage is almost nothing *but* trajectory shape.
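A minimal sketch of that difference, assuming a toy text format for (observation, action) steps. `Episode`, `trajectory_context`, and `isolated_context` are hypothetical names, and the prompt format is illustrative, not the one used in the in-context learning work. The first function keeps whole trajectories from the same environment and ends on the current step, so the next thing to predict is an action inside a familiar sequence; the second pools single pairs from anywhere and throws that structure away.

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[str, str]  # one (observation, action) pair


@dataclass
class Episode:
    env_id: str         # environment/level the episode was recorded in
    steps: List[Step]   # ordered (observation, action) pairs


def format_steps(steps: List[Step]) -> str:
    """Render steps in order so the step-to-step structure stays visible."""
    return " ".join(f"[obs={o} act={a}]" for o, a in steps)


def trajectory_context(memory: List[Episode], env_id: str, current: List[Step],
                       next_obs: str, max_episodes: int = 2) -> str:
    """Whole trajectories from the SAME environment, then the in-flight partial
    trajectory ending in an open slot for the next action."""
    same_env = [ep for ep in memory if ep.env_id == env_id][:max_episodes]
    lines = [format_steps(ep.steps) for ep in same_env]
    query = f"[obs={next_obs} act="
    # Skip the empty string when there is no partial trajectory yet.
    lines.append(" ".join(part for part in (format_steps(current), query) if part))
    return "\n".join(lines)


def isolated_context(memory: List[Episode], next_obs: str, k: int = 4) -> str:
    """Contrast: k unordered (observation, action) pairs pooled from any
    environment, with episode boundaries and step order thrown away."""
    pairs = sorted({step for ep in memory for step in ep.steps})[:k]
    lines = [f"[obs={o} act={a}]" for o, a in pairs]
    lines.append(f"[obs={next_obs} act=")
    return "\n".join(lines)


memory = [
    Episode("A", [("start", "right"), ("corridor", "right"), ("door", "open")]),
    Episode("B", [("start", "left"), ("pit", "jump")]),
    Episode("A", [("start", "right"), ("corridor", "up")]),
]
print(trajectory_context(memory, "A", current=[("start", "right")], next_obs="corridor"))
```

The trajectory prompt contains only environment `A` episodes and ends with `[obs=start act=right] [obs=corridor act=`, so the model's prediction slot sits exactly where a demonstrated sequence would continue. The isolated prompt has no such continuation to copy.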
The corpus also shows the two can be different physical channels inside a system. The Titans architecture literally splits them: attention handles the short-term decision context in the window, while a separate neural memory module stores compressed long-term material, prioritizing surprising tokens for later recall Can neural memory modules scale language models beyond attention limits?. And the choice of *which source* to lean on is itself learnable: DeepRAG models each reasoning step as a Markov Decision Process in which the model decides whether to retrieve external knowledge or rely on what it already knows, and its accuracy gains come from better-targeted retrieval and from *not* retrieving when external passages would only add noise When should language models retrieve external knowledge versus use internal knowledge?.
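A toy sketch of the two-channel idea, with a retrieve-or-not gate on top. Everything here is a hand-written stand-in: Titans uses a learned neural memory module, not a list, and DeepRAG learns its retrieval decision as a policy over a Markov Decision Process rather than applying a fixed rule. The class `TwoChannelMemory`, the unigram surprise score, and the membership-check gate are all hypothetical, chosen only to make the division of labor concrete.

```python
import math
from collections import Counter, deque
from typing import List, Tuple


class TwoChannelMemory:
    """Toy two-channel store: a bounded window for short-term decision context,
    plus a long-term list that keeps only tokens that were surprising on arrival."""

    def __init__(self, window_size: int = 4, surprise_bits: float = 2.0):
        self.window = deque(maxlen=window_size)  # short-term channel (oldest drops out)
        self.long_term: List[str] = []           # long-term channel
        self.counts: Counter = Counter()         # running unigram counts
        self.total = 0
        self.surprise_bits = surprise_bits

    def surprise(self, token: str) -> float:
        """-log2 p(token) under an add-one smoothed unigram model of the stream so far
        (all unseen tokens share one extra probability bucket)."""
        vocab = len(self.counts) + 1
        p = (self.counts[token] + 1) / (self.total + vocab)
        return -math.log2(p)

    def observe(self, token: str) -> None:
        if self.surprise(token) >= self.surprise_bits:
            self.long_term.append(token)  # unexpected: worth keeping for later recall
        self.window.append(token)
        self.counts[token] += 1
        self.total += 1

    def context_for(self, query: str) -> Tuple[List[str], bool]:
        """Hypothetical stand-in for a learned retrieve-or-not decision:
        consult long-term memory only if the window lacks the query."""
        if query in self.window:
            return list(self.window), False  # decision context already suffices
        hits = [t for t in self.long_term if t == query]
        return list(self.window) + hits, True


mem = TwoChannelMemory(window_size=4)
for tok in "a a a a b a a a a".split():
    mem.observe(tok)
print(mem.long_term)          # ['b']
print(mem.context_for("a"))   # (['a', 'a', 'a', 'a'], False)
print(mem.context_for("b"))   # (['a', 'a', 'a', 'a', 'b'], True)
```

The rare token `b` has scrolled out of the window but survives in long-term storage because it was surprising when it arrived; a query for `a` skips retrieval entirely because the live window already covers it.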
Here's the part you might not expect: memories worth retrieving may need to be *processed differently depending on their type* before they're useful for prediction. SkillRL keeps successful episodes as concrete demonstrations but abstracts failures into general lessons — treating the two as interchangeable degrades performance Should successful and failed episodes be processed differently?. Reflexion makes the same point from the memory side: it stores verbal self-diagnoses as episodic memory and deliberately keeps them *uncompressed*, because compressing them destroys the specificity that makes them actionable next time Can agents learn from failure without updating their weights?. Compression is where retrieved memory gets fragile — single-model memory consolidation can actually drop *below* a no-memory baseline when it over-compresses and misgroups Can a single model replace retrieval for long-term conversation memory?.
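A minimal sketch of asymmetric consolidation under simplifying assumptions: successes are kept as concrete action sequences, failures are reduced to a lesson, and that lesson is stored exactly as written rather than summarized. The names `EpisodeRecord`, `AgentMemory`, `consolidate`, and `build_context` are hypothetical. In the systems described, the lesson or self-diagnosis is generated by a model; here it is simply passed in as the episode's `reflection` string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeRecord:
    task: str
    steps: List[str]      # the actions actually taken, in order
    success: bool         # unambiguous environment feedback
    reflection: str = ""  # verbal self-diagnosis written after a failure


@dataclass
class AgentMemory:
    demonstrations: List[List[str]] = field(default_factory=list)  # concrete, verbatim
    lessons: List[str] = field(default_factory=list)               # abstracted, verbatim


def consolidate(ep: EpisodeRecord, mem: AgentMemory) -> None:
    """Asymmetric consolidation: successes stay concrete, failures become lessons."""
    if ep.success:
        # Keep the exact action sequence so a later run can copy the moves.
        mem.demonstrations.append(list(ep.steps))
    else:
        # Drop the failed steps and keep the lesson. The lesson text is stored as
        # written, not summarized or merged, so its specifics survive.
        mem.lessons.append(ep.reflection)


def build_context(mem: AgentMemory, max_demos: int = 1) -> str:
    """Lessons first (principles), then up to `max_demos` demonstrations (moves)."""
    lines = [f"Lesson: {lesson}" for lesson in mem.lessons]
    for demo in mem.demonstrations[:max_demos]:
        lines.append("Demo: " + " -> ".join(demo))
    return "\n".join(lines)


mem = AgentMemory()
consolidate(EpisodeRecord("find key", ["open drawer 1", "open drawer 2", "take key"], True), mem)
consolidate(EpisodeRecord("find key", ["open cabinet 1", "open cabinet 1"], False,
                          reflection="I reopened cabinet 1 twice; try an unopened container next."), mem)
print(build_context(mem))
```

The resulting context carries one copyable demonstration and one principle; the failed action sequence itself never enters the prompt, while the lesson keeps its specifics ("cabinet 1") because nothing compressed it.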
The through-line: decision context aids prediction by supplying live structure the model reasons over directly, so it rewards completeness and sequence. Retrieved memory aids prediction by supplying recalled content the model has to re-integrate, so it rewards the right *form*: concrete when you need to copy a move, abstracted when you need a principle. One framework, Memory-Amortized Inference, even inverts the usual direction entirely, casting cognition as backward navigation over stored inference paths rather than forward reward-seeking, which suggests memory isn't just a lookup table feeding prediction but the substrate the prediction runs on Can cognition work by reusing memory instead of recomputing?.
Sources (8 notes)
Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.
In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.
Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.
DeepRAG models each reasoning step as a Markov Decision Process where the model learns when to retrieve versus rely on parametric knowledge. The 21.99% improvement comes from better-targeted retrieval and elimination of noise from unnecessary external knowledge.
SkillRL demonstrates that treating successful episodes as concrete demonstrations and failures as abstracted lessons achieves state-of-the-art performance on complex tasks while using substantially less context than uniform approaches. The asymmetry mirrors human expert reasoning and avoids the degradation seen in uniform consolidation methods.
Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.
COMEDY merges memory generation, compression, and response into one operation, tracking event recaps, user portraits, and relationship dynamics without vector-DB retrieval. However, empirical work shows continuous reprocessing follows an inverted-U curve, degrading below a no-memory baseline due to misgrouping, context loss, and overfitting.
Memory-Amortized Inference proposes intelligence arises from structured reuse of prior inference paths over topological memory, inverting RL's reward-forward logic into cause-backward reconstruction. This duality explains energy efficiency and suggests memory trajectories form the substrate of adaptive thought.