How do the three-axis taxonomies of memory forms and functions differ?
This explores the different ways recent work carves agent memory into a small number of orthogonal axes — and how those carvings disagree about what the 'three' should even be.
This reads the question as: when researchers say memory has three axes, they don't all mean the same three — so what's actually being distinguished? The corpus has at least three competing schemes, and the interesting part is that they slice along different planes.
The most explicit one comes from a 2025 survey that proposes three axes: forms (token, parametric, latent), functions (factual, experiential, working), and dynamics (formation, evolution, retrieval) [Can three axes replace the short-term long-term memory split?]. Its argument is that the familiar short-term/long-term split isn't a real architectural axis at all; it's an emergent pattern that falls out of the dynamics axis (how memory forms and decays over time). So 'forms' is about *where* memory lives, 'functions' is about *what it's for*, and 'dynamics' is about *how it changes*. These are meant to be genuinely orthogonal: any real system occupies a position on all three at once.
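One way to make the orthogonality claim concrete is to treat each axis as its own field on a record. The sketch below is illustrative only: the class names, the free-text `Dynamics` fields, and the `example-rag-store` system are assumptions made up for this note, not definitions taken from the survey.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    """Where the memory lives."""
    TOKEN = "token"
    PARAMETRIC = "parametric"
    LATENT = "latent"


class Function(Enum):
    """What the memory is for."""
    FACTUAL = "factual"
    EXPERIENTIAL = "experiential"
    WORKING = "working"


@dataclass(frozen=True)
class Dynamics:
    """How the memory changes; free-text descriptions are an assumption."""
    formation: str
    evolution: str
    retrieval: str


@dataclass(frozen=True)
class MemorySystem:
    """A system is a point on all three axes at once."""
    name: str
    form: Form
    function: Function
    dynamics: Dynamics


# Hypothetical system, invented purely for illustration.
rag_store = MemorySystem(
    name="example-rag-store",
    form=Form.TOKEN,
    function=Function.FACTUAL,
    dynamics=Dynamics(
        formation="chunk and index documents",
        evolution="append-only, no decay",
        retrieval="top-k similarity search",
    ),
)

print(rag_store.form.value, rag_store.function.value)
```

Note that the record has no short-term/long-term field. Under the survey's framing, that label would be read off the `dynamics` entries rather than stored as a separate property.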
A different three-way cut maps memory onto the brain rather than onto abstract properties: transformer weights as a consolidated neocortex, retrieval/RAG stores as hippocampal rapid encoding, and agentic state as prefrontal executive control [Can brain memory systems explain how LLMs should store knowledge?]. Notice this overlaps the survey's 'forms' axis (weights ≈ parametric, retrieval ≈ token) but smuggles in function too: the prefrontal/agentic tier is defined by what it does, not where it sits. So this taxonomy collapses two of the survey's supposedly independent axes into one biological story, which is exactly the kind of conflation the forms/functions/dynamics framing is trying to pull apart.
Then there are taxonomies that aren't three-axis at all but get mistaken for siblings. RAISE decomposes agent working memory into four components across two granularities: dialogue-level (conversation history, scratchpad) vs. turn-level (examples, task trajectory) [How should agent memory split across time scales?]. That's two granularities with two components each, and it lives *entirely inside* the survey's 'working' function, so it's not a rival taxonomy but a zoom-in on one cell. Similarly, the STIM work splits chain-of-thought memorization into local, mid-range, and long-range sources [Where do memorization errors arise in chain-of-thought reasoning?]. That is a three-way cut, but along *distance*, a single dimension, not three orthogonal ones.
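The RAISE decomposition described above can be sketched as a nested record: two granularities, each holding two components. All class and method names here are hypothetical, not RAISE's actual API, and the reset-per-turn policy in `start_turn` is an assumption added only to show why the granularity split might matter for update policies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogueLevelMemory:
    """Components that span the whole conversation."""
    conversation_history: List[str] = field(default_factory=list)
    scratchpad: List[str] = field(default_factory=list)


@dataclass
class TurnLevelMemory:
    """Components scoped to the current turn."""
    examples: List[str] = field(default_factory=list)
    task_trajectory: List[str] = field(default_factory=list)


@dataclass
class WorkingMemory:
    """Four components grouped by granularity (hypothetical layout)."""
    dialogue: DialogueLevelMemory = field(default_factory=DialogueLevelMemory)
    turn: TurnLevelMemory = field(default_factory=TurnLevelMemory)

    def start_turn(self) -> None:
        # Assumed policy: turn-level state is rebuilt each turn,
        # while dialogue-level state persists across turns.
        self.turn = TurnLevelMemory()


wm = WorkingMemory()
wm.dialogue.conversation_history.append("user: book a flight")
wm.turn.examples.append("retrieved example for this turn")
wm.start_turn()
print(len(wm.dialogue.conversation_history), len(wm.turn.examples))
```

Everything in this structure would sit under a single 'working' value on the survey's functions axis, which is the sense in which it is a zoom-in rather than a competing scheme.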
The payoff of seeing these side by side: a 'three-axis taxonomy' can mean three independent dimensions (forms/functions/dynamics), three bins along one dimension (memorization by distance), or a brain analogy that quietly bundles dimensions together. The reason this matters is practical: the survey's whole claim is that you can only compare two memory systems precisely if your axes are actually orthogonal, and that memory structure, not parameter count, is now the live scaling frontier [Has memory architecture replaced parameter count as the scaling frontier?]. Taxonomies that conflate where/what/how make that comparison impossible, which is the real difference between them.
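The comparison argument can be illustrated with a small diff: if each system is described separately on each axis, two systems can be compared axis by axis. This is a minimal sketch, and the function name, the dictionary layout, and both example systems are hypothetical.

```python
from typing import Dict, List

AXES = ("form", "function", "dynamics")


def differing_axes(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return the axes on which two system descriptions disagree."""
    return [axis for axis in AXES if a[axis] != b[axis]]


# Two made-up systems, described independently on each axis.
system_a = {
    "form": "token",
    "function": "factual",
    "dynamics": "append-only index",
}
system_b = {
    "form": "parametric",
    "function": "factual",
    "dynamics": "fine-tuning updates",
}

print(differing_axes(system_a, system_b))  # ['form', 'dynamics']
```

A taxonomy that bundles where/what/how into a single label gives only one field to compare, so a diff like this could report that two systems differ but not on which axis.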
Sources (5 notes)
A 2025 survey reframes agent memory along forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), showing that short/long-term phenomena emerge from temporal patterns rather than architectural separation. This enables precise system comparison and replaces vague implementation-based claims.
Research shows transformer weights function as a distributed neocortex for consolidated knowledge, RAG stores as hippocampal indexing for rapid encoding, and agentic state as prefrontal executive control. The CLS framework predicts why hybrid systems outperform single-tier approaches and identifies missing consolidation mechanisms that prevent memory integration.
RAISE shows that agent working memory consists of four components organized by two granularities: dialogue-level (conversation history, scratchpad) versus turn-level (examples, task trajectory). This granularity distinction predicts different failure modes and update policies for each component.
The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization, which is based on preceding tokens, accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.
Three converging signals in late-2025 research—taxonomy maturation, memory-aware test-time scaling loops, and hybrid sparsity laws—show that returns from restructuring memory now exceed returns from adding parameters. The design bottleneck has shifted from compute to memory structure.