Can AI models retain knowledge across changing environments without catastrophic forgetting?

This explores whether AI agents can keep what they've learned as their environment shifts — and the corpus answers by reframing the problem: forgetting is largely a side effect of updating model weights, so most solutions sidestep weights entirely.

The most striking thing in the corpus is that nearly everyone has stopped trying to solve catastrophic forgetting inside the model's weights at all. The dominant move is to put memory *outside* the parameters. VOYAGER stores learned behaviors as executable code in a searchable skill library and composes new skills from old ones, so learning accumulates instead of overwriting (Can agents learn new skills without forgetting old ones?). AgentFly goes further and formalizes the whole learning loop as memory operations, improving its policy and reaching 87.88% on GAIA validation without touching a single weight (Can agents learn continuously from experience without updating weights?). Reflexion does the simplest version: after each success or failure the agent writes itself a short note and stores it, learning across episodes with no gradient updates at all (Can agents learn from failure without updating their weights?).
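To make the skill-library idea concrete, here is a minimal sketch of an external skill store. It is not VOYAGER's implementation: the `SkillLibrary` class, its method names, and the toy skills are all hypothetical, and plain word overlap stands in for the embedding-based lookup the note describes. The point it illustrates is only that adding a skill never modifies an existing one, and that new skills can be built by calling old ones.

```python
from typing import Callable, Dict, List


class SkillLibrary:
    """Toy external skill store: skills are added, never overwritten."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., list]] = {}
        self._descriptions: Dict[str, str] = {}

    def add(self, name: str, description: str, fn: Callable[..., list]) -> None:
        # Refuse to overwrite: new learning must not erase an old skill.
        if name in self._skills:
            raise ValueError(f"skill {name!r} already exists")
        self._skills[name] = fn
        self._descriptions[name] = description

    def get(self, name: str) -> Callable[..., list]:
        return self._skills[name]

    def search(self, query: str, top_k: int = 1) -> List[str]:
        # Assumption: word overlap stands in for embedding similarity.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(desc.lower().split())), name)
            for name, desc in self._descriptions.items()
        ]
        # Highest overlap first; ties broken alphabetically by skill name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for overlap, name in scored[:top_k] if overlap > 0]


library = SkillLibrary()
library.add("chop_tree", "collect wood by chopping a tree", lambda: ["wood"])
library.add(
    "craft_planks",
    "craft planks from wood",
    lambda wood: ["plank"] * (4 * len(wood)),
)


def gather_planks() -> list:
    # A new skill composed from two stored ones.
    wood = library.get("chop_tree")()
    return library.get("craft_planks")(wood)


library.add("gather_planks", "gather planks starting with nothing", gather_planks)
```

After the third `add`, the two original skills are still retrievable and behave exactly as before, which is the property that weight updates do not guarantee.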

The sharpest reframe comes from work arguing that forgetting isn't an inherent cost; it's a misallocation. Fast-Slow Training splits adaptation into two channels: durable, slow-changing weights stay mostly frozen while task-specific lessons get routed into fast, editable text prompts. The result is equivalent performance reached 1.4–3x faster, with much less forgetting and less loss of plasticity (Can splitting adaptation into two channels reduce forgetting?). That two-speed idea echoes how human memory separates stable skills from fresh facts, and it suggests the right question isn't 'how do we stop weights from being overwritten' but 'what should ever have been written to weights in the first place.'

But here's the twist you might not expect: it's not just *whether* you externalize memory, it's the *shape* of what you store. Agents using 'causal' memory, meaning notes that record not just what worked but the conditions under which it worked, beat generic reflection by 23 points on repeated trials and, crucially, gained 4–17 points when transferred to entirely new environments (Can frozen language models continually improve through memory structure alone?). So a frozen model can not only retain knowledge across a changing environment but actively transfer it, purely through better-structured memory. The ACE framework makes a related point from the failure side: treat your accumulated knowledge as an evolving playbook updated incrementally, not rewritten wholesale, or you get 'context collapse', where compression silently erodes the details you needed (Can context playbooks prevent knowledge loss during iteration?).
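One way to picture what 'causal' memory adds is a note format that carries its applicability conditions with it, so retrieval can filter on the current situation instead of returning every past lesson. The sketch below is an illustration under assumptions, not the method from the cited note: `CausalNote`, `applicable`, and the door example are hypothetical names, and exact key-value matching is a deliberately simple stand-in for however conditions are really compared.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CausalNote:
    action: str                 # what the agent did
    outcome: str                # what happened
    conditions: Dict[str, str]  # when the lesson applies (hypothetical schema)


def applicable(notes: List[CausalNote], context: Dict[str, str]) -> List[CausalNote]:
    """Return only the notes whose every condition holds in the current context."""
    return [
        note
        for note in notes
        if all(context.get(key) == value for key, value in note.conditions.items())
    ]


notes = [
    CausalNote("push the door", "door opened", {"door_type": "swing"}),
    CausalNote("slide the door sideways", "door opened", {"door_type": "sliding"}),
    CausalNote("use the key first", "door unlocked", {"locked": "yes"}),
]

# A new environment the agent has not seen, described by the same keys.
new_context = {"door_type": "sliding", "locked": "yes", "room": "greenhouse"}
matching = applicable(notes, new_context)
```

Here `matching` contains the sliding-door and key notes but not the swing-door one. A generic reflection such as "pushing the door worked" has no conditions to check, so it would be retrieved in the new environment even though it no longer applies.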

There's a real cost to all this external memory, though: it bloats. DeepAgent's autonomous memory folding compresses interaction history into structured episodic, working, and tool memory schemas, cutting token overhead while preserving the ability to reflect (Can agents compress their own memory without losing critical details?). The harder finding beneath that: autonomous systems are bad at reconciling *contradictory* knowledge as the world changes. ARIA shows test-time learning needs timestamped knowledge bases to detect conflicts, but the actual resolution of which rule now applies often requires a human, because the right answer depends on context outside the system (Can LLMs learn reliably at test time without human oversight?).
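The split between detecting a conflict and resolving it can be sketched with a small timestamped store. This is an assumption-laden illustration rather than ARIA's design: `Rule`, `TimestampedKB`, and the refund example are hypothetical, and a conflict is naively defined as two different statements recorded under the same topic. The store records both rules and queues the pair for review; it deliberately does not pick a winner.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    topic: str
    statement: str
    recorded_at: datetime


class TimestampedKB:
    """Toy knowledge base that flags contradictions instead of resolving them."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}
        # Pairs of (earlier rule, newer rule) awaiting a human decision.
        self.pending_conflicts: List[Tuple[Rule, Rule]] = []

    def add(self, rule: Rule) -> bool:
        """Store the rule; return True if it contradicts an existing one."""
        existing = self._rules.setdefault(rule.topic, [])
        conflicts = [old for old in existing if old.statement != rule.statement]
        for old in conflicts:
            self.pending_conflicts.append((old, rule))
        existing.append(rule)
        return bool(conflicts)

    def history(self, topic: str) -> List[Rule]:
        # Timestamps give the order of arrival, not the correct answer.
        return sorted(self._rules.get(topic, []), key=lambda r: r.recorded_at)


kb = TimestampedKB()
old_rule = Rule("refund_window", "30 days", datetime(2024, 1, 1))
new_rule = Rule("refund_window", "14 days", datetime(2024, 6, 1))
first_conflicts = kb.add(old_rule)
second_conflicts = kb.add(new_rule)
```

After the second `add`, `kb.pending_conflicts` holds the one `(old_rule, new_rule)` pair. Preferring the newest rule would be easy to code, but it would bake in the assumption that newer always supersedes older, which is exactly the kind of context the system may not have.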

Step back and the corpus is making one unified claim: reliable agents stay reliable across change by externalizing their cognitive burdens (memory, skills, and protocols) into a surrounding 'harness' rather than relying on the model to re-solve everything inside its weights (Where does agent reliability actually come from?). So the answer to the question is yes, but the path runs almost entirely *around* the model rather than through it. Catastrophic forgetting turns out to be mostly a problem you inherit only if you insist on cramming every lesson into the parameters.

Sources (9 notes)

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can splitting adaptation into two channels reduce forgetting?

Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.

Can frozen language models continually improve through memory structure alone?

Agents using causal-form memory (preserving applicability conditions) outperform generic reflection by 23 points on repeated trials and gain 4–17 points transferring to new environments, showing memory shape matters more than parameter updates.

Can context playbooks prevent knowledge loss during iteration?

The ACE framework treats contexts as evolving playbooks using generation-reflection-curation loops rather than full rewrites. This prevents knowledge loss from compression and detail erosion, achieving +10.6% on agentic tasks and +8.6% on finance without labeled supervision.

Can agents compress their own memory without losing critical details?

DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.

Can LLMs learn reliably at test time without human oversight?

ARIA demonstrates that LLMs can adapt during inference through three integrated components: structured self-dialogue for uncertainty assessment, timestamped knowledge bases for conflict detection, and human-mediated resolution queries. Autonomous systems fail at reconciling contradictory rules because the correct choice depends on context outside the system.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.
