Does staying close to the base model preserve learning ability?

Explores whether limiting how far training pushes a model from its base distribution (measured by KL divergence) helps it learn new tasks more effectively over time, and why that trade-off matters for continual learning.

Synthesis note · 2026-05-28 · sourced from Training Fine Tuning

There is a quiet variable connecting forgetting, generalization, and the ability to keep learning: how far training pushes the policy from its base distribution, measured as KL divergence. The Fast-Slow result makes the relationship explicit. FST-trained models show up to 70% lower KL divergence from the base LLM than models trained with parameter-only RL, and that reduced drift is not just a forgetting story. It preserves plasticity: after training on one task, FST models adapt more effectively to a subsequent task, while parameter-only RL stalls when task domains change on the fly.
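To make "distance from base" concrete, here is a minimal sketch of one way to measure KL drift: the average per-position KL divergence between a trained policy's next-token distribution and the base model's. The direction KL(policy || base), the function names, and the toy logits are all assumptions for illustration; the source does not specify how the drift was computed.

```python
import math

def softmax(logits):
    """Convert a logit vector to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def mean_token_kl(policy_logits, base_logits):
    """Average KL(policy || base) over positions.

    Each argument is a list of per-position logit vectors, assumed to come
    from running both models on the same token prefixes.
    """
    kls = [
        kl_divergence(softmax(pl), softmax(bl))
        for pl, bl in zip(policy_logits, base_logits)
    ]
    return sum(kls) / len(kls)

# Hypothetical logits over a 3-token vocabulary at 2 positions.
base = [[1.0, 0.5, 0.0], [0.2, 0.2, 0.2]]
mild = [[1.2, 0.5, 0.0], [0.3, 0.2, 0.2]]   # small weight movement
heavy = [[4.0, 0.5, 0.0], [3.0, 0.2, 0.2]]  # strongly specialized policy

drift_mild = mean_token_kl(mild, base)
drift_heavy = mean_token_kl(heavy, base)
print(f"mild drift:  {drift_mild:.4f} nats")
print(f"heavy drift: {drift_heavy:.4f} nats")
```

In practice the logits would come from evaluating both models on a shared set of prompts; tracking this number across training is what turns drift into a quantity you can watch.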

The pattern is that drift and plasticity trade off. Each parameter update that improves in-domain reward also moves the model toward a sharper, lower-entropy policy specialized to that task. Specialization is exactly what makes the model less able to absorb the next task — the weights have committed. By keeping most task-specific adaptation in the fast textual channel and letting the slow weights move only a little, FST holds the policy near its flexible base, where it retains the entropy and breadth needed to learn again. Low KL drift is the leading indicator; preserved plasticity and reduced forgetting are downstream consequences.
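The link between sharpening, entropy, and drift can be illustrated with a toy calculation. The sketch below uses temperature scaling as a stand-in for task specialization, which is an assumption made purely for illustration (real RL updates do not simply rescale logits). It shows that as a distribution sharpens, its entropy falls while its KL divergence from the base rises.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; lower temperature gives a sharper distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL(p || q) in nats; assumes q is strictly positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical base logits over a 4-token vocabulary.
base_logits = [2.0, 1.0, 0.5, 0.0]
base = softmax(base_logits)

rows = []
for t in (1.0, 0.7, 0.4, 0.2):
    sharpened = softmax(base_logits, temperature=t)
    rows.append((t, entropy(sharpened), kl(sharpened, base)))

for t, h, d in rows:
    print(f"temperature={t:.1f}  entropy={h:.4f}  KL from base={d:.4f}")
```

The two columns move in opposite directions, which is the toy version of the claim above: the breadth a policy gives up while specializing shows up as distance from base.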

Why it matters: it gives continual learning a measurable target. Rather than treating "don't forget" and "stay adaptable" as separate desiderata to engineer, you can watch a single quantity — distance from base — and recognize that overshooting it is what produces both forgetting and plasticity loss. It also reframes KL regularization (already standard in RLHF as a leash) as not merely a stability or alignment-preservation device but as the mechanism that keeps the model trainable in the future. The counterpoint: staying near base also caps how much any single task can specialize the weights, so for a one-shot deployment with no future tasks, aggressive drift may be the better trade.
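The KL "leash" used in RLHF-style training is commonly implemented as a penalty subtracted from the task reward. A minimal sketch under stated assumptions: the KL term is estimated from the log-probabilities of the sampled tokens under the policy and the base model, and `beta` is a hypothetical coefficient, not a value taken from the source.

```python
def kl_penalized_reward(task_reward, policy_logprobs, base_logprobs, beta=0.1):
    """Return (shaped_reward, kl_estimate) for one sampled sequence.

    kl_estimate is the sum over sampled tokens of
    log pi(token) - log pi_base(token), a single-sample estimate of
    KL(policy || base) for that sequence.
    """
    if len(policy_logprobs) != len(base_logprobs):
        raise ValueError("log-prob lists must have the same length")
    kl_estimate = sum(lp - lb for lp, lb in zip(policy_logprobs, base_logprobs))
    return task_reward - beta * kl_estimate, kl_estimate

# Hypothetical per-token log-probs for a 3-token completion.
policy_lp = [-0.1, -0.2, -0.05]  # the policy is confident in its sample
base_lp = [-1.0, -0.9, -1.2]     # the base model finds it less likely

shaped, drift = kl_penalized_reward(1.0, policy_lp, base_lp, beta=0.1)
print(f"estimated KL: {drift:.3f}, shaped reward: {shaped:.3f}")
```

A larger `beta` keeps the policy nearer the base at the cost of in-domain reward, which is the trade the counterpoint describes: with no future tasks in view, a smaller `beta` may be the better choice.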

Related concepts in this collection

Can splitting adaptation into two channels reduce forgetting? When language models adapt to new tasks, does separating task-specific learning (via prompt context) from persistent parameter updates help preserve both generalization ability and the model's original capabilities?
the architecture that achieves the low KL drift; this note isolates KL drift as the mechanism linking that architecture to preserved plasticity

Can agents adapt without pausing service to users? Can deployed LLM agents continuously improve their capabilities while serving users without interruption? This explores whether fast behavioral updates and slow policy learning can coexist across different timescales.
continual-learning design that likewise minimizes disruptive weight movement by routing fast adaptation elsewhere

Can agents learn continuously from experience without updating weights? This explores whether LLM agents can adapt to new tasks and failures by retrieving past experiences from memory alone, rather than requiring expensive parameter fine-tuning or rigid hardcoded rules.
the limiting case: zero weight drift via external memory, trading parametric plasticity preservation for a retrieval-based store

Can frozen language models continually improve through memory structure alone? If agents can't update parameters, what form of textual memory lets them keep learning across trials and transfer to new tasks without retraining?
frozen-weight continual improvement (KL drift exactly zero), the extreme end of the drift-versus-plasticity spectrum this note describes
