INQUIRING LINE

Does parameter isolation per task enable online updates without retraining?

This explores whether giving each task its own dedicated parameters — rather than overwriting shared weights — lets a model absorb new data or new tasks on the fly, without a full retrain.


This reads the question as: if you isolate parameters per task, can you update a deployed model live instead of retraining it? The corpus says yes, and the clearest case is streaming recommendation. DEGC assigns new parameters to capture emerging user preferences while leaving the parameters that encode older patterns untouched. That preserves the past *exactly* and gives you an explicit knob on the stability-vs-plasticity trade-off, something replay and distillation methods can't offer because they smear old and new together (Can model isolation solve streaming recommendation better than replay?). The reason isolation works at all is structural: identifying each task's 'core' parameter regions, freezing them, and merging only the non-core ones consistently beats ordinary multi-task fine-tuning, while scheduling tasks over time without that explicit structural separation isn't enough (Can isolating task-specific parameters prevent multi-task fine-tuning interference?).
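The isolation idea above can be sketched in a few lines. This is a toy illustration, not DEGC's actual method: the `IsolatedModel` class, its `add_task` and `sgd_step` methods, and the `week_1`/`week_2` task names are all hypothetical, and a plain list of floats stands in for a block of network parameters.

```python
from dataclasses import dataclass, field


@dataclass
class IsolatedModel:
    """Toy model with one private weight block per task (hypothetical)."""
    blocks: dict[str, list[float]] = field(default_factory=dict)
    frozen: set[str] = field(default_factory=set)

    def add_task(self, task: str, dim: int) -> None:
        if task in self.blocks:
            raise ValueError(f"task {task!r} already exists")
        # Freeze every existing block, then allocate fresh weights for the new task.
        self.frozen.update(self.blocks)
        self.blocks[task] = [0.0] * dim

    def sgd_step(self, task: str, grad: list[float], lr: float = 0.1) -> None:
        # Only the unfrozen (most recent) block may change.
        if task in self.frozen:
            raise ValueError(f"block for {task!r} is frozen")
        self.blocks[task] = [w - lr * g for w, g in zip(self.blocks[task], grad)]


model = IsolatedModel()
model.add_task("week_1", dim=3)
model.sgd_step("week_1", [1.0, -2.0, 0.5])
snapshot = list(model.blocks["week_1"])

model.add_task("week_2", dim=3)  # week_1 is frozen from here on
model.sgd_step("week_2", [0.3, 0.3, 0.3])

old_unchanged = model.blocks["week_1"] == snapshot
```

Because the update for `week_2` never touches the `week_1` block, the old weights are preserved bit for bit rather than approximately, which is the property the DEGC note emphasizes. How many new parameters to allocate per task is the stability-vs-plasticity knob in this sketch.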

What's worth knowing is that 'isolate the parameters' is really one member of a larger family: *don't touch the shared weights at all*. VOYAGER skips weight updates entirely, storing new skills as executable entries in an external, searchable library and composing complex skills from simpler ones; it learns continuously precisely because nothing is overwritten, so nothing is forgotten (Can agents learn new skills without forgetting old ones?). SoftCoT does the architectural version: freeze the main model, bolt on a small auxiliary module that does the adapting, and the pre-trained knowledge stays intact (Can continuous reasoning avoid forgetting in instruction-tuned models?). Fast-Slow Training reframes the whole thing as an allocation problem: route fast-changing, task-specific lessons into optimized prompts and keep weight updates minimal, reaching the same performance 1.4–3x faster with far less forgetting (Can splitting adaptation into two channels reduce forgetting?). The common thread: forgetting isn't an inherent cost of learning, it's what happens when new information is forced through the same parameters that hold the old.
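To make the 'external library' idea concrete, here is a minimal sketch of a skill store that learns by adding entries rather than updating weights. It is not VOYAGER's implementation: the `SkillLibrary` class and the example skills are hypothetical, and simple word overlap is swapped in for the embedding-based lookup the note describes.

```python
from typing import Callable

State = dict[str, int]
Skill = Callable[[State], State]


class SkillLibrary:
    """Hypothetical external skill store; no model weights are involved."""

    def __init__(self) -> None:
        self._skills: dict[str, tuple[str, Skill]] = {}

    def add(self, name: str, description: str, fn: Skill) -> None:
        self._skills[name] = (description, fn)

    def get(self, name: str) -> Skill:
        return self._skills[name][1]

    def search(self, query: str) -> list[str]:
        # Assumption: word overlap stands in for embedding similarity.
        words = set(query.lower().split())
        scored = [
            (len(words & set(desc.lower().split())), name)
            for name, (desc, _) in self._skills.items()
        ]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]

    def compose(self, name: str, description: str, parts: list[str]) -> None:
        # Build a new skill by running existing skills in order.
        steps = [self.get(part) for part in parts]

        def composed(state: State) -> State:
            for step in steps:
                state = step(state)
            return state

        self.add(name, description, composed)


def chop_wood(state: State) -> State:
    return {**state, "wood": state["wood"] + 1}


def craft_planks(state: State) -> State:
    if state["wood"] < 1:
        return state  # nothing to craft from
    return {**state, "wood": state["wood"] - 1, "planks": state["planks"] + 4}


library = SkillLibrary()
library.add("chop_wood", "chop a tree to collect wood", chop_wood)
library.add("craft_planks", "craft planks from wood", craft_planks)
library.compose("make_planks", "collect wood and craft planks",
                ["chop_wood", "craft_planks"])

result = library.get("make_planks")({"wood": 0, "planks": 0})
matches = library.search("craft planks from wood")
```

Adding `make_planks` leaves `chop_wood` and `craft_planks` exactly as they were, so there is nothing for the new skill to overwrite. The trade-off in this sketch is that retrieval quality now matters as much as skill quality.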

The sharpest framing of 'online updates without retraining' comes from MetaClaw, which argues a single timescale isn't enough. Deployed agents need *both* fast skill injection from failures (seconds, zero downtime, no gradients) and slower gradient-based optimization during idle windows (minutes to hours). The two reinforce each other: better policies surface more informative failures, and richer fast-learned skills produce higher-reward trajectories for the slow path to learn from (Can agents adapt without pausing service to users?). So parameter isolation gets you the zero-downtime, retrain-free update, but it pairs naturally with a slower consolidation step rather than replacing it.
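The two-timescale split can be sketched as a pair of handlers, one that runs inline with serving and one that runs only when the system is idle. This is an illustrative toy, not MetaClaw's design: `TwoTimescaleAgent`, `on_episode`, and `on_idle` are hypothetical names, a single float stands in for the model's weights, and the slow path is one gradient step on the squared error between that weight and the mean buffered reward.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TwoTimescaleAgent:
    weight: float = 0.0                                # stands in for model parameters
    skills: list[str] = field(default_factory=list)    # fast channel, no gradients
    buffer: list[float] = field(default_factory=list)  # rewards awaiting the slow path

    def on_episode(self, reward: float, failure_note: Optional[str] = None) -> None:
        # Fast path: runs during serving and only appends to the skill list.
        if failure_note is not None:
            self.skills.append(failure_note)
        self.buffer.append(reward)

    def on_idle(self, lr: float = 0.5) -> None:
        # Slow path: gradient step on 0.5 * (weight - target) ** 2.
        if not self.buffer:
            return
        target = sum(self.buffer) / len(self.buffer)
        self.weight -= lr * (self.weight - target)
        self.buffer.clear()


agent = TwoTimescaleAgent()
agent.on_episode(0.0, failure_note="retry with a smaller batch after a timeout")
agent.on_episode(1.0)
weight_before_idle = agent.weight  # serving never touched the weight

agent.on_idle()
weight_after_idle = agent.weight
```

The weight stays fixed while episodes are being served and only moves inside `on_idle`, which mirrors the claim that fast skill injection needs no downtime while consolidation waits for a quiet window.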

The less obvious takeaway: across these notes, the mechanism that enables online updating is almost always *externalization*, meaning pushing what changes out of the frozen weights and into a separate channel (new isolated parameters, a skill library, a prompt, an auxiliary model). Isolation per task is the structural way to do that inside the network; skill libraries and fast-context routing are the ways to do it outside the network. They're answers to the same question under different vocabulary.


Sources (6 notes)

Can model isolation solve streaming recommendation better than replay?

DEGC uses per-task parameter isolation to handle streaming recommendation, providing explicit stability-plasticity trade-offs that experience replay and knowledge distillation methods cannot match. This approach preserves older patterns exactly while allowing new parameters to capture emerging preferences.

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Research shows that identifying core parameter regions per task, clustering overlapping tasks, and freezing core parameters while geometrically merging non-core parameters consistently outperforms standard multi-task fine-tuning. Temporal task scheduling alone proves insufficient without explicit structural parameter isolation.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.

Can splitting adaptation into two channels reduce forgetting?

Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.

Can agents adapt without pausing service to users?

MetaClaw demonstrates that deployed agents require both rapid skill injection from failures (seconds, zero downtime) and slower gradient-based optimization during idle windows (minutes to hours). The two mechanisms reinforce each other, with better policies producing more informative failures and richer skills enabling higher-reward trajectories.
