How do transformers stitch together learned behaviors when adapting to new tasks?

This explores how transformers recombine skills they've already learned when facing a new task — whether they genuinely compose pieces or just stitch together memorized fragments, and where that stitching happens (in the weights, at inference, or in an external memory).

The corpus splits into a revealing argument: when a transformer looks like it's composing learned behaviors, is it actually composing, or just pattern-matching against fragments it memorized? One sharp line of work says the latter. Transformers often succeed on familiar combinations by memorizing computation subgraphs and replaying them, which works in-distribution but shatters on genuinely novel compositions, with errors compounding step by step. So the default way a transformer 'stitches' is closer to retrieving known patterns than to building something new from parts.

But the corpus also shows the stitching getting more real under specific pressures. Multi-hop reasoning emerges in three developmental stages (memorization, then in-distribution generalization, then cross-distribution reasoning), and the same three-phase 'grokking' shape appears when looped transformers reuse shared parameters across iterations to reach combinations a vanilla model can't. The interesting twist: genuine compositional generalization isn't free. The multi-hop work finds that second-hop generalization requires *explicit* exposure to composed examples during training; the model won't invent the bridge on its own.

The most direct answers to 'how does the stitching happen' come from work that moves composition out of the frozen weights. Transformer² tunes only the singular values of weight matrices to make composable expert vectors that mix at inference without interfering with each other, so adapting to a new task becomes a dynamic blend of specialists rather than a retrain. VOYAGER goes further and externalizes the skills entirely, storing executable routines in a searchable library and building complex skills from simpler ones, which sidesteps the catastrophic forgetting that weight-update methods suffer. Reflexion stitches across attempts by writing verbal self-diagnoses into episodic memory, improving without touching parameters at all. The pattern across all three: the most robust 'stitching' often happens outside the weights.

There's also a quieter finding that reframes what 'adapting' even means. Instruction tuning may teach the shape of the output more than the task itself: models trained on semantically empty or wrong instructions perform nearly as well, which suggests some apparent task-adaptation is the model learning *where in output space to land*, not assembling new reasoning. Meanwhile RL-finetuned transformers develop in-context reinforcement learning and adapt within an episode's context window with no weight updates, and self-training on their own filtered-correct solutions lets transformers extend learned behavior far past the training range.

The thing worth taking away: 'stitching learned behaviors' isn't one mechanism. It runs along a spectrum from brittle subgraph-replay inside the weights, to architecturally-forced composition, to skills deliberately externalized into libraries and memory. One foundational note explains why the in-weights version is so fragile: transformers hold knowledge as flowing activations rather than retrievable storage, so behaviors are entangled with the act of generation, not cleanly stored parts waiting to be snapped together.

Sources 10 notes

Do transformers actually learn systematic compositional reasoning?

Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.
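
The compounding claim can be made concrete with a back-of-the-envelope sketch. It assumes every reasoning step succeeds independently with the same probability, which is a simplification introduced here and not a result stated in the note; `chain_success` and the 95% figure are illustrative only.

```python
def chain_success(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step of a chain is correct.

    Assumes independent steps with identical accuracy (an illustrative
    simplification, not a claim from the source note).
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be a probability")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_accuracy ** num_steps


# How quickly a 95%-per-step model degrades as chains get longer.
for steps in (1, 5, 10, 20):
    print(steps, round(chain_success(0.95, steps), 3))
```

Under these assumptions, a model that is right 95% of the time per step finishes a 10-step chain correctly only about 60% of the time.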

How do transformers learn to reason across multiple steps?

Controlled training reveals transformers learn multi-hop reasoning in three phases: memorization, in-distribution generalization, and cross-distribution reasoning. Successful reasoning correlates with cosine clustering of entity representations, and second-hop generalization requires explicit compositional exposure during training.
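
One way to picture 'cosine clustering' is to compare the average cosine similarity inside a group of related entities with the average across groups. The sketch below is an assumption about how such a score could be computed; the note does not specify the metric, and the vectors and group names are made up.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def clustering_gap(groups):
    """Mean within-group cosine similarity minus mean cross-group similarity."""
    labelled = [(label, vec) for label, vecs in groups.items() for vec in vecs]
    within, across = [], []
    for (la, va), (lb, vb) in combinations(labelled, 2):
        (within if la == lb else across).append(cosine(va, vb))
    return sum(within) / len(within) - sum(across) / len(across)


# Hypothetical 3-d "entity representations", grouped by a hypothetical shared attribute.
tight = {"group_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
         "group_b": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]}
# Same vectors, but assigned to groups that do not match their geometry.
loose = {"group_a": [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0]],
         "group_b": [[0.9, 0.2, 0.0], [0.1, 0.0, 0.9]]}

print(clustering_gap(tight), clustering_gap(loose))
```

A large positive gap means related entities sit close together in representation space; a gap near or below zero means the grouping is not reflected in the geometry.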

Can looped transformers generalize to unseen knowledge combinations?

Recurrent-depth transformers with shared parameters across iterations enable systematic generalization and depth extrapolation that vanilla transformers cannot achieve. This emerges through a sharp three-phase process: memorization, in-distribution, then out-of-distribution generalization.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.
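
The singular-value idea can be sketched on a toy 2x2 weight matrix. Everything concrete here is an assumption for illustration: the SVD factors are taken as given, the expert vectors are invented, and the mixing weights are set by hand, since the step that chooses them at inference is not modeled. The sketch only shows that scaling singular values leaves the frozen directions `U` and `V^T` untouched and that expert vectors combine by a weighted sum.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product for small list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rotation(theta):
    """2x2 rotation matrix, used only as a convenient orthogonal factor."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def adapt(u, sigma, vt, z):
    """Rebuild W' = U diag(sigma * z) V^T; U and V^T stay frozen."""
    n = len(sigma)
    scaled = [[sigma[i] * z[i] if i == j else 0.0 for j in range(n)]
              for i in range(n)]
    return matmul(matmul(u, scaled), vt)


def mix_experts(experts, weights):
    """Weighted sum of expert vectors; the weights are supplied by the caller."""
    dim = len(next(iter(experts.values())))
    return [sum(weights[name] * vec[i] for name, vec in experts.items())
            for i in range(dim)]


# Assumed SVD factors of a frozen weight matrix W = U diag(SIGMA) V^T.
U, VT = rotation(0.3), rotation(-1.1)
SIGMA = [2.0, 0.5]
# Hypothetical expert vectors, one scale factor per singular value.
EXPERTS = {"math": [1.2, 0.8], "code": [0.9, 1.5]}

base = adapt(U, SIGMA, VT, [1.0, 1.0])  # z = 1 recovers the original W
z_mixed = mix_experts(EXPERTS, {"math": 0.75, "code": 0.25})
adapted = adapt(U, SIGMA, VT, z_mixed)  # blended specialist, no retraining
print(z_mixed)
```

Because each expert is just a vector of scale factors over the same frozen basis, blending two experts is a cheap elementwise operation, with no second set of full weights to merge.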

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.
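
A minimal sketch of a searchable skill library follows. It is not VOYAGER's implementation: the bag-of-words `embed` function stands in for a learned text embedding, and `SkillLibrary`, the skill names, and the toy game state are all hypothetical. What it illustrates is the shape of the idea: skills are stored as executable routines, retrieved by similarity to a description, and composed from simpler skills already in the library.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SkillLibrary:
    """Stores executable skills indexed by an embedding of their description."""

    def __init__(self):
        self._skills = {}

    def add(self, name, description, fn):
        self._skills[name] = (embed(description), fn)

    def retrieve(self, query):
        """Return the name of the skill whose description best matches the query."""
        q = embed(query)
        return max(self._skills, key=lambda n: cosine(q, self._skills[n][0]))

    def run(self, name, state):
        return self._skills[name][1](state)


def mine_wood(state):
    return {**state, "wood": state.get("wood", 0) + 1}


def craft_planks(state):
    if state.get("wood", 0) < 1:
        raise ValueError("need at least one wood log")
    return {**state, "wood": state["wood"] - 1,
            "planks": state.get("planks", 0) + 4}


library = SkillLibrary()
library.add("mine_wood", "chop a tree to collect wood logs", mine_wood)
library.add("craft_planks", "craft wooden planks from wood logs", craft_planks)
# A complex skill built from two simpler ones already in the library.
library.add("planks_from_scratch", "gather wood then craft planks",
            lambda s: library.run("craft_planks", library.run("mine_wood", s)))

print(library.retrieve("collect wood logs from a tree"))
print(library.run("planks_from_scratch", {}))
```

Adding a skill never modifies an existing one, which is the property the note credits with avoiding forgetting.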

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.
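
The trial-and-reflect loop can be sketched with a toy task. This is an illustration under heavy assumptions, not Reflexion's code: the `actor` is a fixed function standing in for a language model, the reflections are templated strings, and the candidate plans are hypothetical. The point is that behavior changes across trials only because the memory of verbal notes grows, while the actor itself is never updated.

```python
def actor(candidates, memory):
    """Pick the first candidate that no stored reflection has ruled out."""
    for candidate in candidates:
        if not any(f"'{candidate}'" in note for note in memory):
            return candidate
    return candidates[-1]


def evaluate(answer, target):
    """Unambiguous binary feedback from the environment."""
    return answer == target


def reflect(answer):
    """Write a verbal self-diagnosis; stored uncompressed in episodic memory."""
    return f"Answer '{answer}' failed the check; try a different one."


def run_trials(candidates, target, max_trials=5):
    memory = []  # episodic memory of reflections; no parameters anywhere
    for trial in range(1, max_trials + 1):
        answer = actor(candidates, memory)
        if evaluate(answer, target):
            return trial, memory
        memory.append(reflect(answer))
    return None, memory


trials_needed, notes = run_trials(["plan_a", "plan_b", "plan_c"], target="plan_c")
print(trials_needed)
for note in notes:
    print(note)
```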

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions, and instruction tuning scores only 43% against 42.6% for a random baseline. The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Can transformers learn to solve new problems within episodes?

Llama 3.1 8B fine-tuned with RL exhibits emergent in-context reinforcement learning, solving unseen problems through within-episode adaptation at human-level sample efficiency. This meta-learning emerges from RL's training pressure combined with the transformer's context window, without weight updates.

Can transformers improve exponentially by learning from their own correct solutions?

Standard transformers generalize from 10-digit to 100-digit addition by repeatedly generating solutions, filtering for correctness, and retraining—showing exponential (not linear) out-of-distribution improvement across rounds without saturation.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

As a researcher stress-testing claims about transformer composition, is genuine task adaptation in transformers built from reusable learned parts — or is apparent composition mostly pattern replay and in-context inference?

What a curated library found — and when (dated claims, not current truth):
Findings span May 2023–April 2026. A library of work on transformer composition reports:
- Transformers often succeed by memorizing and replaying known subgraph patterns rather than composing genuinely novel behaviors; errors compound in out-of-distribution compositions (2023–2024).
- Multi-hop reasoning emerges in three phases (memorization → in-distribution generalization → cross-distribution), but second-hop generalization requires explicit composed examples during training—the model won't bridge on its own (2025).
- Weight-external approaches (singular-value tuning, skill libraries, episodic memory) achieve more robust composition than in-weights stitching, which is fragile because transformers hold knowledge as flowing activations, not stored, snappable parts (2025–2026).
- Instruction tuning may teach output-distribution shape, not task understanding; in-context RL and self-training (filtering correct solutions) enable adaptation within context windows with zero weight updates (2023–2025).

Anchor papers (verify; mind their dates):
- 2305.18654 (Faith and Fate, May 2023): compositionality limits
- 2305.11383 (Instruction Tuning empirics, May 2023): output format, not task semantics
- 2501.14176 (RL + Transformer, Jan 2025): in-context meta-learning
- 2604.07822 (Recurrent-Depth, April 2026): looped reuse and true compositional generalization

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models, scaling, architectural innovations (mixture-of-experts, structured state), training recipes (curriculum, synthetic composition), or post-hoc skill libraries have since RELAXED or OVERTURNED it. Separate the durable question (likely: do transformers truly compose or replay?) from perishable limitations (e.g., "instruction tuning teaches only format"). Cite what resolved it; plainly name what still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months (post-Feb 2025) that directly disputes the subgraph-replay thesis or shows genuine in-weights composition under realistic conditions.
(3) Propose 2 research questions that ASSUME the composition regime has evolved—e.g., "Under what scale/architecture do weight-external skill composition and in-weights composition become equivalent?" or "Do multi-agent orchestration and tool-use effectively bypass the need for compositional weights altogether?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
