Why do trajectories matter more than individual examples for in-context learning?
Can language models learn new sequential decision-making tasks from context alone, and if so, what data properties make this possible? This note explores why isolated state-action pairs fail where full trajectories succeed.
In-context learning (ICL) for supervised tasks works by providing a few input-output examples. Naively applying the same recipe to sequential decision making, by providing a few state-action pairs, fails to enable ICL of new tasks. The key finding: the context must contain full or partial trajectories from the same environment level as the query, not just isolated examples. The data property that provides this is called trajectory burstiness.
Why the difference matters: In supervised learning, the in-context examples can come from different instances, because the model only has to learn the input-output mapping. In sequential decision making, the model must generalize from demonstrations of the same level/environment to the wide range of states it may encounter at deployment. A sparse set of state-action pairs doesn't cover the state space; full trajectories cover much more of it.
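The contrast between isolated pairs and trajectories can be made concrete with a minimal sketch of how a context might be assembled. The token format (`s:`, `a:`, `<eot>`) and the function name `build_context` are hypothetical illustrations, not the encoding used in the source paper; the only point is that whole demonstrations from the query's level come first, followed by the query episode so far.

```python
from typing import List, Tuple

Step = Tuple[str, str]        # (state, action); plain strings as placeholders
Trajectory = List[Step]


def build_context(demos: List[Trajectory],
                  query_history: List[Step],
                  query_state: str) -> List[str]:
    """Flatten same-level demonstrations plus the ongoing query episode.

    Hypothetical token layout: each demo is written out step by step and
    closed with an end-of-trajectory marker; the query episode follows and
    ends on the state for which an action must be predicted.
    """
    tokens: List[str] = []
    for demo in demos:
        for state, action in demo:
            tokens += [f"s:{state}", f"a:{action}"]
        tokens.append("<eot>")
    for state, action in query_history:
        tokens += [f"s:{state}", f"a:{action}"]
    tokens.append(f"s:{query_state}")
    return tokens


# One expert demonstration from the same level as the query.
demo = [("start", "right"), ("ledge", "jump")]
context = build_context([demo], query_history=[], query_state="start")
print(context)
```

A context built from isolated pairs would instead mix single steps from unrelated levels, which gives the model no coherent picture of the level it is about to act in.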
Trajectory burstiness is the probability that a given input sequence contains at least two trajectories from the same level. When this property is present in pre-training data, the model acquires the capacity to learn new tasks from demonstrations at inference time without weight updates.
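Under this definition, burstiness can be estimated for an existing dataset as the fraction of input sequences that contain a repeated level. The sketch below assumes each sequence is represented only by the level id of each of its trajectories; the level names and helper names are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_bursty(level_ids: Sequence[Hashable]) -> bool:
    """True if at least two trajectories in the sequence share a level."""
    counts = Counter(level_ids)
    return any(c >= 2 for c in counts.values())


def empirical_burstiness(sequences: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of sequences that contain a repeated level."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return sum(is_bursty(s) for s in sequences) / len(sequences)


# Each inner list holds the level id of every trajectory in one sequence.
dataset = [
    ["maze_3", "maze_3", "cave_1"],  # bursty: maze_3 appears twice
    ["maze_3", "cave_1", "jump_7"],  # not bursty: all levels distinct
    ["jump_7", "jump_7", "jump_7"],  # bursty
    ["cave_1"],                      # not bursty: a single trajectory
]
print(empirical_burstiness(dataset))  # 0.5
```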
Additional factors that increase ICL performance:
- Larger model and dataset size
- More task diversity in pre-training
- Environment stochasticity (forces generalization over trajectory variation)
- Higher trajectory burstiness in pre-training data
Generalization scope demonstrated: Train/test tasks differ greatly — different states, actions, dynamics, and reward functions. The model generalizes from, e.g., platform games to maze navigation from a handful of expert demonstrations. This is substantially harder than prior work that generalizes across reward function variants of the same environment.
The implication for dataset construction: sequential decision-making ICL requires a data distribution property (trajectory burstiness) that standard language modeling data does not naturally contain. This is a structural requirement on the data, not just a scale requirement.
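One way to build such a distribution deliberately is to control burstiness with a single probability when sampling training sequences. The following is a sketch under stated assumptions, not the source paper's data pipeline: `sample_sequence`, `p_bursty` and the trajectory pool are hypothetical names, and trajectories are stand-in strings.

```python
import random
from typing import Dict, List, Tuple


def sample_sequence(trajs_by_level: Dict[str, List[str]],
                    n_traj: int,
                    p_bursty: float,
                    rng: random.Random) -> List[Tuple[str, str]]:
    """Sample one training sequence of (level, trajectory) pairs.

    With probability p_bursty the sequence is forced to contain at least
    two trajectories from one level; otherwise all levels are distinct.
    """
    levels = list(trajs_by_level)
    if not 2 <= n_traj <= len(levels):
        raise ValueError("need 2 <= n_traj <= number of levels")
    if rng.random() < p_bursty:
        repeated = rng.choice(levels)
        # Two slots share a level; the rest are drawn freely.
        chosen = [repeated, repeated]
        chosen += [rng.choice(levels) for _ in range(n_traj - 2)]
        rng.shuffle(chosen)
    else:
        chosen = rng.sample(levels, n_traj)  # distinct levels only
    return [(lvl, rng.choice(trajs_by_level[lvl])) for lvl in chosen]


# Hypothetical pool: a few recorded trajectories per level.
pool = {
    "maze_3": ["maze_3/run_a", "maze_3/run_b"],
    "cave_1": ["cave_1/run_a", "cave_1/run_b"],
    "jump_7": ["jump_7/run_a", "jump_7/run_b"],
    "lava_2": ["lava_2/run_a", "lava_2/run_b"],
}
rng = random.Random(0)
print(sample_sequence(pool, n_traj=3, p_bursty=0.8, rng=rng))
```

Because the non-bursty branch draws distinct levels, the fraction of bursty sequences in a large sample should approach `p_bursty`, which makes the property an explicit knob instead of an accident of how the data was collected.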
This connects to "Does training data format shape reasoning strategy more than domain?": here the structural property sits at the trajectory level rather than the reasoning-step level, but the principle is the same. Data structure determines capability.
Inquiring lines that use this note as a source 42
- Why does removing language from its context destroy what makes it work?
- How does in-context learning trigger phase transitions in model behavior?
- Does task superposition explain how models learn from multiple in-context trajectories?
- How do training objectives shape what a world model actually learns?
- When does natural context diversity reduce the need for explicit exploration?
- How do neural networks extend contextual bandits beyond linear reward assumptions?
- Why do context-sensitive languages transfer better than regular or context-free languages?
- Can in-context learning replicate the timing effects that RL teaches models?
- What makes session-aware multi-turn tracking necessary for asynchronous training?
- What role does sequence model in-context learning play in multi-agent cooperation?
- Do emergent abilities result from genuine new capabilities or implicit in-context learning?
- Can in-context learning substitute for domain-specific training altogether?
- How does explicit exploratory prompting compare to fine-tuned reinforcement learning for in-context adaptation?
- Can a single model trained on two tasks predict untrained decision tasks?
- Can episodic memory alone enable learning without parameter updates?
- How do retrieved memories differ from decision-context passages for prediction?
- What temporal signals in screen recordings matter most for task understanding?
- How does credit assignment work across many sequential decision steps in language models?
- How does trajectory filtering handle noise when language models use code execution tools?
- What role does a model's representational structure play in learning?
- Why do longer context windows alone fail to capture temporal dynamics in dialogue?
- Can trajectory quality filtering improve model training in noisy environments?
- How do chunk-based step segmentation and trajectory structure modeling differ?
- What computational cost does trajectory-bursty inference impose on per-query context requirements?
- Does environment stochasticity force models to generalize better across trajectory variations?
- Can activation sparsity patterns guide the selection of in-context learning demonstrations?
- How do complete multi-turn trajectories differ from isolated task examples?
- Can trajectory structure alone provide process supervision without human annotation?
- Can agents compress long trajectories without losing critical decision context?
- Does input surprise drive the implicit recognition of on-policy context?
- Why does the order of training examples matter for what models learn?
- What data properties enable transformers to learn sequential decision-making in context?
- Can in-context reinforcement learning match human sample efficiency on real problems?
- What makes some contexts learnable as rules versus requiring model retraining?
- How does evaluating interaction trajectories change what we measure beyond correctness?
- Can graph topology represent successful trajectory clusters more effectively than skill libraries?
- Do text-space skills transfer learning across different frontier models?
- Do few-shot examples improve in-context learning or add noise?
- What makes a good in-context learning example for a given task?
- What makes trajectory quality matter more than one-shot task success?
- How does training order affect knowledge acquisition in language models?
- Should user context live in tokens or in learned model representations?
Related concepts in this collection 5
- Does training data format shape reasoning strategy more than domain?
  What explains why models trained on multiple-choice data reason differently than those trained on free-form text? The research isolates format and domain effects to measure which one matters more.
  Relation: trajectory burstiness is another case where data structure determines emergent capability
- What do models actually learn from chain-of-thought training?
  When models train on reasoning demonstrations, do they memorize content details or absorb reasoning structure? Testing with corrupted data reveals which aspects of CoT samples actually drive learning.
  Relation: structural properties of training data drive learning; applies at both the reasoning trace and trajectory levels
- Can we allocate inference compute based on prompt difficulty?
  Does adjusting how much compute each prompt receives, rather than using a fixed budget, improve model performance? Could smarter allocation let smaller models compete with larger ones?
  Relation: the context-length requirements for trajectory-bursty inference raise per-query compute costs
- Can LLMs handle multiple tasks at once during inference?
  Do language models maintain multiple distinct in-context learning tasks simultaneously in their internal representations, and if so, what prevents them from actually generating outputs for more than one task?
  Relation: task superposition may be the representational mechanism enabling trajectory-bursty ICL: the model maintains multiple task interpretations from in-context trajectories simultaneously before committing to a single policy at generation time
- Can transformers learn to solve new problems within episodes?
  Explores whether transformer models can develop meta-learning abilities through RL training, enabling them to adapt to unseen environments by learning from within-episode experience alone, without updating weights.
  Relation: ICRL is the RL-trained capability that trajectory burstiness enables: same-level trajectories create the meta-learning pressure during training that ICRL exploits at inference time for adaptation to unseen environments
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Generalization to New Sequential Decision Making Tasks with In-Context Learning
- Supervised Pretraining Can Learn In-Context Reinforcement Learning
- Teaching Large Language Models to Reason with Reinforcement Learning
- Schema-learning and rebinding as mechanisms of in-context learning and emergence
- Training a Generally Curious Agent
- In-Context Principle Learning from Mistakes
- In-context learning agents are asymmetric belief updaters
- Learning To Retrieve Prompts for In-Context Learning
Original note title
trajectory burstiness — same-level trajectories in context — is required for in-context learning of sequential decision-making across new tasks