Do emergent abilities result from genuine new capabilities or implicit in-context learning?

This explores whether the surprising 'emergent' jumps we see in large models are truly new capabilities appearing at scale, or whether they were latent in the model all along and merely surfaced by measurement choices, prompting, or light training.

This answer reads the question as asking whether emergence is *creation* or *elicitation*, and the corpus leans hard toward elicitation, with a few important cracks. The most direct challenge to the whole premise is that some 'emergence' isn't even a real behavioral change. When you measure with continuous metrics instead of all-or-nothing scoring, the sharp, unpredictable jumps smooth out into gradual, predictable improvement. That suggests emergence is partly an artifact of how we choose to grade the model rather than a switch flipping inside it (Are LLM emergent abilities real or measurement artifacts?).
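A toy calculation shows how the grading choice alone can manufacture a jump. It assumes a hypothetical model whose per-token accuracy p rises smoothly with scale. It also assumes a 20-token answer that must be entirely correct, with token errors independent. Under those assumptions the all-or-nothing score is p^L, which stays near zero and then climbs steeply even though p changes gradually. All numbers are illustrative and are not taken from the cited note.

```python
# Hypothetical per-token accuracies that improve smoothly as models scale.
SCALES = [1e8, 1e9, 1e10, 1e11, 1e12]       # parameter counts (illustrative)
TOKEN_ACC = [0.50, 0.70, 0.85, 0.95, 0.99]  # continuous metric
ANSWER_LEN = 20                              # tokens that must all be right


def exact_match(p_token: float, answer_len: int) -> float:
    """All-or-nothing metric. Assuming independent per-token errors,
    the whole answer is right with probability p ** L."""
    return p_token ** answer_len


for n_params, p in zip(SCALES, TOKEN_ACC):
    print(f"params={n_params:.0e}  token_acc={p:.2f}  "
          f"exact_match={exact_match(p, ANSWER_LEN):.4f}")
```

Token accuracy climbs steadily across the five scales. Exact match, by contrast, stays below 0.001 at the two smallest scales and only then rises, to roughly 0.04, 0.36 and 0.82. The same underlying improvement looks "emergent" under one metric and smooth under the other.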

The second, deeper line of evidence is that the capabilities we call emergent already sit in the base model, waiting to be unlocked. Five independent techniques all surface reasoning that was already present in base-model activations: RL steering, critique fine-tuning, decoding tweaks, SAE feature steering, and RLVR. This reframes post-training as *selection* rather than acquisition (Do base models already contain hidden reasoning ability?). Two adjacent papers sharpen this. First, thinking can be formalized as choosing among sub-policies the model already contains. On that view it needs rich initialization plus selection pressure rather than any new reasoning machinery (Does thinking emerge when agents choose between learned sub-policies?). Second, RL post-training largely teaches *when* to reason, not *how*. Hybrid models recover 91% of the gains just by routing tokens, and reasoning activation vectors exist before any RL touches the model (Does RL post-training create reasoning or just deploy it?).
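To make "selection rather than acquisition" concrete, here is a toy in the spirit of the sub-policy framing; it is not the cited thought-MDP formalism itself. Three frozen sub-policies stand in for what the base model already contains. Each one is a hypothetical, fixed distribution over three actions, and training only adjusts the softmax logits that choose among them. The sketch uses the exact policy gradient of J = Σ_k π_k V_k instead of sampled REINFORCE, so the run is deterministic. The names `SUB_POLICIES`, `REWARD` and `train_selector` are illustrative.

```python
import math

# Hypothetical frozen sub-policies: fixed distributions over 3 actions.
SUB_POLICIES = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
REWARD = [0.0, 1.0, 0.3]  # hypothetical reward for each action


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sub_policy_value(action_probs):
    """Expected reward V_k of one frozen sub-policy."""
    return sum(p * r for p, r in zip(action_probs, REWARD))


def train_selector(steps=500, lr=1.0):
    """Train only the selection logits; the sub-policies never change."""
    values = [sub_policy_value(sp) for sp in SUB_POLICIES]
    logits = [0.0] * len(SUB_POLICIES)  # start with uniform selection
    for _ in range(steps):
        pi = softmax(logits)
        j = sum(p * v for p, v in zip(pi, values))
        # Exact gradient of J = sum_k pi_k * V_k w.r.t. softmax logits:
        # dJ/dlogit_k = pi_k * (V_k - J)
        logits = [l + lr * p * (v - j) for l, p, v in zip(logits, pi, values)]
    pi = softmax(logits)
    return pi, sum(p * v for p, v in zip(pi, values)), values


pi, value, values = train_selector()
print("sub-policy values:", [round(v, 2) for v in values])
print("selection probs:", [round(p, 3) for p in pi], "expected reward:", round(value, 3))
```

The selector concentrates on sub-policy 1. Expected reward climbs from 0.48 toward 0.83, the value of the best sub-policy present at initialization, but never past it. Selection pressure redistributes probability over existing behavior; it does not create a better behavior.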

The 'implicit in-context learning' half of your question shows up most cleanly in work on in-context learning of sequential decisions. Models generalize across wildly different tasks with no weight updates at all. They do so only when the context contains full or partial trajectories from the same environment level, a structural property called trajectory burstiness (Why do trajectories matter more than individual examples for in-context learning?). That's a concrete mechanism for how a fixed model can 'suddenly' do something new: the capability lives in the prompt's structure, not in newly learned weights. It pairs interestingly with evidence that post-training shifts a model from passive next-token prediction toward treating its own outputs as actions that shape its future inputs. That is a change of behavioral mode rather than a new skill (Do models recognize their own outputs as actions shaping future inputs?).
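One plausible way to expose a "burstiness" knob when building training contexts is sketched below. It assumes a simple all-or-nothing rule. With probability `burstiness`, the context trajectories come from the query's own level; otherwise they come only from other levels. The `Trajectory` class, `build_bursty_context`, and this mixing rule are hypothetical illustrations, not the cited work's actual data pipeline.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    level_id: int                              # environment level that produced it
    steps: list = field(default_factory=list)  # (obs, action, reward) tuples; placeholder


def build_bursty_context(pool, query, n_context, burstiness, rng):
    """Hypothetical helper: choose n_context trajectories to precede `query`.

    With probability `burstiness`, every context trajectory comes from the
    query's own level; otherwise they come only from other levels.
    """
    same_level = [t for t in pool if t.level_id == query.level_id and t is not query]
    other_level = [t for t in pool if t.level_id != query.level_id]
    source = same_level if rng.random() < burstiness else other_level
    return rng.sample(source, n_context) + [query]


rng = random.Random(0)
pool = [Trajectory(level_id=lvl) for lvl in range(5) for _ in range(4)]
query = pool[0]
context = build_bursty_context(pool, query, n_context=3, burstiness=0.9, rng=rng)
print([t.level_id for t in context])
```

Sweeping `burstiness` from 0 to 1 is one way to test whether in-context generalization depends on same-level trajectories being present. In this sketch, the weights of whatever model consumes these contexts are never touched.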

But the corpus does not let you conclude 'it's all elicitation.' The most careful answer is conditional. For standard reasoning tasks, RL activates what's already latent. For complex multi-step planning, RL generates genuinely novel strategies that base models can't reach even with massive sampling (Does reinforcement learning create new reasoning abilities or activate existing ones?). This dovetails with the observation that RL training moves through two phases. It first consolidates execution and then opens up strategic exploration, where planning becomes the bottleneck (Does RL training follow a predictable two-phase learning sequence?). So 'genuine new capability' may be real precisely at the planning/coordination frontier, even if most headline emergence is elicitation or a measurement artifact.
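A common way to probe the "even with massive sampling" claim is to compare pass@k for the base and RL models. If the base model's pass@k catches up at large k, RL is plausibly sharpening what was already reachable. If it stays at zero, the capability may be new. The sketch below uses the standard unbiased pass@k estimator popularized by the HumanEval/Codex evaluation. The task names and counts are invented. A zero at k = 256 only means "not found within this sampling budget", not "impossible".

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of them correct:
    1 - C(n - c, k) / C(n, k)."""
    if n - c < k:  # every size-k subset contains at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-task correct counts out of N samples per model.
N = 256
BASE = {"arith_a": 12, "arith_b": 40, "plan_a": 0}
RL = {"arith_a": 200, "arith_b": 230, "plan_a": 90}

for task in BASE:
    base_1, rl_1 = pass_at_k(N, BASE[task], 1), pass_at_k(N, RL[task], 1)
    base_n = pass_at_k(N, BASE[task], N)
    verdict = ("reachable by base sampling (consistent with elicitation)"
               if base_n > 0 else "not reached by base within budget")
    print(f"{task}: base pass@1={base_1:.3f}  RL pass@1={rl_1:.3f}  "
          f"base pass@{N}={base_n:.2f}  -> {verdict}")
```

In this made-up data, RL raises pass@1 everywhere. However, only the hypothetical planning task is one the base model never solves within 256 samples, which is the pattern the conditional answer above associates with genuinely new strategies.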

The thing you might not have known to ask: the elicitation-vs-creation debate has a hard ceiling on the creation side. Agents trained only on static expert demonstrations are capped by what their curators imagined. They can't learn from their own failures because they never interact with an environment during training (Can agents learn beyond what their training data shows?). That suggests truly *new* capability tends to require interaction and selection pressure, not just scale. This is exactly why the corpus locates the rare genuine novelty in RL on hard planning tasks rather than in raw model size.
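A toy three-armed bandit illustrates the ceiling. The arm rewards, the demonstration set, and both learners are hypothetical, and rewards are deterministic to keep the example small. Behavior cloning reproduces the curators' action frequencies, so it gives the never-demonstrated best arm zero probability. An explore-then-commit learner that actually pulls arms finds that arm.

```python
from collections import Counter

ARM_REWARDS = [0.4, 0.6, 0.9]  # hypothetical; arm 2 is best but never demonstrated
DEMOS = [0, 1, 1, 0, 1, 1]     # hypothetical curated expert actions


def behavior_cloning(demos, n_arms):
    """Imitation only: copy the empirical action frequencies of the demos."""
    counts = Counter(demos)
    return [counts[a] / len(demos) for a in range(n_arms)]


def explore_then_commit(n_arms, pull, pulls_per_arm=1):
    """Interaction: try every arm, then commit to the best observed average."""
    means = [sum(pull(a) for _ in range(pulls_per_arm)) / pulls_per_arm
             for a in range(n_arms)]
    best = max(range(n_arms), key=means.__getitem__)
    return [1.0 if a == best else 0.0 for a in range(n_arms)]


def expected_reward(policy):
    return sum(p * r for p, r in zip(policy, ARM_REWARDS))


bc_policy = behavior_cloning(DEMOS, len(ARM_REWARDS))
etc_policy = explore_then_commit(len(ARM_REWARDS), lambda a: ARM_REWARDS[a])
print("imitation:", [round(p, 2) for p in bc_policy], round(expected_reward(bc_policy), 3))
print("interaction:", etc_policy, round(expected_reward(etc_policy), 3))
```

The imitation policy can do no better than the best demonstrated arm, while the interacting learner exceeds it. The same gap, scaled up, is what the note attributes to static expert datasets.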

Sources (9 notes)

Are LLM emergent abilities real or measurement artifacts?

Sharp, unpredictable capability transitions vanish when using continuous metrics instead of discontinuous ones. The same model outputs show smooth predictable improvement with scale, suggesting emergence is a measurement choice rather than a real behavioral change.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does thinking emerge when agents choose between learned sub-policies?

Research formalizes thinking, via a thought-MDP framework, as selecting between sub-policies already contained in a policy function. The key finding: thinking doesn't require new reasoning capabilities but rather rich policy initialization combined with RL-driven selection pressure.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Why do trajectories matter more than individual examples for in-context learning?

In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.

Do models recognize their own outputs as actions shaping future inputs?

Post-trained language models exhibit a measurable shift in which they recognize that their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.

Does reinforcement learning create new reasoning abilities or activate existing ones?

For standard reasoning tasks, RL activates latent abilities already present in base models. For complex planning requiring multi-step coordination, RL generates genuinely novel strategies inaccessible to base models even with extensive sampling.

Does RL training follow a predictable two-phase learning sequence?

Across eight models, RL training consistently shows a first phase where execution correctness drives learning, followed by a second phase where strategic planning becomes the bottleneck. Planning token entropy increases while execution entropy stabilizes, and concentration of optimization on planning tokens yields significant performance gains.

Can agents learn beyond what their training data shows?

Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.
