Do reflection tokens carry more information about correct answers?

Explores whether tokens expressing reflection and transitions concentrate information about reasoning outcomes disproportionately compared to other tokens, and what role they play in reasoning performance.

Synthesis note · 2026-02-23 · sourced from MechInterp

Tracking the mutual information (MI) between a model's intermediate representations and the correct answer at each step of a large reasoning model's (LRM) reasoning reveals a distinctive pattern: MI spikes suddenly at a few specific steps, producing sparse, non-uniform "MI peaks" across the reasoning process.
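
As a rough illustration of what "sparse MI peaks" means operationally, the sketch below flags steps whose MI estimate is both a local maximum and well above the trace average. It assumes a per-step MI estimate is already available; the estimator is not specified in this note. The function name `find_mi_peaks`, the mean-plus-`k`-standard-deviations threshold, and the toy tokens and values are all hypothetical, chosen for illustration rather than taken from the source.

```python
import statistics


def find_mi_peaks(mi_per_step, k=1.5):
    """Return indices of steps that are local maxima above mean + k * stdev.

    The threshold rule is an assumption made for this sketch,
    not the criterion used in the source work.
    """
    mean = statistics.fmean(mi_per_step)
    std = statistics.pstdev(mi_per_step)
    threshold = mean + k * std
    peaks = []
    for i, value in enumerate(mi_per_step):
        # Treat missing neighbours at the edges as -inf so edges can still be peaks.
        left = mi_per_step[i - 1] if i > 0 else float("-inf")
        right = mi_per_step[i + 1] if i + 1 < len(mi_per_step) else float("-inf")
        if value > threshold and value >= left and value >= right:
            peaks.append(i)
    return peaks


# Toy trace: tokens and MI values are invented for illustration only.
tokens = ["Let", "x", "=", "2", "Wait", ",", "x", "=", "3", "So", "answer"]
mi = [0.10, 0.12, 0.11, 0.13, 0.90, 0.15, 0.14, 0.12, 0.16, 0.85, 0.20]

peaks = find_mi_peaks(mi)
peak_tokens = [tokens[i] for i in peaks]
print(peaks, peak_tokens)
```

On these made-up values, only the "Wait" and "So" positions are flagged. A real analysis would need an actual MI estimator over hidden states and a principled threshold.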

These peaks overwhelmingly correspond to tokens expressing reflection, self-correction, or transitions — "Wait," "Hmm," "Therefore," "So" — which the authors term "thinking tokens." Three key findings:

Thinking tokens are functionally necessary. Fully suppressing them significantly harms reasoning performance. Randomly suppressing the same number of tokens has minimal impact. The information is concentrated in the thinking tokens, not distributed across the trace.
MI peaks are a training artifact. Base models (e.g., LLaMA-3.1-8B) do not exhibit the MI peaks phenomenon clearly. The distinct pattern emerges from reasoning-intensive training (RL post-training). This suggests reasoning training teaches models to concentrate information at specific reflection points.
Two practical improvements follow. Representation Recycling (allowing MI-peak representations to iterate through the model multiple times) improves accuracy by 20% on AIME24. Thinking Token Test-time Scaling (forcing continued reasoning from thinking tokens when budget remains) yields steady performance improvements.
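
The control in the first finding can be sketched as follows: ban the thinking tokens at decoding time, then ban an equally sized random set of other tokens for comparison. This is a minimal sketch assuming suppression is done by masking logits; the note does not say how the suppression was implemented. The vocabulary, token ids, logits, and helper names (`suppress`, `matched_random_control`, `argmax`) are hypothetical.

```python
import math
import random


def suppress(logits, banned_ids):
    """Return a copy of logits with banned token ids set to -inf (never selected)."""
    return [(-math.inf if i in banned_ids else x) for i, x in enumerate(logits)]


def matched_random_control(vocab_size, thinking_ids, seed=0):
    """Sample as many non-thinking token ids as there are thinking ids."""
    rng = random.Random(seed)
    candidates = [i for i in range(vocab_size) if i not in thinking_ids]
    return set(rng.sample(candidates, len(thinking_ids)))


def argmax(values):
    """Index of the largest value (greedy decoding over one step)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical toy vocabulary and logits, invented for illustration.
vocab = ["the", "Wait", "2", "Hmm", "x", "So", "=", "Therefore"]
thinking_words = {"Wait", "Hmm", "So", "Therefore"}
thinking_ids = {i for i, tok in enumerate(vocab) if tok in thinking_words}
control_ids = matched_random_control(len(vocab), thinking_ids)

logits = [0.5, 2.0, 0.1, 1.5, 0.3, 1.8, 0.2, 1.1]
ablated = suppress(logits, thinking_ids)  # thinking tokens banned
control = suppress(logits, control_ids)   # same number of other tokens banned

print(vocab[argmax(ablated)], vocab[argmax(control)])
```

Comparing task accuracy under `ablated` and `control` decoding against an unmodified run is what separates "these specific tokens matter" from "removing any tokens hurts".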

This provides an information-theoretic complement to the sentence-level thought anchors finding. The note "Which sentences actually steer a reasoning trace?" identifies planning and backtracking sentences as pivotal via counterfactual, attention, and causal suppression methods. MI peaks point to the same pivotal role through information theory, converging from a different analytical direction.

The convergence across methods (counterfactual importance, attention patterns, causal suppression, and now mutual information) and across granularity levels (token-level MI peaks, sentence-level thought anchors, RLVR's high-entropy forking tokens) strongly supports the claim that reasoning traces have a sparse-pivot structure. Most tokens are filler; a small subset carries the reasoning signal.

Related concepts in this collection 7

Which sentences actually steer a reasoning trace? Can we identify which sentences in a reasoning trace have outsized influence on the final answer? Three independent methods converge on a surprising answer about planning and backtracking.
sentence-level complement; MI peaks add information-theoretic evidence for the same sparse-pivot structure
Do high-entropy tokens drive reasoning model improvements? Explores whether only a small fraction of tokens—those with high entropy at decision points—actually matter for improving reasoning performance in language models, and whether training on them alone could work as well as full training.
token-level RLVR analog: high-entropy tokens during training correspond to MI-peak tokens during inference
Does more thinking time always improve reasoning accuracy? Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive.
MI peaks explain what matters within the token budget: it's the density of thinking tokens, not total length
Does RL teach reasoning or just when to use it? Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges.
MI peaks as a mechanistic signature: RL training creates the MI-peak pattern that base models lack
Do reasoning cycles in hidden states reveal aha moments? What if the internal loops in model reasoning, visible in hidden-state topology, correspond to the reconsidering moments that happen during reasoning? This note explores whether graph cyclicity captures a mechanistic signature of insight.
hidden-state topology confirms the same sparse-pivot structure at the representation-graph level: cyclicity corresponds to backtracking tokens (MI peaks at self-correction), diameter tracks exploration breadth; both analyses converge on reasoning having a concentrated structure rather than uniform information distribution
Can we measure how deeply a model actually reasons? What if reasoning quality isn't about length or confidence, but about how much a model's predictions shift across its internal layers? Can tracking these shifts reveal genuine thinking versus pattern-matching?
complementary token-level measurement: MI peaks identify WHICH tokens matter via information theory; DTR identifies HOW DEEPLY the model computes at each token via layer-wise prediction stabilization; orthogonal methods converging on the same sparse-pivot structure
Does self-distillation harm mathematical reasoning performance? Self-distillation usually improves models while shortening outputs, but mathematical reasoning shows a puzzling exception: performance drops up to 40%. What mechanism explains this counter-intuitive degradation?
empirical consequence: when self-distillation suppresses the very Wait/Hmm tokens this note identifies as MI peaks, reasoning performance drops up to 40% on Qwen3 and DeepSeek-Distill. The Why-Does-Self-Distillation paper provides the strongest experimental confirmation that thinking tokens are functionally necessary — not just correlationally informative — across post-training procedures.

Original note title

thinking tokens are mutual information peaks — sparse reflection and transition tokens carry disproportionate information about correct answers
