SYNTHESIS NOTE

Do reflection tokens carry more information about correct answers?

Explores whether tokens expressing reflection and transitions concentrate information about reasoning outcomes disproportionately compared to other tokens, and what role they play in reasoning performance.

Synthesis note · 2026-02-23 · sourced from MechInterp

Tracking the mutual information (MI) between intermediate representations and the correct answer at each step of a large reasoning model's (LRM's) reasoning reveals a striking pattern: MI spikes suddenly at specific steps, producing sparse, non-uniform "MI peaks" across the reasoning process.
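As a rough illustration of what a per-step dependence curve and its peaks look like, the sketch below scores each step with a kernel dependence measure and flags the steps that stand out. It rests on several assumptions that are not in this note: HSIC with a Gaussian kernel is used as a stand-in for a mutual information estimator, the data is synthetic, and the names `dependence_per_step` and `find_peaks` and the mean-plus-`k`-standard-deviations rule are hypothetical.

```python
import numpy as np

def rbf_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix; bandwidth defaults to the median heuristic."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    if sigma is None:
        positive = d2[d2 > 0]
        med = np.median(positive) if positive.size else 1.0
        sigma = np.sqrt(0.5 * med)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2 (a stand-in for MI)."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(rbf_gram(x) @ h @ rbf_gram(y) @ h) / (n - 1) ** 2

def dependence_per_step(hidden, answer):
    """hidden: (n_samples, n_steps, d); answer: (n_samples, d_y)."""
    return np.array([hsic(hidden[:, t, :], answer)
                     for t in range(hidden.shape[1])])

def find_peaks(scores, k=1.0):
    """Flag steps whose score exceeds mean + k * std (hypothetical rule)."""
    return np.flatnonzero(scores > scores.mean() + k * scores.std())

# Synthetic data (assumption): only steps 3 and 9 encode the answer.
rng = np.random.default_rng(0)
answer = rng.normal(size=(200, 8))
hidden = rng.normal(size=(200, 12, 8))
for t in (3, 9):
    hidden[:, t, :] = answer + 0.1 * rng.normal(size=answer.shape)

scores = dependence_per_step(hidden, answer)
peaks = find_peaks(scores)
print(peaks)
```

On real traces, `hidden` would hold the model's representation at each reasoning step and `answer` a representation of the gold answer. The threshold rule is only one way to turn a score curve into a set of peaks.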

These peaks overwhelmingly correspond to tokens expressing reflection, self-correction, or transitions ("Wait," "Hmm," "Therefore," "So"), which the source paper's authors term "thinking tokens." Three key findings:

  1. Thinking tokens are functionally necessary. Fully suppressing them significantly harms reasoning performance. Randomly suppressing the same number of tokens has minimal impact. The information is concentrated in the thinking tokens, not distributed across the trace.

  2. MI peaks are a training artifact. Base models (e.g., LLaMA-3.1-8B) do not exhibit the MI peaks phenomenon clearly. The distinct pattern emerges from reasoning-intensive training (RL post-training). This suggests reasoning training teaches models to concentrate information at specific reflection points.

  3. Two practical improvements follow. Representation Recycling (allowing MI-peak representations to iterate through the model multiple times) improves accuracy by 20% on AIME24. Thinking Token Test-time Scaling (forcing continued reasoning from thinking tokens when budget remains) yields steady performance improvements.
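One plausible reading of Thinking Token Test-time Scaling, from the third finding above, is a loop that re-prompts the model with a thinking token whenever it stops with budget left over. The sketch below follows that reading and is not the source paper's implementation. The `generate` callable, the whitespace token counter, the `"Wait"` cue, and the `max_rounds` cap are all hypothetical, and `toy_generate` is a stub standing in for a real model.

```python
from typing import Callable

def scale_with_thinking_tokens(
    generate: Callable[[str, int], str],  # hypothetical: (context, max_new) -> text
    prompt: str,
    budget: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
    cue: str = "Wait",
    max_rounds: int = 8,
) -> str:
    """Keep reasoning from a thinking-token cue while token budget remains."""
    trace = generate(prompt, budget)
    for _ in range(max_rounds):
        remaining = budget - count_tokens(trace)
        if remaining < 2:  # need room for the cue plus at least one new token
            break
        trace += f" {cue},"
        trace += " " + generate(prompt + " " + trace, remaining - 1)
    return trace

def toy_generate(context: str, max_new: int) -> str:
    """Stub model (assumption): emits at most three tokens, then stops."""
    return " ".join(["step"] * min(3, max_new))

out = scale_with_thinking_tokens(toy_generate, "Q:", budget=10)
print(out)
```

With a real model, `generate` would wrap a decoding call that returns when the model emits its end-of-reasoning marker, and `count_tokens` would use the model's tokenizer rather than whitespace splitting.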

This provides an information-theoretic complement to the sentence-level thought anchors finding. The related note "Which sentences actually steer a reasoning trace?" identifies planning and backtracking sentences via counterfactual, attention, and causal suppression methods. MI peaks identify the same pivotal role via information theory, converging from a different analytical direction.

The convergence across methods (counterfactual importance, attention patterns, causal suppression, and now mutual information) and across granularity levels (token-level MI peaks, sentence-level thought anchors, RLVR's high-entropy forking tokens) strongly supports the claim that reasoning traces have a sparse-pivot structure. Most tokens are filler; a small subset carries the reasoning signal.
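The sparse-pivot claim can be made concrete with a simple concentration statistic: the share of total per-token score carried by the top fraction of tokens. The sketch below is illustrative only. The function name `top_fraction_mass`, the 10% cutoff, and the example scores are assumptions, not values from the source.

```python
import numpy as np

def top_fraction_mass(scores, fraction=0.1):
    """Share of the total score carried by the top `fraction` of tokens."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(fraction * scores.size)))
    top = np.sort(scores)[-k:]  # the k largest scores
    return top.sum() / scores.sum()

# Made-up per-token scores: 18 low-information tokens and 2 pivotal ones.
example_scores = [0.1] * 18 + [5.0, 4.0]
share = top_fraction_mass(example_scores, fraction=0.1)
print(f"top 10% of tokens carry {share:.0%} of the total score")
```

A value near the chosen fraction would indicate evenly spread information, while a value close to 1 would indicate the sparse structure described above.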

Original note title

thinking tokens are mutual information peaks — sparse reflection and transition tokens carry disproportionate information about correct answers