Does distillation from reasoning models spread overthinking to smaller models?

This explores whether the overthinking habits of large reasoning models get passed down — like an inherited trait — when their reasoning traces are used to train smaller models, and what the corpus says about where overthinking actually comes from.

This reads the question as asking whether overthinking is a transmissible trait — something a small model catches from imitating a larger reasoning model. The corpus doesn't have a paper that watches distillation transfer overthinking directly, so I'll say that plainly. But it has something more useful: a cluster of work showing that overthinking is *learned*, not innate, which is exactly what would make it inheritable through distillation.

The strongest clue is that overthinking is manufactured by training objectives. Reasoning models keep generating redundant, lengthy chains even for questions with missing premises that cannot be answered; they never learned *when to stop* because training rewarded producing reasoning steps, not knowing when to disengage (Why do reasoning models overthink ill-posed questions?). If the parent model's verbosity is a training artifact rather than a reasoning requirement, then distilling its traces distills that same habit. One note makes the artifact framing explicit: verbalized step-by-step thinking is itself a byproduct of training, since models can scale test-time compute in latent space without emitting visible tokens at all (Can models reason without generating visible thinking tokens?).

Why this matters for small models specifically: the optimal amount of reasoning isn't fixed. It follows an inverted U, and crucially the optimal chain length *decreases as model capability increases* (Why does chain of thought accuracy eventually decline with length?). That cuts both ways. A small model copying a big model's chains may be inheriting a length calibrated for a different (often larger) capability, a mismatch that can push it past its own accuracy peak, where extra tokens actively hurt: in one reported case, accuracy dropped from 87.3% to 70.3% as thinking tokens grew from about 1,100 to about 16,000 (When does thinking too much actually hurt reasoning?; Does more thinking time always improve reasoning accuracy?). So distillation could transmit not just "think more" but a length budget that's wrong for the student.
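As a rough illustration of the length-mismatch argument, here is a toy Python sketch. The bell-shaped curve in log-length, the function names, and the student optimum of 2,000 tokens are all assumptions made for illustration; only the 87.3%, 70.3%, 1,100 and 16,000 figures come from the source notes.

```python
import math

def toy_accuracy(length, optimal_length, peak=0.9, width=1.5):
    """Hypothetical inverted U: accuracy peaks at optimal_length and
    falls off symmetrically in log-length on either side (an assumed shape)."""
    z = math.log(length / optimal_length) / width
    return peak * math.exp(-0.5 * z * z)

def mismatch_penalty(student_optimum, inherited_length):
    """Accuracy lost when the student reasons at an inherited length
    instead of at its own optimum, under the toy curve above."""
    best = toy_accuracy(student_optimum, student_optimum)
    actual = toy_accuracy(inherited_length, student_optimum)
    return best - actual

student_optimum = 2000  # hypothetical token budget, not a measured value
for inherited in (500, 2000, 8000):
    # Any inherited length other than the student's own optimum costs accuracy.
    print(inherited, round(mismatch_penalty(student_optimum, inherited), 3))

# Arithmetic on the figures reported in the source notes.
point_drop = round(87.3 - 70.3, 1)   # percentage points lost
token_ratio = 16000 / 1100           # how much the thinking budget grew
print(point_drop, round(token_ratio, 1))
```

In this toy model the penalty is zero only at the student's own optimum and grows in both directions, which is one way to read "that cuts both ways": a copied budget can be too long or too short for the student.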

The more hopeful counter-thread is that overthinking doesn't have to be baked in permanently, which suggests its inheritance can be avoided, or the habit stripped back out afterward. RL training can flip extended thinking from counterproductive self-doubt into productive analysis, the same mechanism redirected by the reward signal (Does extended thinking help or hurt model reasoning?), and RL naturally gravitates toward shorter chains as models improve (Why does chain of thought accuracy eventually decline with length?). Even without retraining, verbosity turns out to be a single steerable direction in activation space: one steering vector cuts chain-of-thought length by 67% while maintaining accuracy (Can we steer reasoning toward brevity without retraining?), and confidence signals can dynamically rein in overthinking across models from 0.5B to 32B parameters (Can confidence patterns reveal overthinking versus underthinking?).
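To make "a single steerable direction in activation space" concrete, here is a minimal Python sketch of one common activation-steering recipe: take the difference of mean activations between verbose and concise examples, then subtract an activation's projection onto that direction. This is an assumption about how such a vector could be built, not the procedure of the cited Activation-Steered Compression work; the function names and the tiny 3-dimensional activations are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_direction(verbose_acts, concise_acts):
    """Unit vector pointing from the concise mean toward the verbose mean."""
    diff = [v - c for v, c in zip(mean_vector(verbose_acts),
                                  mean_vector(concise_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def steer(activation, direction, strength=1.0):
    """Remove `strength` times the activation's component along `direction`."""
    proj = sum(a * d for a, d in zip(activation, direction))
    return [a - strength * proj * d for a, d in zip(activation, direction)]

# Toy paired activations (hypothetical): the pairs differ only in dimension 0.
verbose = [[2.0, 1.0, 0.0], [2.0, -1.0, 0.0]]
concise = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]

direction = steering_direction(verbose, concise)
steered = steer([3.0, 0.5, -1.0], direction)
print(direction)  # [1.0, 0.0, 0.0]
print(steered)    # [0.0, 0.5, -1.0]
```

In a real model the vectors would be hidden states from a chosen layer and `strength` would need tuning; the sketch only shows why one direction is enough to shift behavior while leaving the other components untouched.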

The thing you might not have known you wanted: the deeper reason a distilled small model could overthink badly is that reasoning models fit *instance-level patterns* rather than general algorithms. They succeed when they've seen similar instances and flail on novel ones (Do language models fail at reasoning due to complexity or novelty?). A small model trained on a big model's traces inherits those memorized patterns, so it may parrot long reasoning on familiar-looking problems while having no real machinery underneath. In that light, distilled overthinking isn't just wasted tokens; it can be a performance of reasoning detached from the capability, which is a more interesting failure than mere verbosity.

Sources (9 notes)

Why do reasoning models overthink ill-posed questions?

Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Why does chain of thought accuracy eventually decline with length?

Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than explicit training.

When does thinking too much actually hurt reasoning?

Empirical studies demonstrate non-monotonic scaling in test-time reasoning: accuracy peaks at a critical thinking-token count, then declines sharply (87.3% to 70.3% as tokens scale from 1,100 to 16,000). Extended thinking inflates output variance and introduces self-revision errors rather than improving solution quality.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Can confidence patterns reveal overthinking versus underthinking?

ReBalance uses confidence variance and overconfidence as diagnostic signals to apply training-free steering vectors that reduce overthinking redundancy while promoting exploration during underthinking, improving accuracy across models from 0.5B to 32B parameters.

Do language models fail at reasoning due to complexity or novelty?

LRMs don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a reasoning researcher testing whether distillation transmits overthinking from large to small models. The question remains open: does a student model inherit not just reasoning traces but the *verbosity patterns* of its teacher, and if so, can that mismatch degrade accuracy?

What a curated library found — and when (dated claims, not current truth):
Findings span Jan 2025–Mar 2026. A library of ~12 papers on reasoning length and test-time compute reveals:

• Overthinking is learned, not intrinsic: trained via objectives that reward step production without teaching *when to stop* (Jan–Feb 2025).
• Optimal chain length follows an inverted-U *and shrinks as model capability grows* — so a small model copying a large model's reasoning budget may inherit a length calibrated for a different capability, pushing accuracy from 87% to 70% as tokens bloat (Feb–Jun 2025).
• Verbosity occupies a distinct, steerable direction in activation space; a 67% length reduction is achievable at constant accuracy via this single dimension (Jul 2025).
• Confidence signals can dynamically steer reasoning length across model scales (0.5B–32B); RL training can flip extended thinking from counterproductive to productive (Jun 2025).
• Reasoning models fit instance-level patterns, not general algorithms — distilled students inherit memorized traces, risking performance-without-capability (Jun 2026).

Anchor papers (verify; mind their dates):
- arXiv:2502.07266 (Feb 2025): When More is Less — chain-of-thought length in LLMs.
- arXiv:2505.00127 (Apr 2025): Between Underthinking and Overthinking — empirical study of reasoning length.
- arXiv:2507.04742 (Jul 2025): Activation Steering for Chain-of-Thought Compression.
- arXiv:2603.12372 (Mar 2026): Efficient Reasoning with Balanced Thinking.

Your task:
(1) RE-TEST THE MISMATCH HYPOTHESIS. For each constraint above — inverted-U, steerable verbosity, confidence steering, instance-level overfitting — check whether recent distillation work (knowledge distillation papers, RL-based alignment, or student-specific pretraining) has since relaxed or overturned the mismatch penalty. Does the constraint still hold, or has a method (e.g., student-aware RL fine-tuning, online pruning during generation, adaptive temperature scaling) dissolved it?
(2) Surface contradicting work: find papers in the last 6 months arguing that reasoning *length transfer is beneficial* or that overthinking is *not learnable*. Flag the disagreement.
(3) Propose 2 open questions assuming the regime may have shifted: (a) Can a small model *selectively inherit* long reasoning only on hard instances, short on easy ones? (b) Does distillation of latent reasoning (non-verbalized) sidestep the overthinking trap entirely?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
