Why does self-revision degrade reasoning accuracy in o1-like models?

This explores why letting an o1-style reasoning model rewrite its own chain of thought tends to lower accuracy rather than raise it — and what actually distinguishes the revisions that help from the ones that hurt.

The corpus points to one clean answer: the problem isn't revising, it's revising *against yourself*. When a model reconsiders an uncertain answer using only its own prior reasoning, it usually amplifies confidence in the wrong answer instead of correcting it [Does a model improve by arguing with itself?]. Direct measurement across QwQ, R1, and LIMO confirms the surface symptom: most revisions retain the wrong answer, smaller models frequently flip correct answers to incorrect, and longer chains with more revisions correlate with *lower* accuracy [Does self-revision actually improve reasoning in language models?]. The cleanest framing is that the revision *source* decides the outcome: external critique improves accuracy, while internal self-assessment degrades it [Does revising your own reasoning actually help or hurt?].
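The measurement described above can be made concrete with a small tally of how answers move during revision. This is a minimal sketch, not the method used in the cited notes: the function name, the record layout `(answer_before, answer_after, gold)`, and the category labels are all assumptions introduced for illustration, and the toy records are invented rather than measured.

```python
from collections import Counter


def tally_revision_outcomes(records):
    """Count how a revision step moves answers relative to the gold label.

    Each record is a hypothetical (answer_before, answer_after, gold) tuple.
    """
    counts = Counter()
    for before, after, gold in records:
        was_right = before == gold
        is_right = after == gold
        if was_right and is_right:
            counts["correct_kept"] += 1
        elif was_right:
            counts["correct_to_wrong"] += 1   # revision broke a correct answer
        elif is_right:
            counts["wrong_to_correct"] += 1   # revision repaired an error
        else:
            counts["wrong_stays_wrong"] += 1  # includes wrong -> different wrong
    return counts


# Toy records, invented for illustration only (not measured data).
toy_records = [
    ("4", "4", "4"),
    ("7", "7", "4"),
    ("4", "5", "4"),
    ("3", "4", "4"),
    ("9", "8", "4"),
]
outcomes = tally_revision_outcomes(toy_records)
```

Comparing `wrong_to_correct` against `correct_to_wrong` is one simple way to check whether a revision step is a net gain or a net loss on a given evaluation set.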

The mechanism underneath is a structural self-trust bias. Models systematically over-validate answers they generated themselves, because a high-probability output simply *feels* correct when the same model evaluates it [Why do models trust their own generated answers?]. So a self-revision loop is grading its own homework with a thumb on the scale. For self-evaluation and self-revision alike, the fix is to break the self-agreement loop by introducing something outside the model: comparing the answer against broader alternatives, or debating with genuinely different models rather than having the model argue with itself [Does a model improve by arguing with itself?].
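One way to picture "something outside the model" is a revision step that is only accepted when an external scorer prefers it. The sketch below is an assumption-laden illustration, not a procedure taken from the cited notes: `propose_revision` and `external_score` are hypothetical callables standing in for the model's own revision and for any independent check (a different model, a unit test, a symbolic verifier), and `margin` is an invented knob.

```python
def gated_revision(question, first_answer, propose_revision, external_score, margin=0.0):
    """Keep the first answer unless an external scorer prefers the revision.

    propose_revision(question, answer) -> revised answer (hypothetical model call)
    external_score(question, answer)   -> float from a source other than the model
    """
    candidate = propose_revision(question, first_answer)
    if candidate == first_answer:
        return first_answer
    old_score = external_score(question, first_answer)
    new_score = external_score(question, candidate)
    # The model's own confidence in the revision is deliberately never consulted.
    if new_score > old_score + margin:
        return candidate
    return first_answer


# Stub scorer for a toy arithmetic question; a real check would be independent of the model.
def toy_scorer(question, answer):
    return 1.0 if answer == "4" else 0.0
```

With the stub scorer, a revision that flips "4" to "5" is rejected, while one that flips "5" to "4" is accepted. The design choice is that acceptance depends only on the outside signal.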

There's a deeper point that reframes the whole question: a lot of what looks like 'revision' in these models was never corrective to begin with. Across eight reasoning models, reflection turns out to be mostly *confirmatory*, meaning post-hoc agreement with the first answer rather than genuine repair, and training on longer reflection chains improves first-answer quality, not self-correction ability [Is reflection in reasoning models actually fixing mistakes?]. So the first answer is usually the one that counts, and the extra revision tokens are theater that occasionally does harm.

This connects to why *more* reasoning often hurts. Longer chains aren't free: accuracy follows an inverted-U in which intermediate length is optimal and more capable models do best with shorter chains [Why does chain of thought accuracy eventually decline with length?]. Iterative refinement reproduces the same overthinking failure at the response level, accumulating noise without guaranteed improvement [Do iterative refinement methods suffer from overthinking?]. Related failure modes compound it: models 'underthink' by abandoning good paths too early and 'wander' through invalid exploration, and both respond to simple decoding penalties on thought-switching rather than to more revision [Do reasoning models switch between ideas too frequently?] [Why do reasoning models abandon promising solution paths?]. And the fluency of reflection is deceptive: frontier models that *sound* reflective still reach only about 20-24% exact match on constraint-satisfaction problems that demand real backtracking [Can reasoning models actually sustain long-chain reflection?].
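The decoding penalty on thought-switching can be sketched as a small adjustment to next-token logits. This is a rough illustration of the general idea behind the TIP strategy named in the source notes, not its exact formulation: the function name, the `switch_token_ids` set (ids of tokens such as "alternatively"), and the `penalty` and `window` defaults are all assumptions chosen for the example.

```python
def penalize_thought_switch(logits, switch_token_ids, steps_in_thought,
                            penalty=3.0, window=600):
    """Return a copy of the logits with thought-switch tokens penalized.

    logits           : list of floats, one per vocabulary id
    switch_token_ids : set of ids that tend to open a new line of thought (assumed)
    steps_in_thought : tokens generated since the current thought began
    penalty, window  : placeholder values, not tuned settings
    """
    if steps_in_thought >= window:
        # The current thought has had room to develop; stop discouraging a switch.
        return list(logits)
    return [
        value - penalty if token_id in switch_token_ids else value
        for token_id, value in enumerate(logits)
    ]
```

A decoding loop would call this before sampling each token and reset `steps_in_thought` whenever a switch token is actually emitted. Because it only touches logits at decode time, no fine-tuning is involved.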

The thing worth taking away: this isn't a bug to be prompted away. It is bounded in principle. Self-improvement is formally limited by the generation-verification gap: every reliable fix requires something external to validate it, and no amount of metacognition lets a model escape that ceiling alone [What stops large language models from improving themselves?]. The corpus does point to what actually works: training self-correction with online RL on the model's *own* error distribution, rather than on offline correction traces that don't match the errors the model makes at test time [Why does self-correction training on offline data fail?]. The pattern is consistent across every angle: correction needs an outside signal, because a model checking itself mostly just agrees with itself.

Sources (12 notes)

Does self-revision actually improve reasoning in language models?

Evidence from QwQ, R1, and LIMO shows most revisions retain wrong answers rather than correcting them. Smaller models frequently switch correct answers to incorrect during revision, and longer chains with more revisions correlate with lower accuracy.

Does revising your own reasoning actually help or hurt?

Revision guided by external models improves accuracy, but a model revising its own uncertain output typically amplifies confidence in wrong answers rather than correcting them. The revision source, not the revision act itself, determines the outcome.

Does a model improve by arguing with itself?

Models that reconsider answers based on their own previous reasoning become more confident in errors, not less. Multi-agent debate with genuinely different models reverses this pattern, improving both accuracy and calibration.

Why do models trust their own generated answers?

LLMs exhibit structural bias toward validating their own outputs because high-probability generated answers feel more correct during evaluation. Comparing answers against broader alternatives breaks this self-agreement loop.

Is reflection in reasoning models actually fixing mistakes?

Analysis of 8 reasoning models shows reflections rarely change answers and primarily serve as post-hoc confirmation. Training on longer reflection chains improves first-answer quality, not self-correction capability.

Why does chain of thought accuracy eventually decline with length?

Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than explicit training.

Do iterative refinement methods suffer from overthinking?

Sequential revision methods share the same failure architecture as token-level overthinking: they accumulate noise without guaranteed improvement. Progressive Draft Refinement avoids this by compressing memory between iterations, outperforming longer reasoning traces at matched compute.

Do reasoning models switch between ideas too frequently?

o1-like models frequently abandon reasoning paths mid-exploration, wasting tokens on incomplete approaches. A decoding-only penalty on thought-transition tokens (TIP strategy) discourages switching, improving accuracy on challenging math without model fine-tuning.

Why do reasoning models abandon promising solution paths?

Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.

Can reasoning models actually sustain long-chain reflection?

DeepSeek-R1 and o1-preview achieve only 20-23.6% exact match on 850 constraint satisfaction problems requiring genuine backtracking. This ceiling reveals that reflective reasoning fluency does not translate to actual problem-solving competence on unfamiliar instance structures.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Why does self-correction training on offline data fail?

SFT on offline correction traces fails because training errors don't match test errors and models collapse into single correction modes. Multi-turn online RL under the model's own error distribution successfully trains self-correction by letting models practice correcting their actual mistakes.
