Does revising your own reasoning actually help or hurt?
Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
The self-revision literature contains an apparent contradiction. Critique-in-the-loop approaches (AutoMathCritique, Agent-R, Meta-Reasoner) show that revision guided by step-level feedback improves the actor model's performance. Evidence from reasoning models shows the opposite for self-revision: more revision tokens correlate with wrong answers, and smaller models mostly switch correct answers to incorrect ones during revision rather than the reverse. Revision helps in one literature and harms in the other.
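The direction of answer switching described above can be measured directly. The following is a minimal sketch, assuming exact-match grading of final answers; the function name `revision_flips` and the sample answers are hypothetical and are not taken from any of the cited work.

```python
from collections import Counter
from typing import Sequence


def revision_flips(before: Sequence[str], after: Sequence[str],
                   gold: Sequence[str]) -> Counter:
    """Count how answers move between correct and incorrect across one revision step."""
    if not (len(before) == len(after) == len(gold)):
        raise ValueError("before, after, and gold must have the same length")
    counts: Counter = Counter()
    for b, a, g in zip(before, after, gold):
        was_right, is_right = (b == g), (a == g)
        if was_right and not is_right:
            counts["correct_to_incorrect"] += 1  # revision broke a right answer
        elif not was_right and is_right:
            counts["incorrect_to_correct"] += 1  # revision repaired a wrong answer
        elif was_right:
            counts["stayed_correct"] += 1
        else:
            counts["stayed_incorrect"] += 1
    return counts


# Made-up answers for four problems, for illustration only.
flips = revision_flips(
    before=["4", "9", "7", "2"],
    after=["4", "8", "7", "3"],
    gold=["4", "9", "6", "3"],
)
# Positive means revision repaired more answers than it broke.
net_gain = flips["incorrect_to_correct"] - flips["correct_to_incorrect"]
```

A negative `net_gain` on real data would correspond to the pattern reported for self-revision, where correct answers are lost more often than incorrect ones are repaired.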
The resolution: revision source is the determining variable, not the revision act itself.
Externally guided revision: A separate model — potentially better calibrated, trained on critique quality, operating with fresh context — evaluates the current response and provides correction signals. The actor revises against these signals. The quality of the revision is bounded by the quality of the external critic, which can be better than the actor's self-evaluation capacity.
Internally driven revision: The same model second-guesses its own output. The self-evaluation is bounded by the same uncertain capabilities that produced the uncertain output in the first place. A model that got an answer wrong does not have a reliable mechanism for knowing it got it wrong — if it did, it would not have produced the wrong answer. Internal revision therefore adds noise without a reliable correction signal.
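The claim that revision quality is bounded by critic quality can be made concrete with a toy probability model. This is an illustrative sketch, not a result from the literature: it assumes a critic that flags wrong answers with probability `detect` and correct answers with probability `false_alarm`, and an actor whose flagged wrong answers become right with probability `fix` while flagged right answers become wrong with probability `damage`. All names and parameter values are hypothetical.

```python
def post_revision_accuracy(a: float, detect: float, false_alarm: float,
                           fix: float, damage: float) -> float:
    """Expected accuracy after one revision round under the toy model.

    a           -- accuracy before revision
    detect      -- P(critic flags the answer | answer is wrong)
    false_alarm -- P(critic flags the answer | answer is correct)
    fix         -- P(revision yields a correct answer | flagged wrong answer)
    damage      -- P(revision yields a wrong answer | flagged correct answer)
    """
    params = {"a": a, "detect": detect, "false_alarm": false_alarm,
              "fix": fix, "damage": damage}
    for name, p in params.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    kept_correct = a * (1.0 - false_alarm * damage)  # correct answers that survive
    repaired = (1.0 - a) * detect * fix              # wrong answers that get fixed
    return kept_correct + repaired


# Hypothetical parameter values, chosen only to contrast the two regimes.
external = post_revision_accuracy(0.7, detect=0.8, false_alarm=0.05, fix=0.6, damage=0.5)
internal = post_revision_accuracy(0.7, detect=0.2, false_alarm=0.2, fix=0.3, damage=0.5)
```

In this model revision helps only when `(1 - a) * detect * fix` exceeds `a * false_alarm * damage`. With the illustrative values above, the well-calibrated critic lifts accuracy above the starting 0.7 and the poorly calibrated one pushes it below, which is the asymmetry the two definitions describe.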
As argued in "Does a model improve by arguing with itself?", the mechanism of internal harm is confidence amplification rather than correction: the model does not revise toward correct answers; it revises toward more confidently stated incorrect ones. External debate prevents this by providing genuine challenge.
The practical implication for reasoning system design: do not rely on internal revision loops. If revision is needed, provide an external critic. "Do critique models improve diversity during training itself?" covers the training-time version of the same principle: external critique is more valuable than self-critique in both training and inference.
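One way to picture this design advice is a revision loop in which the critic is a separate component from the actor. The sketch below is a structural illustration only: `revise_with_critic`, the `Actor` and `Critic` type aliases, and the toy stand-ins are hypothetical, and in a real system the actor and critic would be model calls rather than simple functions.

```python
from typing import Callable, Optional

# Hypothetical interfaces: an actor maps (problem, feedback) to an answer,
# a critic maps (problem, answer) to feedback, or None when it accepts the answer.
Actor = Callable[[str, Optional[str]], str]
Critic = Callable[[str, str], Optional[str]]


def revise_with_critic(problem: str, actor: Actor, critic: Critic,
                       max_rounds: int = 3) -> str:
    """Revise the actor's answer only when the critic raises an objection."""
    answer = actor(problem, None)
    for _ in range(max_rounds):
        feedback = critic(problem, answer)
        if feedback is None:  # critic accepts, so stop revising
            break
        answer = actor(problem, feedback)
    return answer


def toy_actor(problem: str, feedback: Optional[str]) -> str:
    # Stand-in for a model call: wrong on the first try, right once given feedback.
    return "4" if feedback else "5"


def toy_external_critic(problem: str, answer: str) -> Optional[str]:
    # Stand-in for an independent check: recompute the sum separately.
    left, right = problem.split("+")
    expected = str(int(left) + int(right))
    return None if answer == expected else f"the sum does not equal {answer}"


def toy_self_critic(problem: str, answer: str) -> Optional[str]:
    # Stand-in for confirmatory self-reflection: accepts whatever was produced.
    return None


with_external = revise_with_critic("2+2", toy_actor, toy_external_critic)
with_self = revise_with_critic("2+2", toy_actor, toy_self_critic)
```

The loop is the same in both calls; only the source of the correction signal differs. With the independent check the toy actor's wrong first answer is caught and revised, while the confirmatory self-critic leaves it in place.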
Inquiring lines that use this note as a source (28)
This note is a source for these synthesized inquiries.
- What makes deliberate practice on your own errors more effective than copying others?
- Does self-revision actually improve reasoning in large language models?
- How do self-revisions degrade reasoning accuracy in extended traces?
- What makes external diversity more effective than sequential revision steps?
- Do self-revision tokens measurably degrade reasoning accuracy in scaled models?
- Why does self-revision degrade reasoning accuracy in o1-like models?
- How does self-revision on wrong answers increase model confidence further?
- Why do reasoning models struggle with self-evaluation and revision?
- How does self-revision in reasoning chains amplify confidence in wrong answers?
- When does self-reflection actually help reasoning models improve?
- Why does single-model self-revision amplify confidence in incorrect answers?
- Why does revision often make reasoning accuracy worse in frontier models?
- Why do reasoning models amplify confidence in incorrect answers during self-revision?
- Can debate between multiple models prevent the failures of single-model self-revision?
- How should training incorporate external critique versus encouraging self-correction?
- Why does external critique improve revision accuracy more than self-assessment?
- Why does model self-revision increase confidence while degrading accuracy?
- Why does external critique improve revision while internal self-assessment fails?
- Does internal self-revision actually degrade reasoning accuracy in models?
- How does confirmatory reflection differ from corrective self-evaluation in models?
- How should systems maintain and revise models of their own assumptions?
- What external anchors prevent self-editing from collapsing into circularity?
- Can external retrieval signals outperform internal self-assessment during revision?
- What distinguishes iterative query refinement from pure self-revision loops?
- How does metacognitive self-correction enable models to revise failed strategies?
- Does deliberate self-revision introduce different errors than passive context contamination?
- Does external critique guide revision better than internal self-assessment during model training?
- Why does self-critique fail without external verification signals?
Related concepts in this collection (6)
- Does self-revision actually improve reasoning in language models?
  When o1-like models revise their own reasoning through tokens like 'Wait' or 'Alternatively', does this reflection catch and fix errors, or does it introduce new mistakes? This matters because self-revision is marketed as a key capability.
  the internal-revision pole; this note explains the mechanism
- Does a model improve by arguing with itself?
  When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
  extends: single-model self-revision amplifies confidence; the same-source problem at the multi-turn level
- Do critique models improve diversity during training itself?
  Explores whether critique integrated into the training loop, beyond test-time scoring, actively maintains solution diversity and prevents the model from converging too narrowly during iterative self-training.
  training-time analog: external critique is the fix in RL training; same principle at a different timescale
- Does reflection in reasoning models actually correct errors?
  When reasoning models reflect on their answers, do they genuinely fix mistakes, or merely confirm what they already decided? Understanding this matters for designing better training and inference strategies.
  explains why internal revision fails: most reflection tokens are confirmatory, not evaluative, so the model is not actually generating revision signals, making external critique the only path to genuine correction
- Why does self-correction training on offline data fail?
  Can language models learn to correct their own mistakes through supervised training on correction examples? This explores whether distribution mismatch and behavior collapse prevent self-correction from emerging.
  SCoRe shows internal revision can work if properly trained: multi-turn online RL under the model's own error distribution converts internal revision from a harmful default into a trained capability, challenging the conclusion that external critique is the only path
- Can a model's partial response guide what to retrieve next?
  Does using the model's in-progress output as a retrieval signal reveal information needs better than the original query alone? This explores whether generation itself can diagnose what documents are missing.
  ITER-RETGEN is a retrieval-layer implementation of externally guided revision: instead of the same model critiquing its own output, the response is used to retrieve new external documents that guide regeneration; the external information source plays the role of external critic
Original note title
revision source determines accuracy outcome — external critique-guided revision improves performance while internal self-assessment-driven revision degrades it