What critical thinking skills do reasoning models actually lose?
Step-by-step reasoning training optimizes narrow deductive thinking while degrading meta-cognitive abilities like recognizing futile thinking and maintaining tentative reasoning. Understanding this tradeoff matters for deploying reasoning models reliably.
Post angle: Medium
We trained AI to think. In doing so, we trained it not to think in two specific and important ways.
Failure mode 1: It can't recognize when thinking is futile
Give a reasoning model a question with a missing premise — a question that cannot be answered because essential information is absent. A non-reasoning model quickly produces a short response acknowledging the problem. A reasoning model produces a response five times longer, cycling through "alternatively," "wait," "but..." — generating elaborate chains that never converge because there's nothing to converge on.
On questions like these, non-reasoning models show better judgment about when to think. Reasoning-specific training optimizes for using thinking patterns; it does not develop the meta-capability to disengage when engagement is inappropriate.
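One rough way to spot this failure mode in your own transcripts is to compare response lengths and count the hedge words quoted above. The sketch below is illustrative only: the marker list comes from the phrases in this note, while the function names and the two sample responses are invented for the example and are not taken from any study.

```python
import re
from typing import Dict

# Hypothetical marker list, taken from the phrases quoted in this note.
HEDGE_MARKERS = ("alternatively", "wait", "but")


def count_markers(text: str) -> Dict[str, int]:
    """Count whole-word occurrences of each hedge marker, case-insensitively."""
    lowered = text.lower()
    return {
        marker: len(re.findall(rf"\b{re.escape(marker)}\b", lowered))
        for marker in HEDGE_MARKERS
    }


def length_ratio(reasoning_response: str, baseline_response: str) -> float:
    """Ratio of whitespace-delimited word counts (reasoning / baseline)."""
    baseline_words = len(baseline_response.split())
    if baseline_words == 0:
        raise ValueError("baseline response is empty")
    return len(reasoning_response.split()) / baseline_words


# Invented sample responses to a question with a missing premise.
baseline = "The question omits the train's speed, so it cannot be answered."
reasoning = (
    "Wait, maybe the speed is implied. Alternatively, assume 60 mph. "
    "But that is a guess. Wait, alternatively the distance is enough. "
    "But no speed is given."
)

print(count_markers(reasoning))
print(round(length_ratio(reasoning, baseline), 2))
```

A high length ratio combined with many hedge markers on a question known to be unanswerable is a cheap signal that the model is cycling rather than disengaging. It is a heuristic, not a measurement of reasoning quality.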
Failure mode 2: It reasons its way to the wrong rule
Give a reasoning model four games with hidden special rules. Non-reasoning models score 55-65% on those exception-based rules. Reasoning models score below 25%. The detailed thinking chains make things worse — models apply arithmetic to symbols, overgeneralize from two examples, or invent rules that weren't in the data.
Inductive reasoning from sparse, exception-containing observations requires a different kind of thinking: tentative, minimal, defeasible. The chain-of-thought (CoT) pattern forces assertive, elaborating chains that work against the task.
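To make "tentative, minimal, defeasible" concrete, here is a small sketch of what that style of inference can look like as a data structure: a default guess plus explicitly recorded exceptions. The class name, the update policy, and the toy game moves are all assumptions made for illustration; nothing here reproduces the games or methods behind the scores above.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class DefeasibleRule:
    """A tentative default prediction plus explicitly recorded exceptions."""

    default: Optional[Hashable] = None
    exceptions: Dict[Hashable, Hashable] = field(default_factory=dict)

    def predict(self, situation: Hashable) -> Optional[Hashable]:
        # A recorded exception overrides the default for that situation only.
        if situation in self.exceptions:
            return self.exceptions[situation]
        return self.default

    def observe(self, situation: Hashable, outcome: Hashable) -> None:
        # Minimal update: adopt the first outcome as a tentative default,
        # then store any contradicting observation as a local exception
        # instead of rewriting the general rule.
        if self.default is None:
            self.default = outcome
        elif outcome != self.predict(situation):
            self.exceptions[situation] = outcome


# Hypothetical toy game: most moves are legal, one hidden rule says otherwise.
rule = DefeasibleRule()
rule.observe("pawn_forward", "legal")
rule.observe("knight_jump", "legal")
rule.observe("pawn_on_red_square", "illegal")

print(rule.predict("bishop_slide"))
print(rule.predict("pawn_on_red_square"))
```

The point of the design is what it refuses to do: a single surprising observation does not trigger a new general theory. It is stored as an exception, and the default stays in place until more evidence arrives.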
The pattern: Training for deductive, step-by-step reasoning improves that specific skill while degrading adjacent cognitive capabilities — the ability to disengage, the ability to remain tentative, the ability to recognize an exception rather than rationalize around it.
The implication: Reasoning models have a narrower cognitive profile than their benchmark performance suggests. The benchmarks are in-distribution, CoT-suited tasks. The real-world distribution also contains ill-posed questions, hidden rules, and problems where the correct response is to stop thinking.
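One practical consequence is to report accuracy per question type rather than as a single aggregate, so that strong results on CoT-suited tasks cannot mask weak results elsewhere. The sketch below assumes a hypothetical labeling scheme (`well_posed`, `ill_posed`, `hidden_rule`) and invented results; for ill-posed items, "correct" is assumed to mean the model flagged the missing premise or abstained.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_category(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of correct responses for each category label."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {category: correct[category] / totals[category] for category in totals}


# Invented results, for illustration only.
sample_results = [
    ("well_posed", True),
    ("well_posed", True),
    ("ill_posed", False),
    ("ill_posed", True),
    ("hidden_rule", False),
]

print(accuracy_by_category(sample_results))
```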
Inquiring lines that use this note as a source (7)
This note is a source for these synthesized inquiries.
- What is the critical thinking token threshold beyond which accuracy degrades?
- Why must procedural skills consolidate before strategic reasoning can develop?
- Why does reasoning accuracy degrade beyond a critical thinking token threshold?
- How do reasoning training methods sacrifice some thinking skills while improving others?
- Does reasoning training actively undermine the abstention capacity safety training created?
- How does proactive critical thinking detect when information is incomplete?
- How does backward reasoning during training improve forward reasoning capability?
Related concepts in this collection (5)
- Why do reasoning models overthink ill-posed questions?
  Explores why models trained for extended reasoning produce drastically longer, less useful responses to unanswerable questions, and whether this represents a fixable training deficit or inherent limitation.
  Relation: first failure mode
- Why do reasoning models fail at exception-based rule inference?
  Explores why chain-of-thought models systematically underperform on tasks requiring inductive rule inference from exceptions in game-based settings, despite excelling at normal rule patterns.
  Relation: second failure mode
- When does explicit reasoning actually help model performance?
  Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
  Relation: the existing note establishing the first evidence for this pattern
- Does extended thinking help or hurt model reasoning?
  Explores whether activating thinking mode improves reasoning performance, and what role training plays in determining whether extended internal reasoning chains are productive or counterproductive.
  Relation: proof that the critical thinking deficit is partly reversible: RL training can redirect extended thinking from counterproductive self-doubt toward productive gap analysis; the mechanism flips from harmful to helpful, but only for the specific capability trained
- Can models learn to ask clarifying questions instead of guessing?
  Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
  Relation: the trainable solution to failure mode 1: RL training raises missing-information detection from 0.15% to 73.98%; the critical thinking deficit is not fundamental but a consequence of what gets trained
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Base Models Know How to Reason, Thinking Models Learn When
- Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
- Test-time Prompt Intervention
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Original note title
the critical thinking problem — what reasoning models sacrifice when trained to think step by step