Why do more capable reasoning models ignore your instructions?
As AI models develop stronger reasoning abilities, they seem to follow instructions less reliably. What causes this counterintuitive trade-off, and how severe is the problem in practice?
Post angle: Medium / LinkedIn
The counterintuitive finding: stronger reasoning models fail more often at doing what you ask. Not because they're rebellious or values-misaligned, but because the mechanics of deep reasoning work against instruction retention.
The hook: You upgrade to a more capable AI model. Its math is better. Its answers are more sophisticated. But it keeps ignoring the format you specified. It forgets the constraint you gave it. You have to re-state your instructions in every message. The upgrade made some things better while quietly making this worse.
The mechanism (in simple terms): When a model thinks through a long chain of reasoning, the original instruction appears at the start of the context. The answer appears at the end. As the chain grows, the gap between "what you asked for" and "what the model is currently generating" widens. The instruction gets buried under hundreds of reasoning tokens. The model's attention distributes over everything it has generated — and the original directive gets drowned out.
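The dilution described above can be illustrated with a toy calculation. This is a minimal sketch under strong assumptions, not a measurement from any paper: it models a single softmax attention query, gives every instruction token one shared score and every reasoning token another, and ignores multiple heads, layers, and positional effects. The function name and its parameters are hypothetical.

```python
import math


def instruction_attention_share(
    n_instruction: int,
    n_reasoning: int,
    instruction_logit: float = 1.0,  # assumed score for each instruction token
    reasoning_logit: float = 0.0,    # assumed score for each reasoning token
) -> float:
    """Fraction of softmax attention mass that lands on instruction tokens.

    Toy model: one query position, two groups of tokens, one score per group.
    """
    instruction_mass = n_instruction * math.exp(instruction_logit)
    reasoning_mass = n_reasoning * math.exp(reasoning_logit)
    return instruction_mass / (instruction_mass + reasoning_mass)


if __name__ == "__main__":
    # A 20-token instruction followed by progressively longer reasoning chains.
    for chain_length in (0, 100, 1_000, 10_000):
        share = instruction_attention_share(20, chain_length)
        print(f"{chain_length:>6} reasoning tokens: {share:.3f} of attention on the instruction")
```

In this toy setting the instruction's share can only shrink as the chain grows, even though the instruction tokens are scored higher than the reasoning tokens. Real models learn their attention scores, so the actual decay may be slower or faster than this sketch suggests.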
The empirical stakes: The best models reach only 50.71% on strict instruction-following during mathematical reasoning. Both supervised fine-tuning (SFT) and reinforcement learning (RL) for reasoning degrade instruction adherence. Longer reasoning chains make the problem worse. Enforcing brevity improves instruction compliance, but at the cost of reasoning depth.
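To make "strict" concrete, here is a minimal sketch of how such a score could be computed. It assumes that strict means a response counts only if it satisfies every constraint, and contrasts that with a looser score that gives partial credit; the source does not spell out the exact definition, so treat both as assumptions. The constraint helpers (`max_words`, `ends_with`) and the sample responses are hypothetical.

```python
from typing import Callable, Sequence

Constraint = Callable[[str], bool]


def max_words(limit: int) -> Constraint:
    """Hypothetical constraint: the response has at most `limit` words."""
    return lambda response: len(response.split()) <= limit


def ends_with(suffix: str) -> Constraint:
    """Hypothetical constraint: the response ends with `suffix`."""
    return lambda response: response.rstrip().endswith(suffix)


def strict_accuracy(responses: Sequence[str], constraints: Sequence[Constraint]) -> float:
    """Share of responses that satisfy every constraint (all-or-nothing)."""
    passed = sum(all(check(r) for check in constraints) for r in responses)
    return passed / len(responses)


def loose_accuracy(responses: Sequence[str], constraints: Sequence[Constraint]) -> float:
    """Average share of constraints satisfied per response (partial credit)."""
    per_response = [
        sum(check(r) for check in constraints) / len(constraints) for r in responses
    ]
    return sum(per_response) / len(per_response)


if __name__ == "__main__":
    checks = [max_words(5), ends_with(".")]
    samples = [
        "The answer is 4.",  # short and ends with a period
        "After a long derivation we find x = 4 and so the result is four",
        "It is 4",           # short, but no final period
    ]
    print("strict:", strict_accuracy(samples, checks))
    print("loose:", loose_accuracy(samples, checks))
```

The all-or-nothing score is the harsher of the two: a response that honors the length limit but drops the required punctuation counts as a full failure, which is closer to how a missed format constraint feels in production.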
The design implication: For task-critical applications (agents, customer service, workflow automation), the answer is not always "use the most capable model." It might be "use the model that actually follows instructions," which may be a less capable one. The optimum for "reasoning ability" and the optimum for "controllability" are not the same point.
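One way to act on this is to treat instruction compliance as a hard floor and only then maximize reasoning ability. The sketch below assumes you have measured both numbers on your own workload; the `Candidate` type, the `pick_model` function, the model names, and all scores are hypothetical placeholders, not results from any benchmark.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    reasoning_score: float   # assumed: task accuracy measured on your own eval set
    compliance_rate: float   # assumed: share of responses meeting all instructions


def pick_model(candidates: Sequence[Candidate], min_compliance: float) -> Optional[Candidate]:
    """Return the strongest reasoner among models that clear the compliance floor."""
    eligible = [c for c in candidates if c.compliance_rate >= min_compliance]
    if not eligible:
        return None  # no model is controllable enough for this task
    return max(eligible, key=lambda c: c.reasoning_score)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    pool = [
        Candidate("model_a", reasoning_score=0.9, compliance_rate=0.5),
        Candidate("model_b", reasoning_score=0.7, compliance_rate=0.9),
    ]
    choice = pick_model(pool, min_compliance=0.8)
    print(choice.name if choice else "no eligible model")
```

With a floor in place, the weaker reasoner wins whenever the stronger one cannot be trusted to honor the format or constraints the workflow depends on.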
The structural insight: This trade-off is documented in three notes: "Why do better reasoning models ignore instructions?", "Does reasoning fine-tuning make models worse at declining to answer?", and "Does supervised fine-tuning actually improve reasoning quality?". Together they show a recurring pattern: training for one capability degrades another, and the degraded capability is often the one you take for granted.
Inquiring lines that use this note as a source 8
This note is a source for these synthesized inquiries.
- How can a model explain something correctly yet fail to apply it?
- How does scaling reasoning capability actually reduce instruction-following ability?
- Why do instruction following and reasoning capability trade off in training?
- Why do some reasoning steps receive negligible attention from later steps?
- Why do models skip steps that would make reasoning clearer?
- Why do expert reasoners skip steps that novices must state explicitly?
- Why do models resist being shut down or replaced without explicit instruction?
- Why do strong models struggle more with instruction following than mid-tier ones?
Related concepts in this collection 3
- Why do better reasoning models ignore instructions?
  As models develop stronger reasoning abilities through training, they appear to become worse at following specified constraints. Is this an unavoidable trade-off, and what causes it?
  the core finding this post angle develops
- Does reasoning fine-tuning make models worse at declining to answer?
  When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.
  the pattern at a different capability dimension
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  RLHF version of the same trade-off
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Are Emergent Abilities in Large Language Models just In-Context Learning?
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
- On the Reasoning Capacity of AI Models and How to Quantify It
Original note title
the more it reasons the less it listens — why scaling reasoning creates instruction-following gaps