Does chain-of-thought reasoning actually explain AI decisions?
Chain-of-thought is pitched as a transparency tool for agentic AI, but empirical evidence raises questions about whether reasoning chains actually predict or explain the system's outputs in practice.
Post angle — Medium
The pitch for CoT in production systems: by generating reasoning steps before answers, you get transparency into the model's decision-making process. You can audit the reasoning, catch errors, build user trust.
The empirical finding from "Thoughts without Thinking": in agentic multi-LLM pipelines, reviewer scores for CoT thoughts are only weakly correlated with reviewer scores for responses. In other words, the quality of the reasoning chain is a poor predictor of whether the output will be correct. Incorrect outputs can follow plausible-looking chains, and flawed chains don't reliably produce incorrect outputs.
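To make "weakly correlated" concrete, here is a minimal sketch of how one might check the relationship between chain scores and response scores. The scores below are made up for illustration and are not data from the paper; the 1-5 scale, the threshold of 4, and the helper names `pearson` and `cross_tab` are all assumptions of this sketch, not the paper's method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def cross_tab(thought_scores, response_scores, threshold):
    """Count each (chain scored well?, response scored well?) combination."""
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    for t, r in zip(thought_scores, response_scores):
        counts[(t >= threshold, r >= threshold)] += 1
    return counts


# Hypothetical reviewer scores on a 1-5 scale (illustrative only).
thought_scores = [5, 4, 5, 2, 4, 1, 5, 2]
response_scores = [2, 5, 1, 4, 4, 2, 5, 5]

r = pearson(thought_scores, response_scores)
table = cross_tab(thought_scores, response_scores, threshold=4)

print(f"pearson r = {r:.2f}")
print("plausible chain, poor response:", table[(True, False)])
print("poor chain, good response:", table[(False, True)])
```

The two off-diagonal cells of the cross-tab are the cases an auditor cares about: chains that look fine but precede a bad output, and chains that look bad but precede a good one. If those cells are well populated, reading the chain tells you little about the output.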
This is not just academic. The CoT explainability promise is used to justify deploying agentic AI in high-stakes settings on the grounds that "you can see the reasoning." If the visible reasoning neither predicts nor causally produces the output, that justification is hollow.
The deeper problem: CoT generates more material for post-hoc analysis, not better explainability. There's a difference between "I can analyze what went wrong" (what CoT provides) and "I can understand what the system will do" (what explainability requires). The former takes significant analytical effort, and a chain that reads as coherent can actively mislead the person analyzing it.
The Einstellung Paradigm finding makes this concrete: the chain quickly gravitates toward statistically common token sequences, even when they contradict the task. The chain doesn't reveal this deviation — it looks fluent throughout.
Connections: Does chain of thought reasoning actually explain model decisions?, Do reasoning traces actually cause correct answers?, Do language models actually use their reasoning steps?
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
- Chain-of-Thought Is Not Explainability
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
- Reasoning Models Don't Always Say What They Think
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
- Chain of Thoughtlessness? An Analysis of CoT in Planning
- Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning
Original note title
the explainability illusion — why cot in agentic pipelines produces chains that don't explain anything