Does chain-of-thought reasoning actually explain AI decisions?

Chain-of-thought is pitched as a transparency tool for agentic AI, but empirical evidence raises questions about whether reasoning chains actually predict or explain the system's outputs in practice.

Synthesis note · 2026-02-22 · sourced from Reasoning Architectures

Post angle — Medium

The pitch for CoT in production systems: because the model generates reasoning steps before its answer, you get transparency into its decision-making process. You can audit the reasoning, catch errors, and build user trust.

The empirical finding from "Thoughts without Thinking": in agentic multi-LLM pipelines, reviewer scores for CoT thoughts are weakly correlated with reviewer scores for responses. The reasoning chain doesn't predict whether the output will be correct. Incorrect outputs can follow plausible-looking chains; incorrect chains don't reliably produce incorrect outputs.
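To make "weakly correlated" concrete, here is a minimal sketch of how one might compare reviewer scores for thoughts against reviewer scores for responses. It assumes each pipeline run has one ordinal score for the chain and one for the final response. The function names and the example scores are hypothetical, and Spearman rank correlation is just one reasonable choice for ordinal ratings; the paper may have used a different statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical 1-5 reviewer scores, one pair per pipeline run.
    thought_scores = [5, 4, 5, 2, 4, 3, 5, 4]
    response_scores = [2, 5, 4, 4, 1, 5, 3, 2]
    print(f"rho = {spearman(thought_scores, response_scores):.2f}")
```

A coefficient near zero would mean that knowing how good the chain looked tells a reviewer little about how good the response was, which is the pattern the note describes.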

This is not just academic. The CoT explainability promise is used to justify deploying agentic AI in high-stakes settings on the grounds that "you can see the reasoning." If the visible reasoning does not even predict the output, let alone causally produce it, that justification is hollow.
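Correlation between scores does not by itself settle whether the chain causally drives the answer. One way to probe that, which is not taken from the source, is an intervention: truncate the chain at each step and check whether the final answer changes. The sketch below assumes a hypothetical `answer_fn(question, steps)` callable that wraps whatever model is under test and returns its answer given a (possibly truncated) list of reasoning steps.

```python
from typing import Callable, Sequence


def truncation_sensitivity(
    question: str,
    steps: Sequence[str],
    answer_fn: Callable[[str, Sequence[str]], str],
) -> float:
    """Fraction of proper prefixes of the chain that change the answer.

    0.0 means the answer never moved when reasoning was removed, which
    suggests the chain is not what produced it. 1.0 means every
    truncation changed the answer.
    """
    if not steps:
        raise ValueError("need at least one reasoning step")
    full_answer = answer_fn(question, steps)
    changed = 0
    # Try every proper prefix, from the empty chain up to all but the last step.
    for k in range(len(steps)):
        if answer_fn(question, steps[:k]) != full_answer:
            changed += 1
    return changed / len(steps)


# Two toy stand-ins for a model (hypothetical, for illustration only).
def ignores_chain(question, steps):
    return "42"  # same answer regardless of the reasoning supplied


def follows_chain(question, steps):
    return steps[-1] if steps else "unknown"  # answer is read off the last step


if __name__ == "__main__":
    chain = ["a", "b", "c"]
    print(truncation_sensitivity("q", chain, ignores_chain))
    print(truncation_sensitivity("q", chain, follows_chain))
```

A low score on real runs would be consistent with the worry here: the chain is on display, but the answer does not depend on it.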

The deeper problem: CoT generates more material for post-hoc analysis, not better explainability. There's a difference between "I can analyze what went wrong" (what CoT provides) and "I can understand what the system will do" (what explainability requires). The former takes significant analytical effort, and a chain that reads as coherent can actively mislead the person analyzing it.

The Einstellung Paradigm finding makes this concrete: the chain quickly gravitates toward statistically common token sequences, even when they contradict the task. Nothing in the chain signals this deviation; it looks fluent throughout.

Connections: Does chain of thought reasoning actually explain model decisions?, Do reasoning traces actually cause correct answers?, Do language models actually use their reasoning steps?

Original note title

the explainability illusion — why cot in agentic pipelines produces chains that don't explain anything