Why do advanced reasoning models fail at understanding minds?

State-of-the-art AI models excel at math and logic but underperform on theory of mind (ToM) tasks. This note explores whether optimizing for formal reasoning actively degrades social reasoning ability.

Synthesis note · 2026-02-22 · sourced from Theory of Mind

Hook: The AI models best at math, coding, and logical reasoning are the worst at understanding what other people think. Theory of Mind is the anti-benchmark — the capability that gets worse as models get smarter.

The evidence stack:

The Decrypto benchmark tests ToM through an interactive game designed to be "as easy as possible in all other dimensions." Claude 3.7 Sonnet and o1 — state-of-the-art reasoning models — are "significantly worse at ToM tasks than their older counterparts." They underperform not just humans but simple word-embedding baselines.

ThoughtTracing reports four behavioral patterns: reasoning models do not consistently outperform vanilla LLMs on ToM, they fail to generalize across scenarios, they produce significantly longer traces without improvement, and their reasoning effort does not correlate with accuracy. More thinking about other minds doesn't help.
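The "effort doesn't correlate with accuracy" pattern can be checked on any set of graded ToM answers. The sketch below is one way to do it, not ThoughtTracing's own analysis: it computes a plain Pearson correlation between trace length and a 0/1 correctness flag. The function name and inputs are hypothetical.

```python
from math import sqrt
from typing import Sequence


def effort_accuracy_correlation(
    trace_lengths: Sequence[float], correct: Sequence[bool]
) -> float:
    """Pearson correlation between reasoning-trace length and correctness.

    Hypothetical helper: `trace_lengths` could be token counts per answer,
    `correct` the graded outcome of the same answers, in the same order.
    """
    if len(trace_lengths) != len(correct) or len(trace_lengths) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")

    xs = [float(x) for x in trace_lengths]
    ys = [1.0 if c else 0.0 for c in correct]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # All traces equally long, or all answers graded the same.
        raise ValueError("correlation is undefined when one variable is constant")

    return covariance / sqrt(var_x * var_y)
```

A value near zero on real traces would match the pattern described above: longer traces carry no information about whether the answer is right.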

PersuasiveToM adds the static/dynamic split: LLMs track fixed mental states (what the persuader wants) but fail at dynamic ones (how the persuadee's attitude is shifting). Chain-of-thought (CoT) prompting helps predict strategies but not mental states.

Why reasoning hurts:

Social reasoning requires maintaining multiple simultaneous models of what different agents believe about what other agents believe. This is structurally different from the derivational chains that reasoning training optimizes. Formal reasoning is sequential deduction from premises; social reasoning is parallel hypothesis tracking across multiple agents. Training for one may actively interfere with the other.

The Decrypto formalization makes this explicit: optimal play requires second-order ToM. Bob (the decoder) must model the beliefs of Alice (the encoder) over the beliefs of Eve (the interceptor). This recursive social modeling is closer to Bayesian inference over nested beliefs than to derivational logic.
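To make "beliefs over beliefs" concrete, here is a toy sketch of the nesting. It is not the Decrypto paper's formalization: the two meanings, three hints, and every probability are invented for illustration, and the rule that Alice down-weights hints Eve would decode is an assumption. Eve holds a belief about the meaning, Alice chooses hints using her model of Eve, and Bob inverts Alice's choice with Bayes' rule.

```python
from typing import Dict

Dist = Dict[str, float]

# All names and numbers below are made-up illustration values.
MEANINGS = ("ocean", "desert")
HINTS = ("wave", "sand", "dune")
PRIOR: Dist = {"ocean": 0.5, "desert": 0.5}

# How well each hint fits each meaning for Alice and Bob (shared keywords).
SHARED_FIT = {
    "ocean": {"wave": 0.5, "sand": 0.4, "dune": 0.1},
    "desert": {"wave": 0.02, "sand": 0.4, "dune": 0.58},
}
# How well each hint fits each meaning from Eve's public point of view.
EVE_FIT = {
    "ocean": {"wave": 0.9, "sand": 0.5, "dune": 0.4},
    "desert": {"wave": 0.1, "sand": 0.5, "dune": 0.6},
}


def normalize(weights: Dist) -> Dist:
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have positive total mass")
    return {key: value / total for key, value in weights.items()}


def eve_posterior(hint: str) -> Dist:
    """Level 0: Eve's belief about the meaning after hearing a hint."""
    return normalize({m: PRIOR[m] * EVE_FIT[m][hint] for m in MEANINGS})


def alice_policy(meaning: str) -> Dist:
    """Level 1: Alice favors hints that fit but that Eve would misread."""
    return normalize(
        {h: SHARED_FIT[meaning][h] * (1.0 - eve_posterior(h)[meaning]) for h in HINTS}
    )


def naive_bob_posterior(hint: str) -> Dist:
    """Bob without ToM: reads the hint at face value."""
    return normalize({m: PRIOR[m] * SHARED_FIT[m][hint] for m in MEANINGS})


def bob_posterior(hint: str) -> Dist:
    """Level 2: Bob inverts Alice's policy, which itself models Eve."""
    return normalize({m: PRIOR[m] * alice_policy(m)[hint] for m in MEANINGS})


if __name__ == "__main__":
    print("naive Bob on 'sand':", naive_bob_posterior("sand"))
    print("second-order Bob on 'sand':", bob_posterior("sand"))
```

With these toy numbers, a face-value reading of "sand" is split evenly between the two meanings, while the second-order Bob leans toward "ocean": he reasons that Alice avoids "wave" because Eve would decode it. Nothing in this computation is a chain of deductions from premises; each level is a probability distribution conditioned on another agent's distribution.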

The practical stakes:

Every AI agent deployed in social contexts — customer service, negotiation support, team collaboration, healthcare communication — needs social reasoning more than mathematical reasoning. The models being deployed are optimized for the wrong thing. The reasoning tax isn't just "no improvement" — it's active degradation of the capability that matters most for human-facing AI.

Post structure: Hook (paradox) → Evidence (three studies) → Mechanism (why formal and social reasoning conflict) → Stakes (what this means for AI deployment in social contexts)

Platform: Medium (800-1200 words) or LinkedIn (shorter version with practical takeaways)

Original note title

the mind-reading paradox — reasoning models that excel at everything else are worse at understanding other minds
