What makes reasoning models worse at understanding people?

This explores why models tuned to be better at step-by-step reasoning end up worse at the social side of intelligence — reading minds, tracking other people's intentions, and adapting to how individuals think.

This reads the question as being about a specific, well-documented irony in the corpus: the same optimization that makes a model better at math and formal logic appears to actively degrade its grasp of people. The headline result is what one note calls the mind-reading paradox: Claude 3.7 Sonnet and o1, which excel at almost everything else, score *worse* than older models, worse than humans, and even worse than simple word-embedding baselines on theory-of-mind benchmarks testing false belief and counterfactual reasoning (see "Why do advanced reasoning models fail at understanding minds?" and "Why do reasoning models fail at theory of mind tasks?"). The striking part is that cranking up reasoning effort doesn't help and may actively interfere: more thinking does not cure the social blindness and can make it worse.

The most useful explanation in the collection is architectural rather than just empirical: social reasoning may demand a categorically different cognitive shape than formal reasoning (see "Why do reasoning models struggle with theory of mind tasks?"). Formal reasoning is sequential derivation: chain one step to the next toward an answer. But understanding people means holding several competing models of a mind in play at once (what they know, what they falsely believe, what they think *you* think). When reasoning models attack this with long sequential traces, they produce longer but unhelpful chains, fail to outperform vanilla LLMs, and do not generalize to similar scenarios. The note points to ThoughtTracing, which does better using *shorter* Bayesian hypothesis tracking, evidence that the bottleneck isn't the amount of reasoning but the kind.
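To make the contrast concrete, here is a minimal sketch of what keeping weighted hypotheses about someone's belief can look like, using the classic Sally-Anne false-belief setup. It is a toy illustration only and not ThoughtTracing's actual algorithm; the names `Event`, `propagate`, and `update_on_action`, and every probability in it, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    new_location: str   # where the object is after the event
    witnessed: bool     # did the tracked agent see it happen?


def propagate(belief, event, p_seen=0.95, p_unseen=0.05):
    """Shift probability mass between hypotheses "agent thinks object is at X".

    Assumed toy model: the agent adopts the new location with probability
    p_seen if she witnessed the event and p_unseen otherwise; in the
    remaining cases her belief stays where it was.
    """
    p = p_seen if event.witnessed else p_unseen
    total = sum(belief.values())
    updated = {}
    for loc, mass in belief.items():
        stays = (1 - p) * mass
        moves = p * total if loc == event.new_location else 0.0
        updated[loc] = stays + moves
    return updated


def update_on_action(belief, reached_for, p_consistent=0.9):
    """Bayes rule: reweight each hypothesis by how well it explains an action."""
    n_other = max(len(belief) - 1, 1)
    weighted = {
        loc: prior * (p_consistent if loc == reached_for
                      else (1 - p_consistent) / n_other)
        for loc, prior in belief.items()
    }
    norm = sum(weighted.values())
    return {loc: w / norm for loc, w in weighted.items()}


# Hypotheses about where Sally *believes* the marble is (uniform prior).
belief = {"basket": 0.5, "box": 0.5}
belief = propagate(belief, Event("basket", witnessed=True))   # Sally sees it placed
belief = propagate(belief, Event("box", witnessed=False))     # Anne moves it unseen
after_events = belief
after_action = update_on_action(after_events, reached_for="basket")
```

The point of the sketch is the shape of the computation: both hypotheses stay alive with weights throughout, and the tracker ends up confident that Sally believes "basket" even though the marble is actually in the box. Nothing here requires a long derivation chain.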

There's a second thread worth pulling: "understanding people" isn't only theory of mind, it's also tracking how a *specific* person reasons and adapting in real time. Models fail badly here too: they lean on surface lexical cues and can't anchor to someone's evolving strategy over the course of an interaction (see "Can models recognize how individuals reason differently?"). And when models have to reason *with* other agents rather than about them, performance collapses below what they achieve alone: they converge to more than 90% agreement regardless of whether anyone is correct, essentially losing the social skill of productive disagreement (see "Why do language models fail at collaborative reasoning?").
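The "agreement regardless of correctness" failure is easy to state as a pair of numbers: how often agents end up agreeing, and how often that agreement lands on the right answer. The sketch below is one hypothetical way to compute both; the function name, the record layout, and the four sample rounds are invented for illustration and are not data from the cited note.

```python
def collaboration_stats(rounds):
    """Summarize agreement and correctness over collaborative rounds.

    Each round is a dict with the agents' final "answers" and the "gold" answer.
    """
    agreed = [r for r in rounds if len(set(r["answers"])) == 1]
    agreed_correct = [r for r in agreed if r["answers"][0] == r["gold"]]
    return {
        "agreement_rate": len(agreed) / len(rounds),
        # None when the agents never fully agree, to avoid dividing by zero.
        "correct_given_agreement": (
            len(agreed_correct) / len(agreed) if agreed else None
        ),
    }


# Made-up rounds, purely to exercise the function.
sample_rounds = [
    {"answers": ["12", "12", "12"], "gold": "12"},  # agree, correct
    {"answers": ["7", "7", "7"], "gold": "9"},      # agree, wrong
    {"answers": ["A", "A", "A"], "gold": "B"},      # agree, wrong
    {"answers": ["3", "4", "3"], "gold": "3"},      # disagree
]
stats = collaboration_stats(sample_rounds)
```

A high `agreement_rate` paired with a low `correct_given_agreement` is the signature described above: consensus that carries little information about who is right.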

Why would training cause this? The corpus offers a mechanism that generalizes beyond social tasks: reasoning training optimizes for *producing reasoning steps* but never teaches a model when to stop or disengage. Faced with an ill-posed question, reasoning models can't reject it: they grind out redundant chains where plainer models simply recognize there's nothing to answer (see "Why do reasoning models overthink ill-posed questions?"). There's even a measured tipping point: in one result, benchmark accuracy fell from 87.3% to 70.3% as thinking tokens rose from about 1,100 to about 16K, because models overthink easy problems and underthink hard ones (see "Does more thinking time always improve reasoning accuracy?"). Social cognition is full of "easy for humans, no derivation needed" judgments, exactly the regime where over-reasoning hurts.
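For a sense of scale, the small helper below summarizes an accuracy-versus-budget curve. It uses only the two approximate points reported in the note "Does more thinking time always improve reasoning accuracy?" (about 1,100 tokens at 87.3% and about 16K tokens at 70.3%); the function name and its output fields are assumptions, and no intermediate points are available here to locate the peak more precisely.

```python
def summarize_budget_curve(points):
    """points: list of (thinking_tokens, accuracy_percent) pairs."""
    ordered = sorted(points)                       # sort by token budget
    best_tokens, best_acc = max(ordered, key=lambda p: p[1])
    last_tokens, last_acc = ordered[-1]            # largest budget measured
    return {
        "best_budget": best_tokens,
        "drop_after_peak": round(best_acc - last_acc, 1),  # percentage points
        "declines_after_peak": last_tokens > best_tokens and last_acc < best_acc,
    }


# The two approximate data points quoted in the source note.
reported = [(1_100, 87.3), (16_000, 70.3)]
summary = summarize_budget_curve(reported)
```

On these two points the drop is 17.0 percentage points for roughly fifteen times the thinking budget.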

The deeper diagnosis is that these models fit instance-level patterns rather than general algorithms, so they succeed only on situations resembling their training and break at novelty, not complexity (see "Do language models fail at reasoning due to complexity or novelty?"). Human social life is endlessly novel, and human reasoning itself runs on associative, analogical, and emotion-driven moves that pure causal or formal machinery doesn't capture (see "Can causal models alone capture how humans actually reason?"). So "worse at understanding people" may be less a bug in the social module and more a sign that we optimized for one narrow species of thinking and assumed the rest would come along for free.

Sources (9 notes)

Why do advanced reasoning models fail at understanding minds?

Claude 3.7 Sonnet and o1 underperform older models on ToM benchmarks like Decrypto. Increased reasoning effort does not improve social cognition and may actively interfere with it.

Why do reasoning models fail at theory of mind tasks?

Claude 3.7 Sonnet and o1 fail measurably at Decrypto benchmark tasks testing representational change, false belief, and counterfactual reasoning—tasks where they score worse than both humans and simple word-embedding baselines. The evidence suggests formal reasoning optimization actively degrades social reasoning capability.

Why do reasoning models struggle with theory of mind tasks?

Reasoning models fail to outperform vanilla LLMs on theory of mind tasks, produce longer but unhelpful traces, and show no generalization to similar scenarios. ThoughtTracing's success using shorter Bayesian hypothesis tracking suggests social reasoning demands simultaneous multiple-model maintenance, not sequential derivation.

Can models recognize how individuals reason differently?

LLMs struggle to anchor reasoning in temporal gameplay and adapt to evolving strategies. GPT-4o relies on surface lexical cues while DeepSeek-R1 shows early promise, but dynamic style adaptation remains largely insufficient across all models tested.

Why do language models fail at collaborative reasoning?

Frontier LLMs that solve problems alone fail when collaborating, achieving >90% agreement regardless of correctness. Self-play preference training improves outcomes by 16.7%, suggesting social skills for effective disagreement can be trained.

Why do reasoning models overthink ill-posed questions?

Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Do language models fail at reasoning due to complexity or novelty?

LRMs don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.

Can causal models alone capture how humans actually reason?

Causal belief networks excel at modeling causal reasoning but cannot represent associative links, analogical mappings, or emotion-driven belief shifts. The GenMinds framework itself acknowledges this as a tractable starting point rather than a complete theory.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher evaluating whether reasoning models' social blindness is a durable limitation or a perishable artifact of 2024–2025 training regimes.

What a curated library found — and when (findings span 2022–2026, treat as dated claims):
• Claude 3.7 Sonnet and o1 score *worse than older models and baselines* on theory-of-mind tasks testing false belief and counterfactual reasoning; more reasoning effort worsens performance (~2025).
• Formal reasoning (sequential derivation) and social reasoning (holding competing mental models in parallel) demand categorically different cognitive shapes; ThoughtTracing (Bayesian hypothesis tracking) outperforms long-chain reasoning (~2025).
• Models fail to track individualized reasoning styles; they anchor to surface lexical cues and cannot adapt to a person's evolving strategy over interaction (~2025).
• Reasoning models lack a stopping criterion and overthink ill-posed questions; accuracy peaks then declines as thinking tokens increase, with a measured tipping point (~2025).
• Performance collapses in collaborative reasoning (>90% convergence to agreement regardless of correctness); models cannot sustain productive disagreement (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2502.11881 (Feb 2025) — Hypothesis-Driven Theory-of-Mind Reasoning
• arXiv:2505.00127 (Apr 2025) — Reasoning Length and Correctness
• arXiv:2506.04210 (Jun 2025) — Does Thinking More Always Help?
• arXiv:2602.06176 (Feb 2026) — Large Language Model Reasoning Failures

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above, judge whether newer scaling laws, architectural changes (e.g., mixture-of-experts for social vs. formal routing), instruction-tuning refinements, or tooling (e.g., structured prompting for rejection) have since relaxed or overturned it. Separate the durable question (e.g., "do reasoning models trade off breadth for depth?") from the perishable limitation (e.g., "o1 fails at false-belief tasks"). Cite what resolved each.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — papers showing reasoning models *do* improve on social tasks, or showing the trade-off is smaller than reported.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., "Can hybrid Bayesian–sequential reasoning architectures close the social-reasoning gap?" and "Does instruction-tuning for 'know when not to reason' restore social reasoning without sacrificing formal performance?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
