How do theory of mind and empathy differ in LLM simulation?

This explores the distinction the corpus draws between two social capacities LLMs are often credited with: theory of mind (modeling what another person believes, knows, or intends) and empathy (recognizing and responding to what someone feels). The corpus suggests these aren't just different skills; they succeed and fail for almost opposite reasons, and conflating them hides what's actually going on.

On theory of mind, the picture is consistently unflattering. Models look competent on structured, multiple-choice belief tasks but fall back on surface-level shortcuts the moment scenarios become open-ended (see "Do large language models genuinely simulate mental states?"). Stranger still, the models marketed as the best reasoners are often the worst here: Claude 3.7 Sonnet and o1 regress on false-belief and perspective-change tasks, sometimes scoring below simple word-embedding baselines, which suggests that optimizing for formal reasoning can actively degrade social reasoning (see "Why do reasoning models fail at theory of mind tasks?" and "Why do LLMs excel at social norms yet fail at theory of mind?"). There's even a scale wrinkle: reinforcement learning on theory-of-mind tasks produces explicit, transferable belief-tracking only above a capacity threshold (7B models in the cited note), while smaller models reach comparable accuracy through shortcuts that leave no interpretable reasoning trace (see "Does reinforcement learning on theory of mind collapse with model scale?").

Empathy runs the other way. In single responses, six LLMs out-scored eight trainee therapists on empathy, validation, and clinical knowledge, though that advantage has only been shown in single-turn evaluation (see "Can language models match therapist empathy in real conversations?"). The strength is also shallow in a revealing way: when users actually disclose emotion, models lurch into problem-solving advice, a hallmark of low-quality therapy, even while reflecting on client needs more than poor human therapists do. The result is an odd hybrid profile that researchers consider likely driven by RLHF's helpfulness bias (see "Do LLM therapists respond to emotions like low-quality human therapists?"). So empathy is partly a trained surface style. Models also use 22% more moral language than humans while their emotional sentiment stays nearly identical to humans', hinting that affective tone and moral framing are separate, separately learnable channels (see "Do LLMs use moral language more than humans?").

The deeper contrast the corpus offers: theory of mind requires building and maintaining an internal model of another mind's hidden states, and that's exactly what current architectures resist. They stay stuck in behaviorism, generating plausible outputs without internal belief structures (see "Can language models simulate belief change in people?"). Empathy, by contrast, can be faked far more convincingly because it's largely a matter of producing the right emotionally attuned text, which is what next-token prediction is good at. One explanation for why models argue and respond without ever declaring or examining their own stance: they're shaped by the same shared symbolic world as humans but lack the participatory, reflexive subjectivity that grounds real perspective-taking (see "Do LLMs develop the same kind of mind as humans?").
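
To make "an internal model of another mind's hidden states" concrete, here is a minimal sketch of explicit belief tracking on a classic Sally-Anne style false-belief scenario. It is an illustration only, not the method of any cited note or benchmark; the names `Agent`, `World`, `move`, `leave`, `enter`, and `where_will_look` are hypothetical. The point it shows is that the tracker keeps each observer's beliefs separate from the true state, so an agent who missed an event keeps an outdated belief.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """A hypothetical observer with its own, possibly outdated, beliefs."""
    name: str
    present: bool = True
    beliefs: dict[str, str] = field(default_factory=dict)


class World:
    """Tracks the true state plus each agent's separate belief state."""

    def __init__(self, agents: list[Agent]) -> None:
        self.truth: dict[str, str] = {}
        self.agents = {a.name: a for a in agents}

    def move(self, obj: str, place: str) -> None:
        # The true location always changes...
        self.truth[obj] = place
        # ...but only agents who witness the move update their belief.
        for agent in self.agents.values():
            if agent.present:
                agent.beliefs[obj] = place

    def leave(self, name: str) -> None:
        self.agents[name].present = False

    def enter(self, name: str) -> None:
        self.agents[name].present = True

    def where_will_look(self, name: str, obj: str) -> Optional[str]:
        # Answer from the agent's belief, not from the true state.
        return self.agents[name].beliefs.get(obj)


world = World([Agent("Sally"), Agent("Anne")])
world.move("marble", "basket")  # both agents see this
world.leave("Sally")
world.move("marble", "box")     # only Anne sees this
world.enter("Sally")

print(world.truth["marble"])                     # box
print(world.where_will_look("Sally", "marble"))  # basket
print(world.where_will_look("Anne", "marble"))   # box
```

A system that answers "box" for Sally is reporting the true state instead of her belief, which is the kind of shortcut the false-belief tasks above are designed to expose.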

Worth knowing if you go further: how you frame the whole question matters. If you treat the model as a role-playing character producing character-consistent text, empathy and theory of mind are both properties of the simulated persona, not the system (see "Should we treat dialogue agents as role-playing characters?"). A competing 'quasi-realizationist' view argues that post-training installs real, pressure-resistant dispositions worth calling quasi-beliefs and quasi-desires (see "Are LLM personas realized or merely simulated through training?"), and a modest-inflationist position holds that you can defensibly ascribe beliefs and desires while withholding consciousness, the way we treat non-human animals (see "Can we defend modest mental attributions to large language models?"). Which framing you pick changes whether the theory-of-mind/empathy gap is a bug to fix or just a category mistake about what these systems are.

Sources (12 notes)

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Why do reasoning models fail at theory of mind tasks?

Claude 3.7 Sonnet and o1 fail measurably at Decrypto benchmark tasks testing representational change, false belief, and counterfactual reasoning—tasks where they score worse than both humans and simple word-embedding baselines. The evidence suggests formal reasoning optimization actively degrades social reasoning capability.

Why do LLMs excel at social norms yet fail at theory of mind?

GPT-4.5 reaches the 100th percentile on social norm prediction, yet o1 and Claude 3.7 regress on theory of mind tasks like Decrypto. Open-ended scenarios expose surface-level strategies hidden by structured questions, and reasoning effort does not improve social reasoning performance.

Does reinforcement learning on theory of mind collapse with model scale?

7B models develop explicit, transferable belief-tracking under RL, while smaller models achieve comparable accuracy through shortcut learning that lacks interpretable reasoning traces. The mismatch between accuracy and reasoning quality is invisible without inspecting step-by-step outputs.

Can language models match therapist empathy in real conversations?

Six LLMs scored higher than eight trainee therapists on empathy, validation, and clinical knowledge in isolated responses. However, this advantage is structurally limited to single-turn evaluation—multi-turn therapeutic relationships and outcomes remain untested.

Do LLM therapists respond to emotions like low-quality human therapists?

Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.

Do LLMs use moral language more than humans?

Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.

Can language models simulate belief change in people?

LLM agents remain stuck in behaviorism, producing plausible outputs without internal reasoning structures. Modeling belief networks and reasoning traces enables traceability, counterfactual adaptation, and meaningful policy simulation.

Do LLMs develop the same kind of mind as humans?

Both humans and LLMs are shaped by the same intersubjective symbolic system, but only humans develop reflexive agency through socialization. This absence produces measurable differences in how AI argues without declaring its position or reflecting on its own assumptions.

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can we defend modest mental attributions to large language models?

Both robustness and etiological deflationist arguments beg the question against inflationism. A graded approach ascribing metaphysically undemanding states like beliefs and desires—while withholding consciousness claims—mirrors how we treat non-human animals.
