SYNTHESIS NOTE
Psychology, Society, and Alignment

Does theory of mind predict who thrives in AI collaboration?

Explores whether perspective-taking ability—the capacity to model another's cognitive state—differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.

Synthesis note · 2026-02-23 · sourced from Human Centered Design

Collaborative ability with AI is a separable construct from individual problem-solving ability. A Bayesian Item Response Theory framework applied to human-AI benchmark data (n=667 across math, physics, and moral reasoning) estimates both parameters independently while controlling for task difficulty. The key finding: the two abilities are distinct, and what predicts one does not predict the other.
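To make the separation concrete, here is a minimal sketch of how two abilities per user could be estimated while controlling for task difficulty. It is not the study's actual model. It assumes a Rasch-style logistic link, independent normal priors, and a coarse grid search for the posterior mode. The names `p_correct`, `log_posterior`, and `map_estimate` are hypothetical, as is the choice to let `theta` drive solo items and `kappa` drive AI-assisted items.

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Assumed Rasch-style link: success probability is sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_posterior(theta, kappa, difficulty, responses, prior_sd=1.0):
    """Unnormalised log-posterior for one user.

    theta      -- solo problem-solving ability (assumed symbol)
    kappa      -- collaborative ability with the AI
    difficulty -- list of item difficulties, treated as known here
    responses  -- list of (item_index, with_ai, correct) tuples
    """
    # Independent Normal(0, prior_sd) priors on both abilities (an assumption).
    lp = -(theta ** 2 + kappa ** 2) / (2 * prior_sd ** 2)
    for item, with_ai, correct in responses:
        ability = kappa if with_ai else theta
        p = p_correct(ability, difficulty[item])
        lp += math.log(p if correct else 1.0 - p)
    return lp

def map_estimate(difficulty, responses, grid=None):
    """Grid-search the posterior mode over (theta, kappa)."""
    grid = grid or [i / 10 for i in range(-30, 31)]
    return max(
        ((t, k) for t in grid for k in grid),
        key=lambda tk: log_posterior(tk[0], tk[1], difficulty, responses),
    )
```

Because solo and AI-assisted responses enter through different parameters, a user who does poorly alone but well with the AI ends up with a low `theta` and a high `kappa`. A full Bayesian treatment would also infer the item difficulties and report posterior uncertainty rather than a single mode.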

Theory of Mind is the differentiating mechanism. Users with stronger perspective-taking — the ability to infer and adapt to others' cognitive states — achieve superior collaborative performance with AI. But the same users show no advantage when working alone. This is not a general intelligence effect. It is specifically the capacity to model what the AI knows, what it can do, and how to delegate to it that produces the collaboration gain.

The ToM link operates at two timescales. Stable individual differences in perspective-taking predict overall collaborative ability. But moment-to-moment fluctuations in ToM also influence AI response quality within sessions — users who adaptively model the AI's state mid-conversation get better outputs from it.

This creates an irony when combined with the reasoning-model findings in "Why do reasoning models fail at theory of mind tasks?": the models best at solving problems independently may be worst at supporting collaborative work. If collaboration quality depends on bidirectional ToM, with the user modeling the AI and the AI modeling the user, then optimizing models for raw capability may degrade the very property that makes collaboration productive.

The practical implication is that collaborative ability (κ) is a distinct benchmark axis. Comparing κ across models (κ_GPT4o vs κ_Llama) quantifies how much each model amplifies human performance, independent of the model's standalone capability. This reframes AI evaluation from "how smart is the model?" to "how much smarter does the human-AI team become?"
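As a rough illustration of κ as a benchmark axis, the sketch below ranks models by how much they raise one user's success probability over working alone. It assumes the same Rasch-style logistic link as a simplification. The function names and the `model_a` / `model_b` labels are hypothetical, and any numbers passed in are placeholders rather than results from the source.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def team_uplift(theta: float, kappa: float, difficulty: float = 0.0) -> float:
    """Gain in success probability from collaborating, relative to working solo.

    theta -- the user's solo ability (assumed symbol)
    kappa -- the user's collaborative ability with a given model
    """
    return sigmoid(kappa - difficulty) - sigmoid(theta - difficulty)

def rank_models(theta: float, kappa_by_model: dict, difficulty: float = 0.0) -> list:
    """Return (model_name, uplift) pairs sorted from largest to smallest uplift."""
    uplift = {
        name: team_uplift(theta, kappa, difficulty)
        for name, kappa in kappa_by_model.items()
    }
    return sorted(uplift.items(), key=lambda pair: pair[1], reverse=True)

# Placeholder values only, chosen to show the call shape.
ranking = rank_models(theta=0.0, kappa_by_model={"model_a": 1.0, "model_b": 0.5})
```

The ranking depends only on each model's κ for the user, not on how well the model scores on its own, which is the sense in which κ measures amplification rather than standalone capability.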

The synergy evidence also gives empirical grounding to "What breaks when humans and AI models misunderstand each other?": mutual theory of mind (MToM) is not just a design-fiction requirement but a measurable cognitive mechanism with quantifiable effects on collaboration quality.

Original note title

human-AI collaborative ability is distinct from individual ability — theory of mind predicts who benefits from AI partnership