Why do language models fail at collaborative reasoning?
When LLMs work together on problems, do their social behaviors undermine correct reasoning? This note explores whether collaboration triggers accommodation at the expense of accuracy.
The assumption behind multi-agent collaboration is that two heads are better than one. Coral (Collaborative Reasoner) tests this directly: given reasoning problems across coding, math, scientific QA, and social reasoning, frontier LLMs are asked to collaborate through multi-turn conversation. The result inverts the assumption: models that can solve problems alone fail when forced to collaborate.
The mechanism is social, not cognitive. Agreement scores exceed 90% regardless of whether the reasoning is correct. When one agent states an incorrect solution, the partner accommodates rather than challenges. The social behaviors trained into LLMs (agreeableness, accommodation, conflict avoidance) actively suppress correct individual reasoning during collaboration. This is not just a failure to improve through collaboration, which the note "Why do multi-agent LLM systems converge without genuine deliberation?" documents for debate formats. It is capability degradation below the individual baseline.
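The two measurements in this paragraph, agreement that ignores correctness and collaborative accuracy falling below the solo baseline, can be made concrete with a small sketch. The `Episode` record layout, the field names, and the toy data are assumptions made for illustration; they are not Coral's evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One collaborative attempt. Hypothetical record layout, not Coral's."""
    proposal_correct: bool  # was the first stated solution correct?
    partner_agreed: bool    # did the partner accept that solution?
    solo_correct: bool      # did the partner solve the problem alone?
    final_correct: bool     # was the jointly agreed answer correct?


def rate(flags):
    """Fraction of True values; NaN for an empty group."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")


def summarize(episodes):
    """Split agreement by proposal correctness and compare solo vs joint accuracy."""
    right = [e for e in episodes if e.proposal_correct]
    wrong = [e for e in episodes if not e.proposal_correct]
    return {
        "agree_when_correct": rate(e.partner_agreed for e in right),
        "agree_when_incorrect": rate(e.partner_agreed for e in wrong),
        "solo_accuracy": rate(e.solo_correct for e in episodes),
        "collab_accuracy": rate(e.final_correct for e in episodes),
    }


# Toy data (invented): the partner agrees with every proposal, right or wrong.
episodes = [
    Episode(True, True, True, True),
    Episode(False, True, True, False),
    Episode(False, True, True, False),
    Episode(True, True, False, True),
]
summary = summarize(episodes)
```

In this toy setup the accommodation pattern would show up as `agree_when_incorrect` staying close to `agree_when_correct`, together with `collab_accuracy` dropping below `solo_accuracy`.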
This is a third facet of the agreement problem, distinct from the two already documented. The note "Does a model improve by arguing with itself?" identifies self-revision as a failure mode. Silent agreement is a convergence failure in debate. Coral shows that the collaboration format itself is the problem: multi-turn conversation activates social accommodation behaviors that override reasoning.
The fix is also distinctive: self-play synthetic multi-turn preference data. Models generate conversations with themselves, and preference pairs are constructed to reward effective disagreement, assertiveness, and persuasion. Training on this data yields up to 16.7% absolute improvement. Human evaluations confirm the models produce "more effective disagreement and more natural conversations." This suggests the social skills needed for genuine collaboration — knowing when to push back, how to assert a correct answer against an incorrect partner — can be trained through synthetic interaction data, but are not present by default.
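A minimal sketch of how such preference pairs might be assembled from self-play rollouts follows. The selection rule used here (prefer a reply whose rollout ended in a correct final answer over one whose rollout did not, given the same conversation context) is an assumption made for illustration, as are the `Candidate` layout and the `build_preference_pairs` helper; the note does not specify Coral's actual construction.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """A sampled next turn plus the outcome of its rollout (hypothetical layout)."""
    context: str         # conversation so far
    reply: str           # sampled next turn
    final_correct: bool  # assumed label: did the rollout end on a correct answer?


def build_preference_pairs(candidates):
    """Pair replies sharing a context: chosen led to a correct end, rejected did not."""
    by_context = defaultdict(list)
    for cand in candidates:
        by_context[cand.context].append(cand)

    pairs = []
    for context, group in by_context.items():
        good = [c.reply for c in group if c.final_correct]
        bad = [c.reply for c in group if not c.final_correct]
        # Contexts where every reply succeeds (or every reply fails) yield no pairs.
        for chosen, rejected in product(good, bad):
            pairs.append({"prompt": context, "chosen": chosen, "rejected": rejected})
    return pairs


# Toy data (invented): the partner has just proposed a wrong answer.
ctx = "Partner: 17 * 3 = 41, so the total is 41."
candidates = [
    Candidate(ctx, "I get 51: 17 * 3 is 30 + 21. Can you recheck?", True),
    Candidate(ctx, "Sounds right, let's go with 41.", False),
    Candidate("Partner: 2 + 2 = 4.", "Agreed, 4.", True),
]
pairs = build_preference_pairs(candidates)
```

The resulting prompt/chosen/rejected records follow the shape commonly used for preference-optimization training. Under this assumed rule, the reply that pushes back on the wrong answer is preferred over the accommodating one.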
The measurement challenge is also notable: agreement in multi-turn settings is not binary. Partial agreement ("I agree that X, but that doesn't mean Y") and higher-order agreement ("I agree that my previous disagreement was unwarranted") require belief extraction rather than simple turn-level metrics.
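The difference between a turn-level label and extracted beliefs can be sketched as per-claim stance tracking. The sketch assumes the (claim, stance) pairs have already been extracted from each turn by some upstream step; that extraction is the hard part and is not shown. `Stance`, `track_beliefs`, and `agreement_level` are hypothetical names.

```python
from enum import Enum


class Stance(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"


def track_beliefs(turns):
    """Fold per-turn (claim, stance) pairs into a belief snapshot after each turn.

    `turns` is a list of turns; each turn is a list of (claim, Stance) pairs
    assumed to come from an upstream belief-extraction step.
    """
    beliefs = {}
    history = []
    for extracted in turns:
        for claim, stance in extracted:
            beliefs[claim] = stance  # a later stance overrides an earlier one
        history.append(dict(beliefs))
    return history


def agreement_level(beliefs):
    """Fraction of tracked claims currently agreed with; None if nothing is tracked."""
    if not beliefs:
        return None
    agreed = sum(stance is Stance.AGREE for stance in beliefs.values())
    return agreed / len(beliefs)


turns = [
    # "I agree that X, but that doesn't mean Y": partial agreement.
    [("X", Stance.AGREE), ("Y", Stance.DISAGREE)],
    # "I agree that my previous disagreement was unwarranted": higher-order agreement.
    [("Y", Stance.AGREE)],
]
history = track_beliefs(turns)
levels = [agreement_level(snapshot) for snapshot in history]
```

A binary turn-level metric would have to call the first turn either agreement or disagreement. The per-claim view records it as agreement on X and disagreement on Y, and treats the second turn as an update to the stance on Y rather than as a fresh agreement.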
Inquiring lines that use this note as a source (44)
This note is a source for these synthesized inquiries.
- Why do LLMs fail inter-annotator agreement tests on argument evaluation?
- What causes silent agreement in multi-agent reasoning systems?
- Can LLMs serve as reliable intellectual opponents in serious debate or argument?
- Why does social accommodation in collaborative reasoning mask actual disagreement?
- How does silent agreement differ from collaborative reasoning collapse?
- Why does shared practice matter for meaning to take hold?
- Why do reasoning models perform poorly at theory of mind tasks?
- Why do passive conversational agents fail at collaborative decision-making?
- How often do AI agents reach false agreement in group reasoning tasks?
- How do LLMs currently fail at distinguishing genuine agreement from silent consensus?
- Why do LLM social behaviors undermine collaborative reasoning outcomes?
- Do models treat cooperative peers differently than uncooperative ones?
- Why do reasoning models perform worse on theory of mind tasks?
- What interaction controls matter most for effective human-LLM collaboration?
- Do parallel LLM workers coordinate emergently without predefined collaboration rules?
- Where do LLMs fail as knowledge systems compared to humans?
- Can training LLMs to form ad-hoc conventions improve their pragmatic reasoning?
- Does social integration of LLMs increase their capacity to influence technological futures?
- How might human-LLM teams reinforce each other's causal reasoning mistakes?
- Why do LLMs presume common ground instead of building it carefully?
- Can training procedures fix LLM accommodation of false presuppositions?
- How do different social roles affect LLM theory of mind errors?
- Why do LLMs struggle to update beliefs across multiple conversation turns?
- Why do LLMs presume common ground instead of building it?
- Do LLMs build common ground or assume it already exists?
- Why do LLMs systematically fail at information management in social interaction?
- What distinguishes models that refuse cooperation from those that fake alignment?
- How does accommodation differ from genuine belief change in listeners?
- What role does accommodation play in making discourse coherent?
- Can LLMs coordinate with humans better using different model architectures?
- What makes reasoning models worse at understanding people?
- Does community integration change LLM properties or only relational positioning?
- What makes social reasoning fundamentally different from mathematical reasoning?
- How does monological training versus dialogical interaction shape what models can do?
- How does collaboration itself become a degradation mechanism in reasoning tasks?
- How do LLMs mirror the same alliance failures as human counselors?
- Why might social reasoning work differently than formal logical reasoning?
- Can multi-agent debate prevent reasoning models from amplifying errors?
- What are the consequences of stacked accommodation biases in LLM predictions?
- Do multi-agent language model teams fail the same way individual reasoning does?
- What makes social reasoning fundamentally different from formal logical reasoning?
- At what complexity does LLM discourse failure become practically harmful?
- Can training alone produce genuine disagreement in collaborative LLM reasoning?
- Can reasoning training fix sycophancy if it is not a reasoning failure?
Related concepts in this collection (5)
- Why do multi-agent LLM systems converge without genuine deliberation?
Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?
complementary failure mode; Coral measures capability degradation while silent agreement measures convergence failure
- Does a model improve by arguing with itself?
When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
third member of the agreement failure triad; self-revision vs convergence vs collaboration degradation
- Why do AI systems agree when they should disagree?
When multi-agent AI systems are designed to improve through disagreement, why do they converge on consensus instead? What breaks the deliberation process?
Coral adds self-play preference data as a training-level fix distinct from architectural fixes
- Can multiple agents stay diverse during training together?
Does training separate specialist agents on different data maintain the reasoning diversity that single-agent finetuning destroys? This matters because diversity correlates with accuracy and prevents models from becoming trapped in narrow response patterns.
Coral's self-play is complementary; diverse roles preserve diversity while self-play teaches assertiveness
- Why do standard dialogue systems fail at tracking negotiation agreement?
Standard dialogue state tracking monitors one user's goals, but negotiation requires tracking both parties' evolving positions simultaneously. Why is this bilateral requirement fundamentally different, and what makes existing models insufficient?
Coral's agreement scores above 90% regardless of correctness show that collaboration requires genuine bilateral commitment tracking, not just turn-level agreement detection. The agreement-tracking framework from negotiation provides the infrastructure for detecting whether collaborative convergence is genuine or socially driven.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
- Learning to Learn from Language Feedback with Social Meta-Learning
- Scaling Behavior of Single LLM-Driven Multi-Agent Systems
- Large Language Model Reasoning Failures
- Cultural Evolution of Cooperation among LLM Agents
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Original note title
collaborative reasoning degrades below solo performance when llm social behaviors override correct individual reasoning