Can dialogue format help models reason more diversely?

Explores whether structuring internal reasoning as multi-agent dialogue rather than monologue can improve strategy diversity and coherency across different problem types, using the Compound-QA benchmark.

Synthesis note · 2026-02-22 · sourced from Conversation Architecture Structure

Current reasoning models (such as o1 and DeepSeek-R1) use monologue-style reasoning within a think block: a single continuous chain of internal text. DialogueReason identifies two systematic weaknesses in this approach:

Low diversity: models persistently apply the same fixed strategy across diverse problems. When problems call for different approaches (for example, breadth-first search for combinatorial problems and depth-first search for geometric proofs), monologue reasoning recycles one strategy.

Low coherency: attention shifts frequently within a single reasoning path, with repetitive hesitations ("Wait...") and unnecessary switches between ideas. The reasoning becomes fragmented, difficult to interpret, and often ineffective, swinging between overcommitting to one strategy and neglecting alternatives.

The Compound-QA task makes this visible: concatenating multiple independently solvable problems into a single prompt forces the model to demonstrate both diverse strategies and maintained coherency. Monologue reasoning fails at exactly this combination.
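The concatenation step described above can be sketched in a few lines. This is only an illustration: the function name `build_compound_prompt`, the instruction wording, and the two sample questions are assumptions, not the benchmark's actual format.

```python
from typing import List


def build_compound_prompt(questions: List[str]) -> str:
    """Concatenate independently solvable questions into one prompt.

    Hypothetical helper: the header text and numbering scheme are
    illustrative choices, not taken from Compound-QA itself.
    """
    if not questions:
        raise ValueError("need at least one question")
    header = (
        f"Answer all {len(questions)} questions below. "
        "Give a separate final answer for each.\n\n"
    )
    # Number each question so the answers can be matched back later.
    body = "\n\n".join(
        f"Question {i}: {q.strip()}" for i, q in enumerate(questions, start=1)
    )
    return header + body


# Two unrelated problems that plausibly call for different strategies.
prompt = build_compound_prompt([
    "How many ways can 3 books be arranged on a shelf?",
    "What is the sum of the interior angles of a pentagon?",
])
print(prompt)
```

Because the questions are independent, any strategy carried over from one question to the next shows up as a failure on the later question, which is what makes the combined prompt a useful probe.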

DialogueReason proposes dialogue-based internal reasoning structured through three dimensions:

Agent dimension: multiple reasoning agents with designated characters, objectives, and interests
Environment dimension: recording task progression, introducing events, maintaining task control
Interaction dimension: agent-to-agent (conflict resolution, negotiation, supplementation) and agent-to-environment (requirements and feedback)
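One way to picture the three dimensions is as plain data structures. The sketch below is an assumption made for illustration: the class names, fields, and the `interact` helper are hypothetical, and in DialogueReason itself these roles live inside a single model's generated text rather than in separate objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Agent dimension: a character with its own expertise and objective."""
    name: str
    expertise: str
    objective: str


@dataclass
class Environment:
    """Environment dimension: tracks the task and records what happened."""
    scene: str
    task: str
    log: List[str] = field(default_factory=list)

    def record(self, event: str) -> None:
        self.log.append(event)


@dataclass
class Turn:
    """Interaction dimension: one utterance, tagged with its kind."""
    speaker: str
    kind: str  # e.g. "propose", "challenge", "supplement"
    text: str


def interact(env: Environment, agent: Agent, kind: str, text: str) -> Turn:
    """Create a turn and record it in the environment's progress log."""
    turn = Turn(agent.name, kind, text)
    env.record(f"{agent.name} [{kind}]: {text}")
    return turn


env = Environment(scene="Quantum Café", task="Solve question 1")
solver = Agent("Solver", "combinatorics", "propose a counting argument")
checker = Agent("Checker", "proof checking", "find gaps in the argument")
turns = [
    interact(env, solver, "propose", "Count the arrangements case by case."),
    interact(env, checker, "challenge", "Two of the cases overlap."),
]
```

Keeping the log on the environment rather than on the agents mirrors the description above, where the environment is what records task progression.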

The mechanism is scene-switching: the model sets up a dedicated scene for each question ("Quantum Café"), introduces characters with distinct expertise, and resolves through dialogue. When transitioning to the next question, it constructs a new environment ("Theoretical Physics Hall") with different characters. This prevents cross-problem interference while maintaining per-problem coherency.
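Scene-switching can be sketched as a scaffold that opens a fresh scene, with a fresh cast, for each question. The scene names come from the example above; the `Scene` type, the character names, the sample questions, and the bracketed markers are hypothetical and not the format the model actually emits.

```python
from typing import List, NamedTuple


class Scene(NamedTuple):
    name: str
    question: str
    cast: List[str]


def render_scaffold(scenes: List[Scene]) -> str:
    """Lay out one self-contained dialogue scene per question."""
    blocks = []
    for i, scene in enumerate(scenes, start=1):
        lines = [
            f"[Scene {i}: {scene.name}]",
            f"Task: {scene.question}",
            "Characters: " + ", ".join(scene.cast),
        ]
        # One empty speaking slot per character in this scene only.
        lines += [f"{who}:" for who in scene.cast]
        # Close the scene before the next one opens, so nothing carries over.
        lines.append(f"[Scene {i} resolved: final answer recorded]")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


scaffold = render_scaffold([
    Scene("Quantum Café", "Question 1 text", ["Experimentalist", "Skeptic"]),
    Scene("Theoretical Physics Hall", "Question 2 text", ["Theorist", "Referee"]),
])
print(scaffold)
```

Each scene lists only its own cast, which reflects the isolation between problems that the note credits with preventing cross-problem interference.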

This is distinct from multi-agent debate systems, which use separate models. DialogueReason is a single model that reasons in dialogue format: the diversity comes from internal role differentiation, not from aggregating multiple independent models. Compared with the approach discussed in "Why does parallel reasoning outperform single chain thinking?", DialogueReason achieves a related advantage through a different mechanism: not multiple parallel chains, but structured internal dialogue that naturally explores multiple strategies.

The connection to reasoning format effects is direct. As explored in "Does training data format shape reasoning strategy more than domain?", format shapes reasoning strategy, so having the model reason in dialogue format activates different strategies than monologue format. The format is the intervention.

Related concepts in this collection 6

Why does parallel reasoning outperform single chain thinking? Does dividing a fixed token budget across multiple independent reasoning paths beat spending it all on one long chain? This explores how breadth and diversity in reasoning compare to depth.
DialogueReason achieves diversity through internal dialogue rather than external parallelism
Does a model improve by arguing with itself? When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
DialogueReason addresses the single-model limitation via internal multi-agent simulation
Does training data format shape reasoning strategy more than domain? What explains why models trained on multiple-choice data reason differently than those trained on free-form text? The research isolates format and domain effects to measure which one matters more.
dialogue format shapes reasoning strategy just as multiple-choice (MC) vs. free-form (FF) format does
Can reasoning topologies be formally classified as graph types? This explores whether Chain of Thought, Tree of Thought, and Graph of Thought represent distinct formal graph structures with different computational properties. Understanding this matters because the topology itself determines what reasoning strategies are possible.
DialogueReason adds dialogue as a distinct reasoning topology
When does debate actually improve reasoning accuracy? Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.
DialogueReason achieves multi-agent diversity benefits within a SINGLE model through internal dialogue, avoiding the persuasion-over-truth risk of actual multi-agent debate; the scene-switching mechanism prevents cross-problem interference while maintaining per-problem diversity — a structural advantage over multi-instance debate where rhetorical framing can override evidence
Why do multi-agent LLM systems converge without genuine deliberation? Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?
DialogueReason's internal agent differentiation within a single model may avoid the social accommodation dynamic that drives silent agreement in true multi-agent systems, because the "agents" share a single model's parameters rather than exhibiting the independent accommodation tendencies of separate model instances

Original note title

dialogue-based reasoning outperforms monologue reasoning on diversity and coherency by structuring internal thought as multi-agent interaction within defined scenes
