SYNTHESIS NOTE

When do multi-agent systems actually outperform single agents?

As individual LLMs grow more capable, does the advantage of splitting work across multiple agents still hold? This note explores when coordination overhead makes multi-agent systems counterproductive.

Synthesis note · 2026-03-28 · sourced from Agentic Research

"Single-agent or Multi-agent Systems? Why Not Both?" (2025) provides an empirical and theoretical analysis of when multi-agent systems (MAS) help versus hurt, with a finding that challenges the default toward multi-agent architectures.

The diminishing advantage. Prior studies reported that MAS achieve higher accuracy than single-agent systems (SAS) across diverse domains. However, as frontier LLMs rapidly improve at long-context reasoning, memory retention, and tool usage, many of the limitations that originally motivated MAS designs are being mitigated by single-agent capability gains. The empirical study finds that, across various agentic applications, the performance gap between MAS and SAS narrows with stronger models, and that SAS outperforms MAS in a substantial portion of cases.

Three MAS defect types formalized as dependency graph problems:

Node-level defect: Both MAS and SAS performance are bottlenecked by the critical agent responsible for the most difficult subtask. MAS cannot escape the ceiling set by its weakest critical component. Adding more agents does not help if the hardest subtask remains unsolved.

Edge-level defect: Downstream agents become overwhelmed by inputs from upstream agents. In multi-way conversations or prolonged iterative refinements, high in-degree nodes (summarizers, synthesizers) receive more information than they can process effectively, leading to overthinking on edge cases. This is "analogous to the overthinking of the reasoning model, but rather than being lost in thinking, the agent becomes overwhelmed by inputs from upstream agents." MAS aggravates the problem because agents process much more data.

Path-level defect: Indecisive errors propagate through chains of agent interactions. Crucial context is lost or diluted when intermediate outputs are summarized or filtered. Even small information loss causes irreversible errors downstream via snowball effects. The specific failure mode: correct solutions proposed in earlier rounds get lost during summarization before reaching the next agent — "this loss is unrecoverable, as downstream agents no longer have access to the full previous results."
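The three defect types can be made concrete on a toy dependency graph. The sketch below is illustrative only: the agent names, success rates, and per-hop retention values are hypothetical and do not come from the paper. It also assumes that the "critical agent" can be approximated as the one with the lowest success rate, and that context loss is multiplicative and independent per hop.

```python
from collections import defaultdict
from math import prod

# Hypothetical pipeline: each edge points from an upstream agent to a downstream agent.
EDGES = [
    ("planner", "solver_a"),
    ("planner", "solver_b"),
    ("planner", "solver_c"),
    ("solver_a", "synthesizer"),
    ("solver_b", "synthesizer"),
    ("solver_c", "synthesizer"),
    ("synthesizer", "reporter"),
]

# Illustrative numbers only, not measurements from the paper.
SUCCESS_RATE = {
    "planner": 0.95, "solver_a": 0.90, "solver_b": 0.60,
    "solver_c": 0.90, "synthesizer": 0.85, "reporter": 0.98,
}
# Assumed fraction of crucial context that survives each handoff.
RETENTION = {edge: 0.9 for edge in EDGES}


def in_degree(edges):
    """Count incoming edges for every agent that appears in the graph."""
    counts = defaultdict(int)
    for src, dst in edges:
        counts[src] += 0  # register source agents that have no inputs
        counts[dst] += 1
    return dict(counts)


def node_bottleneck(success_rate):
    """Node-level: the agent with the lowest success rate caps the system."""
    return min(success_rate, key=success_rate.get)


def overloaded_nodes(edges, max_inputs):
    """Edge-level: agents that receive more upstream inputs than max_inputs."""
    return sorted(n for n, d in in_degree(edges).items() if d > max_inputs)


def path_retention(path, retention):
    """Path-level: fraction of context left after every hop along the path."""
    return prod(retention[(a, b)] for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    print(node_bottleneck(SUCCESS_RATE))
    print(overloaded_nodes(EDGES, max_inputs=2))
    print(path_retention(["planner", "solver_a", "synthesizer", "reporter"], RETENTION))
```

Under these assumptions, a three-hop path with 90% retention per hop keeps roughly 73% of the original context, which illustrates how small per-hop losses compound along a chain.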

The hybrid solution. Confidence-guided routing between SAS and MAS, called request cascading, selectively offloads requests based on difficulty. The approach improves accuracy by 1.1-12% while reducing costs by up to 88%. AIME (the hardest math tasks) is the exception, where MAS consistently outperforms SAS, illustrating the value of MAS for extremely difficult tasks.
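A minimal sketch of the cascading idea follows. It assumes a single agent is tried first and the request escalates to the multi-agent system only when confidence is low. The note does not specify how the paper computes confidence or sets its threshold, so the `cascade` function, the `threshold` value, and both stub agents are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# An agent takes a request and returns (answer_text, confidence in [0, 1]).
Agent = Callable[[str], Tuple[str, float]]


@dataclass
class Answer:
    text: str
    confidence: float
    route: str  # "SAS" or "MAS"


def cascade(request: str, single_agent: Agent, multi_agent: Agent,
            threshold: float = 0.8) -> Answer:
    """Try the cheap single agent first; escalate only if it is unsure."""
    text, confidence = single_agent(request)
    if confidence >= threshold:
        return Answer(text, confidence, "SAS")
    # Low confidence is treated as a proxy for a hard request.
    text, confidence = multi_agent(request)
    return Answer(text, confidence, "MAS")


def stub_single_agent(request: str) -> Tuple[str, float]:
    # Stand-in heuristic: pretend short requests are easy.
    confidence = 0.95 if len(request) < 40 else 0.4
    return f"single-agent answer to: {request}", confidence


def stub_multi_agent(request: str) -> Tuple[str, float]:
    # Stand-in for a full multi-agent pipeline.
    return f"multi-agent answer to: {request}", 0.9


if __name__ == "__main__":
    easy = cascade("What is 2 + 2?", stub_single_agent, stub_multi_agent)
    hard = cascade("Prove the stated inequality for all positive integers n.",
                   stub_single_agent, stub_multi_agent)
    print(easy.route, hard.route)
```

Under this design, any cost saving would come from the share of requests that never reach the multi-agent path, so the threshold trades accuracy against cost.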

This extends "When does adding more agents actually help systems?": the scaling laws there quantify MAS overhead, while this paper shows that overhead becoming less worthwhile as single-agent capability increases. Taken together with "Why do multi-agent LLM systems converge without genuine deliberation?", MAS suffers from both coordination overhead and pseudo-agreement, which strengthens the case for SAS with selective MAS escalation.

Related concepts in this collection 4

When does adding more agents actually help systems?
Multi-agent systems often fail in practice, but the reasons remain unclear. This research investigates whether coordination overhead, task properties, or system architecture determine when agents improve or degrade performance.
quantifies the overhead; this paper shows it becoming less worthwhile

Why do multi-agent LLM systems converge without genuine deliberation?
Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?
MAS suffers coordination overhead AND pseudo-agreement

Does token spending drive multi-agent research performance?
Multi-agent systems outperform single agents substantially, but what actually accounts for that improvement? Is it intelligent coordination or simply spending more tokens on the same task?
if tokens drive performance, a single capable model may be more efficient than many smaller ones

Does more thinking time always improve reasoning accuracy?
Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive.
edge-level defect is external-input-induced overthinking paralleling internal overthinking
