SYNTHESIS NOTE
Agentic Systems and Tool Use

When do multi-agent systems actually outperform single agents?

As individual LLMs grow more capable, does the advantage of splitting work across multiple agents still hold? This note explores when coordination overhead makes multi-agent systems (MAS) counterproductive.

Synthesis note · 2026-03-28 · sourced from Agentic Research
What makes multi-agent teams actually perform better? How does test-time scaling work at the agent level?

"Single-agent or Multi-agent Systems? Why Not Both?" (2025) provides an empirical and theoretical analysis of when multi-agent systems (MAS) help versus hurt, with a finding that challenges the default toward multi-agent architectures.

The diminishing advantage. Prior studies reported that MAS achieved higher accuracy than single-agent systems (SAS) across diverse domains. However, as frontier LLMs rapidly advance in long-context reasoning, memory retention, and tool usage, many limitations that originally motivated MAS designs are being mitigated by single-agent capability improvements. The empirical study finds that across various agentic applications, the performance gap between MAS and SAS narrows with stronger models, and SAS outperforms MAS in a substantial portion of cases.

Three MAS defect types formalized as dependency graph problems:

Node-level defect: The performance of both MAS and SAS is bottlenecked by the critical agent responsible for the most difficult subtask. MAS cannot escape the ceiling set by its weakest critical component. Adding more agents does not help if the hardest subtask remains unsolved.

Edge-level defect: Downstream agents become overwhelmed by inputs from upstream agents. In multi-way conversations or prolonged iterative refinements, high in-degree nodes (summarizers, synthesizers) receive more information than they can process effectively, leading to overthinking on edge cases. This is "analogous to the overthinking of the reasoning model, but rather than being lost in thinking, the agent becomes overwhelmed by inputs from upstream agents." MAS aggravates the problem because agents process much more data.

Path-level defect: Indecisive errors propagate through chains of agent interactions. Crucial context is lost or diluted when intermediate outputs are summarized or filtered. Even small information loss causes irreversible errors downstream via snowball effects. The specific failure mode: correct solutions proposed in earlier rounds get lost during summarization before reaching the next agent — "this loss is unrecoverable, as downstream agents no longer have access to the full previous results."
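The three defects can be illustrated on a toy dependency graph. The sketch below is not taken from the paper: the agent names, the in-degree limit, the per-agent success rates, and the per-hop retention fractions are all hypothetical, chosen only to show how each defect maps onto a node, edge, or path property.

```python
from collections import defaultdict

# Hypothetical agent dependency graph; edges point from upstream to downstream.
EDGES = [
    ("planner", "researcher_a"),
    ("planner", "researcher_b"),
    ("planner", "researcher_c"),
    ("researcher_a", "summarizer"),
    ("researcher_b", "summarizer"),
    ("researcher_c", "summarizer"),
    ("summarizer", "answerer"),
]


def in_degrees(edges):
    """Count incoming edges per agent (agents with no inputs get 0)."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[dst] += 1
        deg.setdefault(src, 0)
    return dict(deg)


def success_ceiling(per_agent_success):
    """Node-level: if every critical subtask must succeed, the overall success
    rate cannot exceed that of the weakest critical agent."""
    return min(per_agent_success.values())


def overloaded_nodes(edges, max_inputs=2):
    """Edge-level: flag agents whose in-degree exceeds an assumed input budget."""
    return sorted(n for n, d in in_degrees(edges).items() if d > max_inputs)


def path_retention(per_hop_retention):
    """Path-level: fraction of the original context left after a chain of
    summarization hops, assuming each hop keeps a fixed fraction."""
    kept = 1.0
    for fraction in per_hop_retention:
        kept *= fraction
    return kept


if __name__ == "__main__":
    print(success_ceiling({"planner": 0.95, "solver": 0.4, "answerer": 0.9}))
    print(overloaded_nodes(EDGES))
    print(path_retention([0.9, 0.9, 0.9]))
```

Under these assumptions, the summarizer is the only high in-degree node, and three hops that each keep 90% of the context leave roughly 73% of it, which is one way to picture why small per-hop losses compound.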

The hybrid solution. Request cascading, a confidence-guided routing scheme between SAS and MAS, selectively offloads requests based on difficulty. The approach improves accuracy by 1.1-12% while reducing costs by up to 88%. AIME (hardest math) is the exception where MAS consistently outperforms SAS, illustrating the value of MAS for extremely difficult tasks.
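A minimal sketch of request cascading, assuming the single agent returns a self-reported confidence score alongside its answer. The note does not say how the paper estimates confidence or sets the cutoff, so `CascadeRouter`, the `threshold` value, and the two callables are hypothetical stand-ins rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed shape of a single-agent result: (answer text, confidence in [0, 1]).
Answer = Tuple[str, float]


@dataclass
class CascadeRouter:
    single_agent: Callable[[str], Answer]  # cheap path, tried first
    multi_agent: Callable[[str], str]      # expensive path, used on escalation
    threshold: float = 0.8                 # assumed confidence cutoff

    def route(self, request: str) -> Tuple[str, str]:
        """Return (answer, system_used). Escalate to MAS only when the
        single agent's confidence falls below the threshold."""
        answer, confidence = self.single_agent(request)
        if confidence >= self.threshold:
            return answer, "SAS"
        return self.multi_agent(request), "MAS"


if __name__ == "__main__":
    # Stub agents for illustration only: longer requests count as "harder".
    def stub_single(request: str) -> Answer:
        return f"single:{request}", 0.9 if len(request) < 20 else 0.3

    def stub_multi(request: str) -> str:
        return f"multi:{request}"

    router = CascadeRouter(stub_single, stub_multi)
    print(router.route("2 + 2"))
    print(router.route("a long olympiad-style proof question"))
```

The design choice in this sketch is that the multi-agent path is only invoked for requests the single agent is unsure about, so easy requests never pay the coordination cost.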

This extends "When does adding more agents actually help systems?": the scaling laws quantify MAS overhead, while this paper shows the overhead becoming less worthwhile as single-agent capability increases. As "Why do multi-agent LLM systems converge without genuine deliberation?" argues, MAS suffers from both coordination overhead and pseudo-agreement, which strengthens the case for SAS with selective MAS escalation.

Original note title

multi-agent system advantages diminish as single-agent LLM capabilities improve: three defect types in MAS dependency graphs explain when single beats multi