Scaling Behavior of Single LLM-Driven Multi-Agent Systems

Paper · arXiv 2606.00655 · Published May 30, 2026

The burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions regarding their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the variable of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, to clearly observe scaling effects. Through extensive experiments across diverse tasks and model scales, we establish that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that an effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality.
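The sequential inter-agent communication described above can be pictured with a minimal sketch. This is not the SIMAS implementation: the function name `run_sequential_agents`, the prompt wording, and the stand-in model `echo_llm` are all hypothetical, and the only idea taken from the abstract is that copies of one model pass their output along a chain.

```python
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


def run_sequential_agents(task: str, llm: LLM, num_agents: int) -> List[str]:
    """Pass a task through num_agents copies of the same model, in order.

    Each agent sees the task plus the previous agent's answer (if any)
    and returns its own answer. Every intermediate answer is returned so
    that quality can be compared across chain lengths.
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    answers: List[str] = []
    previous = ""
    for _ in range(num_agents):
        if previous:
            # Assumed prompt format, chosen only for illustration.
            prompt = f"Task: {task}\nPrevious answer: {previous}\nRevise or confirm."
        else:
            prompt = f"Task: {task}\nGive an initial answer."
        previous = llm(prompt)
        answers.append(previous)
    return answers


def echo_llm(prompt: str) -> str:
    """Stand-in model for testing: reports how many lines it was shown."""
    return f"answer based on {len(prompt.splitlines())} prompt lines"


if __name__ == "__main__":
    for step, answer in enumerate(run_sequential_agents("2+2", echo_llm, 3), 1):
        print(step, answer)
```

Swapping `echo_llm` for a real model call and scoring the last entry of the returned list for several values of `num_agents` would give one way to trace a scaling curve of the kind the paper studies.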

Introduction. In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in text generation, complex reasoning, and decision-making, establishing themselves as the core foundation for constructing intelligent systems, often referred to as "agents". Individual agents have shown strong problem-solving ability across a wide range of fields, with performance further enhanced by techniques such as Chain-of-Thought (CoT) prompting (Wei et al., 2022), which elicits step-by-step reasoning, and frameworks like Toolformer (Schick et al., 2023), which enable models to use external APIs. However, many real-world challenges, such as sophisticated software development or multifaceted problem-solving, inherently require collaborative effort. This necessity has driven the emergence of LLM-based Multi-Agent Systems (MAS), a field that has rapidly evolved from early exploratory frameworks to complex systems (Xi et al., 2023; Luo et al., 2025), where multiple agents interact to achieve common goals.

Discussion / Conclusion. This work systematically demonstrates that the performance of LLM-based Multi-Agent Systems does not scale linearly with agent count but exhibits a pattern of diminishing returns. Effective collaboration first requires a sufficiently capable base LLM. The optimal number of agents is a critical design parameter, heavily dependent on task type and model architecture, balancing synergy against overhead. The performance degradation caused by collaboration overhead generalizes across interaction architectures. Crucially, collective intelligence is not an automatic outcome of adding agents but an emergent property contingent on deliberate interaction design. Without architectural support for synthesis and refinement, multi-agent dialogue risks inefficiency. Future MAS development must therefore prioritize designing adaptive, task-aware collaboration protocols over simply increasing agent plurality.
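The balance between synergy and overhead can be illustrated with a toy calculation. The functional form below (a logarithmic synergy term minus a linear overhead term) and all parameter values are assumptions chosen for illustration only; they are not the paper's model and are not fitted to its results.

```python
import math


def toy_score(n: int, synergy: float = 1.0, overhead: float = 0.1) -> float:
    """Assumed toy model: gains grow like log(n), costs grow linearly in n."""
    return synergy * math.log(n) - overhead * (n - 1)


def best_agent_count(max_agents: int, synergy: float = 1.0, overhead: float = 0.1) -> int:
    """Return the agent count in 1..max_agents with the highest toy score."""
    return max(
        range(1, max_agents + 1),
        key=lambda n: toy_score(n, synergy, overhead),
    )


if __name__ == "__main__":
    # Under this toy form the continuous optimum sits at n = synergy / overhead.
    print(best_agent_count(20))                # low overhead: larger team
    print(best_agent_count(20, overhead=0.5))  # high overhead: smaller team
```

In this sketch, raising `overhead` moves the peak toward fewer agents, which matches the qualitative claim that the optimal count depends on how costly coordination is for a given task.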
