Can multi-agent teams automatically remove their weakest members?
Explores whether agents can score each other's contributions during problem-solving and use those scores to deactivate underperforming teammates in real time, improving overall team efficiency.
DyLAN (Dynamic LLM-Agent Network) introduces a systematic mechanism for multi-agent team optimization that addresses three properties simultaneously: task agnosticism, efficiency, and automatic team composition.
The core mechanism is the Agent Importance Score, computed through a three-step procedure:
- Propagation — each agent rates its predecessors on their solution quality
- Aggregation — for each agent, ratings from successors are compiled to quantify its contribution
- Selection — after summing ratings across all time steps, top-performing agents are retained and low-performing agents deactivated
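The three steps above can be sketched in Python. This is a minimal illustration, not DyLAN's reference implementation: the `ratings` layout, the function names, the equal importance assigned at the final step, and the choice to weight each rating by the rater's own importance are all assumptions made for the example.

```python
from collections import defaultdict


def importance_scores(ratings, num_steps, agents):
    """Hypothetical layout: ratings[t][(rater, rated)] is the weight that
    `rater`, acting at step t, gives to `rated`, acting at step t - 1.
    Steps are indexed 0 .. num_steps - 1."""
    # Assumption: every agent starts with equal importance at the final step.
    step_importance = {a: 1.0 / len(agents) for a in agents}
    total = defaultdict(float)
    for agent, value in step_importance.items():
        total[agent] += value

    # Propagation + aggregation: walk backwards, crediting each predecessor
    # with the ratings it received, weighted by the rater's own importance.
    for t in range(num_steps - 1, 0, -1):
        previous = defaultdict(float)
        for (rater, rated), weight in ratings[t].items():
            previous[rated] += step_importance.get(rater, 0.0) * weight
        for agent, value in previous.items():
            total[agent] += value  # running sum across time steps
        step_importance = previous
    return dict(total)


def select_top_k(scores, k):
    """Selection: keep the k agents with the highest summed score."""
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


# Toy run: three agents, two time steps; each rater's weights sum to 1.
ratings = {
    1: {
        ("A", "A"): 0.5, ("A", "B"): 0.5, ("A", "C"): 0.0,
        ("B", "A"): 0.6, ("B", "B"): 0.4, ("B", "C"): 0.0,
        ("C", "A"): 0.5, ("C", "B"): 0.3, ("C", "C"): 0.2,
    }
}
scores = importance_scores(ratings, num_steps=2, agents=["A", "B", "C"])
kept = select_top_k(scores, k=2)
print(kept)
```

In this toy run agent C receives almost no credit from its successors, so it is the one dropped when only two agents are retained.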
This creates a dynamic interaction architecture: agents viewed as nodes in a network exchange messages as edges across time steps. An LLM-empowered ranker ranks agents at inference time and deactivates low-performing ones for subsequent rounds, while an early-stopping mechanism prevents unnecessary iterations.
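A rough sketch of the round loop with deactivation and early stopping follows. The source does not specify the stopping rule or when ranking happens, so the two-thirds agreement threshold, the single ranking round (`rank_at`), and the `plurality_ranker` stub standing in for the LLM-empowered ranker are assumptions chosen only to make the example runnable.

```python
from collections import Counter


def plurality_ranker(answers):
    """Stand-in for the LLM ranker (hypothetical): agents whose answer
    matches the current plurality answer are ranked first."""
    top = Counter(answers.values()).most_common(1)[0][0]
    return sorted(answers, key=lambda name: (answers[name] != top, name))


def run_rounds(agents, rank, max_rounds=4, keep=2, rank_at=1, agree_frac=2 / 3):
    """agents: dict of name -> callable(round_index, prior_answers) -> answer."""
    active = dict(agents)
    answers = {}
    for r in range(max_rounds):
        # Each active agent sees the previous round's answers (the messages).
        answers = {name: fn(r, answers) for name, fn in active.items()}

        # Early stopping (assumed rule): stop once enough agents agree.
        top, count = Counter(answers.values()).most_common(1)[0]
        if count / len(answers) > agree_frac:
            return top, r + 1, sorted(active)

        # Deactivation: after the ranking round, only the top agents continue.
        if r == rank_at:
            active = {name: active[name] for name in rank(answers)[:keep]}

    top, _ = Counter(answers.values()).most_common(1)[0]
    return top, max_rounds, sorted(active)


# Toy agents with fixed behaviour instead of real LLM calls.
agents = {
    "a": lambda r, prior: "42",
    "b": lambda r, prior: "41" if r == 0 else "42",
    "c": lambda r, prior: "7",
    "d": lambda r, prior: "13",
}
result = run_rounds(agents, plurality_ranker)
print(result)
```

Here agents `c` and `d` are deactivated after the ranking round, and the remaining two agree in the next round, so the loop stops after three of the four allowed rounds.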
The insight connects to multiple threads in multi-agent reasoning:
DyLAN's contribution scoring offers a partial remedy for the convergence problem described in Why do multi-agent LLM systems converge without genuine deliberation?: agents that merely agree without adding information should receive low importance scores and be deactivated. This could reduce the noise-amplification problem documented in When does debate actually improve reasoning accuracy?.
The approach contrasts with Can extreme task decomposition enable reliable execution at million-step scale? (MAKER), which uses static decomposition with voting. DyLAN dynamically prunes the agent network during execution — a more adaptive but less parallelizable strategy. The trade-off maps onto How should we balance parallel versus sequential compute at test time?: static decomposition enables parallelism while dynamic selection enables adaptation.
The Agent Importance Score also provides a concrete implementation of the "contribution-based routing" that Can AI systems detect when they've genuinely reached agreement? advocates — but generalized beyond agreement detection to overall contribution quantification.
AgentVerse four-stage dynamic group adjustment (from Arxiv/Agents Multi): AgentVerse extends the dynamic team composition principle with a four-stage group problem-solving process that mirrors human group dynamics: (1) Expert Recruitment — dynamically adjusting team composition based on current problem-solving progress; (2) Collaborative Decision-Making — recruited agents discuss and formulate strategies until consensus; (3) Action Execution — agents interact with the environment to execute agreed actions; (4) Evaluation — comparing current state to desired goal, with feedback reward looping back to stage 1 for team re-composition. Unlike DyLAN's contribution scoring which prunes within a fixed network, AgentVerse's recruitment stage can introduce new agent profiles not in the original team. The evaluation-to-recruitment feedback loop enables adaptive team evolution over the course of problem-solving — the team that finishes may differ substantially from the team that started.
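The four-stage loop can be sketched as a small control loop. This is only an illustration of the control flow described above, not AgentVerse's API: `solve`, `Feedback`, and the four stage callables are hypothetical names, and the toy stage functions replace real LLM calls.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    done: bool
    notes: str


def solve(goal, recruit, decide, execute, evaluate, max_iters=5):
    """Run recruit -> decide -> execute -> evaluate until the goal is met."""
    feedback = Feedback(done=False, notes="")
    team_history = []
    state = None
    for _ in range(max_iters):
        team = recruit(goal, feedback)        # (1) Expert Recruitment
        team_history.append(list(team))
        plan = decide(team, goal, feedback)   # (2) Collaborative Decision-Making
        state = execute(team, plan)           # (3) Action Execution
        feedback = evaluate(state, goal)      # (4) Evaluation, fed back to (1)
        if feedback.done:
            break
    return state, team_history


# Toy stages: the goal is a required head count, and the evaluation
# feedback asks for a new profile that was not in the original team.
def recruit(goal, feedback):
    return ["coder", "tester"] if "tester" in feedback.notes else ["coder"]


def decide(team, goal, feedback):
    return {"workers": len(team)}


def execute(team, plan):
    return plan["workers"]


def evaluate(state, goal):
    met = state >= goal
    return Feedback(done=met, notes="" if met else "need a tester")


final_state, team_history = solve(2, recruit, decide, execute, evaluate)
print(team_history)
```

The recorded `team_history` shows the point made above: the team that finishes (`coder` plus `tester`) differs from the team that started.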
MasRouter's four-decision MASR framework (from Arxiv/Routers): MasRouter formalizes multi-agent system routing as four simultaneous decisions: collaboration topology, agent count, role allocation, and per-agent LLM selection. This framing shows that DyLAN's contribution-based agent selection addresses only runtime optimization within an already-constructed network. MasRouter constructs the network itself, choosing topology, roles, and LLM assignments from scratch via a cascaded variational-probabilistic-multinomial controller. The two approaches are complementary: MasRouter for initial construction (design-time routing), DyLAN for runtime adaptation (inference-time pruning). Composing them would, in principle, give a system that starts from a routed network configuration and then adapts it during execution. See What decisions must multi-agent routing systems optimize simultaneously?.
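To make the proposed composition concrete, the four decisions can be held in one record that a runtime pruning step then narrows. This is a sketch under assumptions: `RoutingDecision`, `prune`, the role names, and the model identifiers are hypothetical placeholders, not MasRouter's or DyLAN's actual interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RoutingDecision:
    """The four design-time choices, stored together (hypothetical layout)."""
    topology: str                 # collaboration topology, e.g. "chain"
    agent_count: int              # number of agents
    roles: List[str]              # role allocation
    llm_per_role: Dict[str, str]  # per-agent LLM selection

    def __post_init__(self):
        # The four choices are interdependent, so check they agree.
        if self.agent_count != len(self.roles):
            raise ValueError("agent_count must match the number of roles")
        missing = [r for r in self.roles if r not in self.llm_per_role]
        if missing:
            raise ValueError(f"no LLM assigned for roles: {missing}")


def prune(decision, importance, keep):
    """Runtime step: keep the `keep` highest-scoring roles, preserving order."""
    ranked = sorted(decision.roles, key=lambda r: (-importance.get(r, 0.0), r))
    survivors = set(ranked[:keep])
    kept = [r for r in decision.roles if r in survivors]
    return RoutingDecision(
        topology=decision.topology,
        agent_count=len(kept),
        roles=kept,
        llm_per_role={r: decision.llm_per_role[r] for r in kept},
    )


# Placeholder model names; importance scores would come from runtime scoring.
initial = RoutingDecision(
    topology="chain",
    agent_count=3,
    roles=["planner", "coder", "critic"],
    llm_per_role={"planner": "model-large", "coder": "model-large", "critic": "model-small"},
)
pruned = prune(initial, {"planner": 0.9, "coder": 0.7, "critic": 0.2}, keep=2)
print(pruned.roles)
```

Keeping the record immutable means the design-time decision survives unchanged, and each pruning step yields a new, smaller configuration that can be compared against it.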
Inquiring lines that use this note as a source 32
This note is a source for these synthesized inquiries.
- Can agreement detection agents improve multi-agent deliberation beyond just negotiation?
- How do goal representations differ between human and AI teams?
- Why does diversity without expertise produce worse results than a single capable agent?
- Can designated leadership structures reduce premature convergence in multi-agent reasoning?
- How do agreement-detection agents improve distributed coordination outcomes?
- How do multi-agent systems improve on single frontier models?
- How do static team decomposition and dynamic agent selection compare in efficiency?
- What role should agreement detection play in improving multi-agent team performance?
- Can silent agreement be prevented in multi-agent reasoning systems?
- How can humans oversee multiple partial-progress agents simultaneously?
- Can cooperative AI systems make meaningful decisions without a stable self?
- When does multi-agent voting help versus hurt performance on tasks?
- Can voting work at every level of task decomposition, not just whole problems?
- Why does literature review benefit most from multi-agent orchestration approaches?
- Which failure mode most limits current multi-agent performance?
- How does role specialization preserve reasoning diversity in multi-agent teams?
- Can cognitive diversity overcome expertise gaps in agent teams?
- Can cognitive diversity compensate for lack of expertise in agent teams?
- What coordination failures emerge when multiple agents work together?
- How do agents decide when to abstain from contributing?
- What capability threshold do agents need to self-organize effectively?
- Does horizontal coordination improve with stronger individual agents?
- How can AI improve the peer review bottleneck without replacing reviewers?
- At what capability threshold does multi-agent coordination stop helping?
- How do agent capabilities change across 25 relay rounds of interaction?
- How do evaluation methods differ for single versus multi-agent systems?
- How should benchmarks measure agent efficiency across all three cost dimensions?
- How do capability vectors enable discovery in multi-agent systems?
- How does the Catfish Agent intervention reduce premature consensus in multi-agent systems?
- Can multi-agent teams solve problems better than single models thinking longer?
- How should experiment budgets be allocated across parallel hypothesis-testing teams?
- When does multi-agent scaling actually outperform static ensembles?
Related concepts in this collection 5
- Why do multi-agent LLM systems converge without genuine deliberation?
  Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?
  the problem DyLAN partially addresses: uninformative agents get deactivated
- Can extreme task decomposition enable reliable execution at million-step scale?
  Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This challenges whether better models or better decomposition is the path to high-reliability AI systems.
  contrasting approach: static decomposition vs dynamic pruning
- Can AI systems detect when they've genuinely reached agreement?
  When multiple AI agents debate, they often converge without actually deliberating. Can a dedicated agent reliably identify true agreement versus false consensus, and would that improve debate outcomes?
  agreement detection as a special case of contribution scoring
- When does debate actually improve reasoning accuracy?
  Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.
  deactivating low-quality agents could reduce error amplification
- What decisions must multi-agent routing systems optimize simultaneously?
  Standard LLM routing only picks which model to use. But multi-agent systems involve four interdependent choices: topology, agent count, role assignment, and per-agent model selection. Does optimizing all four together actually improve performance?
  MasRouter: design-time construction of the network DyLAN then prunes at runtime
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
- Towards a Science of Scaling Agent Systems
- How we built our multi-agent research system
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions
- ProAgent: Building Proactive Cooperative Agents with Large Language Models
Original note title
dynamic inference-time agent selection via contribution scoring deactivates low-performing agents and optimizes team composition