TOPIC

Model Routers

4 synthesis notes · 13 source papers

View as

Can routing queries to task-matched structures improve RAG reasoning?

Does matching retrieval structure type to task demands—tables for analysis, graphs for inference, algorithms for planning—improve reasoning accuracy over uniform chunk retrieval? This explores whether cognitive fit principles from human learning transfer to AI systems.

Can routers select the right model before generation happens?

Explores whether LLMs can be matched to queries by estimating difficulty upfront, before any generation begins. This matters because routing could cut costs significantly while preserving response quality.

What decisions must multi-agent routing systems optimize simultaneously?

Standard LLM routing only picks which model to use. But multi-agent systems involve four interdependent choices: topology, agent count, role assignment, and per-agent model selection. Does optimizing all four together actually improve performance?

Can routing beat building one better model?

Does directing queries to specialized models via semantic clustering outperform investing in a single frontier model? This challenges whether model improvement or model selection drives performance gains.

Source papers 13

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.

Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization
Xiachong Feng, Xiaocheng Feng, Xiyuan Du, Min-Yen Kan, Bing Qin [https://arxiv.org/abs/2308.03275](https://arxiv.org/abs/2308.03275) [[Routers]] [[Arxiv/Agents Multi|Agents Multi]] [[Reading Summari…
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Large-language models (LLMs) have demonstrated powerful problem-solving capabilities, in particular when organized in multi-agent systems. However, the advent of such systems also raises several quest…
Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
Balancing performance and efficiency is a central challenge in large language model (LLM) advancement. GPT-5 addresses this with test-time routing, dynamically assigning queries to either an efficient…
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover in…
Fast, Slow, and Tool-augmented Thinking for LLMs: A Review
Large Language Models (LLMs) have demonstrated remarkable progress in reasoning across diverse domains. However, effective reasoning in real-world tasks requires adapting the reasoning strategy to the…
Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI
We present Federation of Agents (FoA), a distributed orchestration framework that transforms static multi-agent coordination into dynamic, capability-driven collaboration. FoA introduces Versioned Cap…
Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models
We present Quasar-1, a novel architecture that introduces temperature-guided reasoning to large language models through the Token Temperature Mechanism (TTM) and Guided Sequence of Thought (GSoT). Our…
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower cost (e.g., edge) dev…
MasRouter: Learning to Route LLMs for Multi-Agent Systems
Multi-agent systems (MAS) powered by Large Language Models (LLMs) have been demonstrated to push the boundaries of LLM capabilities, yet they often incur significant costs and face challenges in dynam…
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
We propose MODEL SWARMS, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, MODEL SWARMS starts with a pool of LLM…
RouteLLM: Learning to Route LLMs with Preference Data
Large language models (LLMs) excel at a wide range of tasks, but choosing the right model often involves balancing performance and cost. Powerful models offer better results but are expensive, while s…
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs) in many knowledge-based tasks. However, existing RAG methods struggle with knowledge-intensive r…
Why Do Multi-agent LLM Systems Fail?
[[Routers]] Despite growing enthusiasm for Multi-Agent LLM Systems (MAS), their performance gains across popular benchmarks often remain minimal compared to single-agent frameworks. This gap highlig…