Can AI systems design unique multi-agent workflows per individual query?
Explores whether meta-agents trained with reinforcement learning can automatically generate personalized multi-agent system architectures tailored to individual user queries, rather than applying fixed task-level templates uniformly.
Previous approaches to automating multi-agent system design operate at the task level: they design one workflow for "code generation tasks," another for "summarization tasks," and apply each uniformly to every query of that type. FlowReasoner (2025) shifts this to the query level, generating a unique multi-agent system for each individual user query.
Training has two phases. First, the meta-agent is distilled from DeepSeek R1, which gives it a basic ability to reason about how to design multi-agent workflows. Second, it is enhanced via RL with external execution feedback: the meta-agent generates a multi-agent system, that system runs on the query, and the execution result provides the reward signal. A multi-purpose reward guides training across three dimensions: performance (did it work), complexity (how many agents and steps), and efficiency (how much compute).
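The three reward dimensions can be sketched as a single scalar. This is a minimal illustration only: the field names, the weights, the token budget, and the linear combination are all assumptions made for the example, not the reward formula used by FlowReasoner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExecutionResult:
    """Outcome of running one generated multi-agent system on one query."""
    passed: bool       # performance: did the system solve the query?
    num_agents: int    # complexity: how many agents the system used
    num_steps: int     # complexity: how many workflow steps it ran
    tokens_used: int   # efficiency: a rough proxy for compute


def multi_purpose_reward(
    result: ExecutionResult,
    complexity_weight: float = 0.05,  # hypothetical weight
    efficiency_weight: float = 0.1,   # hypothetical weight
    token_budget: int = 10_000,       # hypothetical normalizer
) -> float:
    """Combine performance, complexity, and efficiency into one number."""
    performance = 1.0 if result.passed else 0.0
    complexity_penalty = complexity_weight * (result.num_agents + result.num_steps)
    efficiency_penalty = efficiency_weight * (result.tokens_used / token_budget)
    return performance - complexity_penalty - efficiency_penalty


def rollout_reward(
    query: str,
    meta_agent: Callable[[str], object],
    execute: Callable[[object, str], ExecutionResult],
) -> float:
    """One feedback step: design a system for the query, run it, score it."""
    system = meta_agent(query)       # the meta-agent proposes a system
    result = execute(system, query)  # external execution feedback
    return multi_purpose_reward(result)
```

Under these assumed weights, two systems that both solve the query are ranked by how small and cheap they are, which is the pressure the complexity and efficiency terms are meant to apply. In an actual RL setup the returned scalar would feed a policy update for the meta-agent; that step is omitted here.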
This matters because one-size-fits-all multi-agent systems cannot adapt automatically to individual queries. Within code generation alone, a request to "build a 2048 game" needs a fundamentally different agent composition than a request to "fix a sorting bug." The query-level approach treats multi-agent architecture design as a reasoning problem in its own right, one amenable to RL.
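The contrast between task-level and query-level design can be made concrete with a toy sketch. The template table, the role names, and the keyword rule below are all hypothetical; in particular, the keyword rule is only a stand-in for a trained meta-agent and is not how FlowReasoner makes its decisions.

```python
from typing import Dict, List

# Task-level design: one fixed agent lineup per task type (hypothetical roles).
TASK_TEMPLATES: Dict[str, List[str]] = {
    "code_generation": ["planner", "coder", "tester"],
    "summarization": ["reader", "summarizer"],
}


def task_level_design(task_type: str, query: str) -> List[str]:
    """Return the template for the task type; the query itself is ignored."""
    return TASK_TEMPLATES[task_type]


def query_level_design(query: str) -> List[str]:
    """Toy stand-in for a meta-agent: the lineup depends on the query text.

    A real query-level meta-agent would be a learned policy, not keyword rules.
    """
    text = query.lower()
    if "fix" in text or "bug" in text:
        return ["debugger", "tester"]
    if "build" in text or "game" in text:
        return ["planner", "ui_designer", "coder", "tester"]
    return ["generalist"]
```

With this sketch, both example queries receive the same lineup from `task_level_design("code_generation", ...)`, while `query_level_design` returns a different lineup for each.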
The progression is notable: manual design → fixed template optimization → graph-based workflow search → code-based meta-agents → RL-trained query-level meta-agents. Each step automates one more degree of freedom. The connection to Can we automatically optimize both prompts and agent coordination? is direct: FlowReasoner represents multi-agent systems as code and optimizes them, but at the individual query level rather than the task level.
If the scaling dynamics explored in Can computational power accelerate scientific discovery itself? carry over, the RL-trained meta-agent approach may behave similarly: more compute for the meta-agent may yield better per-query system designs.
MasRouter as structured alternative (from Arxiv/Routers): MasRouter provides a more constrained approach to per-query MAS design than FlowReasoner. Where FlowReasoner generates arbitrary multi-agent systems via RL-trained code generation (maximum flexibility, less interpretability), MasRouter uses a cascaded controller: variational latent variable model for topology selection → structured probabilistic cascade for role allocation → multinomial distribution for LLM routing. The cascade provides interpretable intermediate decisions at the cost of a fixed structure-type vocabulary (Chain/Tree/Graph topologies, predefined role categories). FlowReasoner trades interpretability for expressiveness; MasRouter trades expressiveness for interpretability and likely faster convergence. Both achieve per-query optimization. See What decisions must multi-agent routing systems optimize simultaneously?.
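The cascade structure described above can be sketched as three sampling stages, each conditioned on the one before. This is a heavily simplified illustration: the role names, model names, probability tables, and the query-length heuristic are assumptions, and plain categorical sampling stands in for the variational latent variable model and learned distributions that MasRouter is described as using.

```python
import random
from dataclasses import dataclass
from typing import List

TOPOLOGIES = ["chain", "tree", "graph"]              # fixed topology vocabulary
ROLES = ["planner", "coder", "tester", "reviewer"]   # hypothetical role categories
LLMS = ["small-model", "large-model"]                # hypothetical model pool


@dataclass
class RoutedSystem:
    topology: str
    roles: List[str]
    llms: List[str]  # llms[i] is the model assigned to roles[i]


def toy_topology_weights(query: str) -> List[float]:
    """Hypothetical heuristic: longer queries lean toward richer topologies."""
    if len(query.split()) < 6:
        return [0.7, 0.2, 0.1]
    return [0.2, 0.3, 0.5]


def route(query: str, rng: random.Random) -> RoutedSystem:
    # Stage 1: pick a topology for this query.
    topology = rng.choices(TOPOLOGIES, weights=toy_topology_weights(query), k=1)[0]
    # Stage 2: allocate roles, conditioned on the chosen topology.
    num_roles = {"chain": 2, "tree": 3, "graph": 4}[topology]
    roles = rng.sample(ROLES, k=num_roles)
    # Stage 3: route each role to an LLM drawn from a categorical distribution.
    llms = rng.choices(LLMS, weights=[0.7, 0.3], k=len(roles))
    return RoutedSystem(topology=topology, roles=roles, llms=llms)
```

Each stage yields an inspectable intermediate decision (topology, then roles, then models), which is the interpretability benefit the cascade offers. The fixed `TOPOLOGIES` and `ROLES` lists reflect the cost: the router can only choose within its predefined vocabulary.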
Inquiring lines that use this note as a source 24
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Do explicit reward structures enable AI agent cooperation that open-ended interaction cannot?
- What makes personas in multi-agent systems actually contribute meaningful domain depth?
- How does open-ended evolver reasoning identify patterns across heterogeneous user trajectories?
- Can multi-agent reasoning systems scale beyond current architectures?
- How does modularity in reward and policy design enable goal generalization?
- Does parallel task structure determine optimal multi-agent architecture?
- Can construction-time routing and runtime agent pruning be combined effectively?
- How do cascaded probabilistic models compare to reinforcement learning for per-query system design?
- How should CASA theory be updated for modern personalized agents?
- Can programmatic meta-reasoning rewards operationalize agentic process supervision?
- How do multi-agent routers balance flexibility against interpretability in design?
- Can RL-trained meta-agents match or exceed manually designed workflows?
- Could AI agents scale the friend-with-different-preferences recommendation mechanism?
- Can users adapt their competencies to match how AI actually operates?
- Can agentic AI tools deliver productivity gains on learning tasks differently?
- Why do production AI agents deliberately stay simple and avoid frameworks?
- How do human-agent systems incorporate diverse feedback into model behavior?
- How do agents automatically generate suitable learning tasks based on current capability?
- How do strategy-level abstractions differ from storing raw task workflows?
- Can personalized AI learning systems actually widen rather than narrow educational gaps?
- Can we design efficient agents by targeting constraints directly?
- Should optimal context budgets scale with agent competence or task complexity?
- Which agent architectures consistently outperform base models on hard prediction questions?
- Should we train the evolver or the executor when building self-improving agents?
Related concepts in this collection 5
- Can we automatically optimize both prompts and agent coordination?
  This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
  the graph formalism this extends to query-level
- Can computational power accelerate scientific discovery itself?
  Does the pace of research breakthroughs scale with computing resources, like model performance does? ASI-ARCH tested this by running thousands of autonomous experiments to discover neural architectures.
  scaling dynamics for architecture search
- Can multi-agent teams automatically remove their weakest members?
  Explores whether agents can score each other's contributions during problem-solving and use those scores to deactivate underperforming teammates in real time, improving overall team efficiency.
  inference-time team optimization; FlowReasoner does this at design time
- Can extreme task decomposition enable reliable execution at million-step scale?
  Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This challenges whether better models or better decomposition is the path to high-reliability AI systems.
  alternative approach: fixed decomposition vs adaptive design
- What decisions must multi-agent routing systems optimize simultaneously?
  Standard LLM routing only picks which model to use. But multi-agent systems involve four interdependent choices: topology, agent count, role assignment, and per-agent model selection. Does optimizing all four together actually improve performance?
  MasRouter: more constrained per-query design with interpretable cascade
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- FlowReasoner: Reinforcing Query-Level Meta-Agents
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
- Adaptation of Agentic AI
- Real-Time Procedural Learning From Experience for AI Agents
- Intelligent AI Delegation
- MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild
- How we built our multi-agent research system
Original note title
query-level meta-agents generate personalized multi-agent systems per user query via RL with execution feedback