How can LLM agents handle huge candidate lists without breaking?
ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?
The standard pattern for tool-using LLM agents is ReAct: at each step, the LLM reasons, takes an action via a tool call, observes the result, and reasons again. This works when tool outputs are small. In recommender settings, however, retrieval tools return hundreds or thousands of candidate items. That is too many to fit in an observation prompt, and filling the prompt with entity names degrades LLM performance.
InteRecAgent introduces two architectural fixes. The first is a Candidate Bus: a separate memory, accessible to all tools, that holds the current candidate set without putting it in the prompt. Each tool reads candidates from the bus, filters them, and writes the filtered set back. Items flow through the tools as a streaming funnel (a query tool sets the initial candidates, a retrieval tool narrows them, a ranker tool orders the survivors), so no step's output bloats the LLM's context window.
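The funnel described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not InteRecAgent's actual code: the class name `CandidateBus`, the three tool functions, the item fields, and the synthetic catalog are all hypothetical. The point it shows is that tools exchange full item lists through shared memory while the LLM would see only a short count summary.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateBus:
    """Shared memory for the current candidate set, kept outside the prompt.
    (Hypothetical sketch; not the paper's implementation.)"""
    candidates: list = field(default_factory=list)
    history: list = field(default_factory=list)  # (tool name, item count) per step

    def write(self, tool_name, items):
        self.candidates = list(items)
        self.history.append((tool_name, len(self.candidates)))

    def summary(self):
        # The only text an LLM would need to see: tool names and counts,
        # never the item names themselves.
        return " -> ".join(f"{name}: {n} items" for name, n in self.history)


def query_tool(bus, catalog):
    # Sets the initial candidates.
    bus.write("query", catalog)


def retrieval_tool(bus, max_price):
    # Narrows the candidates already on the bus.
    bus.write("retrieval", [i for i in bus.candidates if i["price"] <= max_price])


def ranker_tool(bus, top_k):
    # Orders the survivors and keeps the best few.
    ranked = sorted(bus.candidates, key=lambda i: i["score"], reverse=True)
    bus.write("ranker", ranked[:top_k])


# Synthetic catalog of 1000 items (made-up prices and scores).
catalog = [
    {"id": n, "title": f"item-{n}", "price": n % 200, "score": (n * 37) % 101}
    for n in range(1000)
]

bus = CandidateBus()
query_tool(bus, catalog)
retrieval_tool(bus, max_price=99)
ranker_tool(bus, top_k=5)
print(bus.summary())  # query: 1000 items -> retrieval: 500 items -> ranker: 5 items
```

However large the catalog grows, the summary string stays a few dozen characters long, which is the property the bus is meant to provide.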
Second, plan-first execution replaces step-by-step ReAct. Instead of generating one action at a time, the LLM generates the entire tool-call sequence at once based on the user's intent, then executes it in order. This both reduces LLM inference cost (one planning call instead of N) and reduces error rates because the LLM reasons globally about the sequence. A separate "critic" LLM evaluates execution and triggers reflection if results are unsatisfactory.
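A rough sketch of plan-first execution follows. It assumes the planner and critic are LLM calls, which are replaced here by hard-coded stand-ins; `plan_once`, `critic_approves`, `execute_plan`, the tool names, and the sample items are hypothetical and only illustrate the control flow: one planning call produces the full tool sequence, the sequence runs in order, and a critic decides whether to replan.

```python
planner_calls = 0  # counts how often the (stand-in) planner is invoked


def plan_once(user_request, feedback=None):
    """Stand-in for the single LLM planning call; returns the whole plan.
    A real planner would condition on user_request and critic feedback."""
    global planner_calls
    planner_calls += 1
    return [
        {"tool": "hard_filter", "args": {"max_price": 100}},
        {"tool": "rank", "args": {"top_k": 3}},
    ]


def hard_filter(items, max_price):
    return [i for i in items if i["price"] < max_price]


def rank(items, top_k):
    return sorted(items, key=lambda i: i["score"], reverse=True)[:top_k]


TOOLS = {"hard_filter": hard_filter, "rank": rank}


def execute_plan(plan, items):
    # Run every step in order with no LLM call in between.
    for step in plan:
        items = TOOLS[step["tool"]](items, **step["args"])
    return items


def critic_approves(user_request, plan, result):
    """Stand-in for the critic LLM: approve any non-empty result."""
    return len(result) > 0


def run(user_request, items, max_retries=1):
    feedback = None
    result = []
    for _ in range(max_retries + 1):
        plan = plan_once(user_request, feedback)
        result = execute_plan(plan, items)
        if critic_approves(user_request, plan, result):
            break
        feedback = "result was unsatisfactory"  # triggers a replanning pass
    return result


items = [
    {"title": "A", "price": 59, "score": 0.9},
    {"title": "B", "price": 120, "score": 0.95},
    {"title": "C", "price": 30, "score": 0.4},
    {"title": "D", "price": 80, "score": 0.7},
    {"title": "E", "price": 99, "score": 0.1},
]
result = run("popular games under $100", items)
print([i["title"] for i in result], planner_calls)  # ['A', 'D', 'C'] 1
```

In a ReAct loop the same two-step task would take at least two reasoning calls, one per tool invocation; here the planner is called once unless the critic rejects the result.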
The framework also distinguishes hard conditions ("popular sports games under $100", handled by SQL queries) from soft conditions ("similar to Call of Duty", handled by item-to-item embedding matching) and routes each through the appropriate tool. Long-term and short-term user profiles, maintained outside the LLM's context window, enable lifelong conversations without context overflow.
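The hard/soft split can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `games` table, its titles and prices, the two-dimensional toy embeddings, and the function names `hard_filter`, `soft_match`, and `route`. A real system would use learned item embeddings and its own schema.

```python
import math
import sqlite3

# Toy catalog in an in-memory SQLite database (made-up titles and prices).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (title TEXT, genre TEXT, price REAL)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [
        ("Soccer Pro", "sports", 59.0),
        ("Hoops 24", "sports", 69.0),
        ("Golf Deluxe", "sports", 120.0),
        ("Trench Strike", "shooter", 49.0),
        ("Call of Duty", "shooter", 69.0),
    ],
)

# Toy 2-d item embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "Soccer Pro": [0.1, 0.9],
    "Hoops 24": [0.2, 0.8],
    "Golf Deluxe": [0.0, 1.0],
    "Trench Strike": [0.8, 0.2],
    "Call of Duty": [0.9, 0.1],
}


def hard_filter(genre, max_price):
    """Hard condition: exact constraints, answered with a SQL query."""
    rows = conn.execute(
        "SELECT title FROM games WHERE genre = ? AND price < ? ORDER BY title",
        (genre, max_price),
    )
    return [row[0] for row in rows]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def soft_match(seed_title, top_k):
    """Soft condition: item-to-item similarity in embedding space."""
    seed = EMBEDDINGS[seed_title]
    others = [t for t in EMBEDDINGS if t != seed_title]
    others.sort(key=lambda t: cosine(seed, EMBEDDINGS[t]), reverse=True)
    return others[:top_k]


def route(condition):
    """Send each condition to the tool that suits it."""
    if condition["kind"] == "hard":
        return hard_filter(condition["genre"], condition["max_price"])
    return soft_match(condition["similar_to"], condition["top_k"])


print(route({"kind": "hard", "genre": "sports", "max_price": 100}))
# ['Hoops 24', 'Soccer Pro']
print(route({"kind": "soft", "similar_to": "Call of Duty", "top_k": 1}))
# ['Trench Strike']
```

The design choice worth noting is that neither path asks the LLM to compare items itself: exact constraints go to the database, and fuzzy similarity goes to vector math.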
The general principle: when LLM agent patterns from research (ReAct, step-by-step CoT) hit production constraints (large candidate sets, long conversations, latency), the answer isn't a smarter prompt but architectural changes that move state out of the prompt entirely.
Related concepts in this collection
- How should LLM-based recommenders retrieve from massive item corpora?
  When conversational recommenders need to search millions of items, the LLM cannot memorize the corpus. What retrieval strategies work best under different constraints, and how do they trade off latency, sample efficiency, and scalability?
  complements: the candidate bus is the architectural complement to retrieval-strategy choice; the bus carries whatever the chosen strategy returns
- Why do protocol-based tool integrations fail in production workflows?
  Explores whether standardized tool protocols like MCP introduce non-determinism that undermines agent reliability, and what causes ambiguous tool selection in production systems.
  complements: plan-first execution is the deterministic-call pattern in the recommender setting; same anti-ReAct lesson
- Does structured artifact sharing outperform conversational coordination?
  Explores whether agents coordinating through standardized documents rather than natural language messages achieve better collaboration outcomes. Matters because it challenges the default conversational paradigm in multi-agent system design.
  complements: the candidate bus is a standardized artifact shared between tools; same SOP-over-natural-language coordination lesson at a smaller scale
- Can we automatically optimize both prompts and agent coordination?
  This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
  extends: the candidate bus and the plan are graph-orchestration primitives; InteRecAgent is one instantiation of such a graph
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
- Learning Agent-Compatible Context Management for Long-Horizon Tasks
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation
- VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
- MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
Original note title
LLM-as-recommender requires plan-first execution and a candidate bus to overcome step-by-step ReAct limitations on long item lists