SYNTHESIS NOTE

How can LLM agents handle huge candidate lists without breaking?

ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?

Synthesis note · 2026-05-03 · sourced from Recommenders Conversational

The standard pattern for tool-using LLM agents is ReAct: at each step, the LLM reasons, takes an action via a tool call, observes the result, and reasons again. This works when tool outputs are small. In recommender settings, retrieval tools return hundreds or thousands of candidate items: too many to fit in an observation prompt, and including the entity names in the prompt degrades LLM performance.

InteRecAgent introduces two architectural fixes. The first is a Candidate Bus: a separate memory, accessible to all tools, that holds the current candidate set without putting it in the prompt. Tools read candidates from the bus, filter them, and write the filtered set back. Items flow through the tools in a streaming funnel (a query tool sets the initial candidates, a retrieval tool narrows them, and a ranker tool orders the survivors) without any step's output bloating the LLM's context window.
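
The funnel described above can be sketched in a few lines. This is a minimal illustration, not InteRecAgent's actual code: the names `Item`, `CandidateBus`, `query_tool`, `retrieval_tool`, `ranker_tool`, and `make_toy_catalog` are all hypothetical, the catalog is synthetic, and ranking by a single popularity score is an assumption made to keep the example short. The point it tries to show is that each tool returns only a one-line summary, while the candidates themselves stay on the bus.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Item:
    item_id: int
    title: str
    genre: str
    price: float
    popularity: float


@dataclass
class CandidateBus:
    """Shared store for the current candidate set; it never enters the prompt."""
    items: list[Item] = field(default_factory=list)

    def write(self, tool_name: str, items: list[Item]) -> str:
        # Replace the candidate set and return only a one-line summary,
        # which is all the LLM would see as its observation.
        self.items = list(items)
        return f"{tool_name}: {len(self.items)} candidates on bus"


def query_tool(bus: CandidateBus, catalog: list[Item]) -> str:
    """Seed the bus with the initial candidates."""
    return bus.write("query", catalog)


def retrieval_tool(bus: CandidateBus, keep: Callable[[Item], bool]) -> str:
    """Read from the bus, narrow the set, write the survivors back."""
    return bus.write("retrieval", [it for it in bus.items if keep(it)])


def ranker_tool(bus: CandidateBus, top_k: int) -> str:
    """Order the survivors by popularity (an assumed score) and keep top_k."""
    ranked = sorted(bus.items, key=lambda it: it.popularity, reverse=True)
    return bus.write("ranker", ranked[:top_k])


def make_toy_catalog(n: int = 500) -> list[Item]:
    """Synthetic catalog; every value here is made up for illustration."""
    return [
        Item(i, f"Game {i}", "sports" if i % 2 == 0 else "puzzle",
             float(i % 120), float(i))
        for i in range(n)
    ]


bus = CandidateBus()
observations = [
    query_tool(bus, make_toy_catalog()),
    retrieval_tool(bus, lambda it: it.genre == "sports" and it.price < 100),
    ranker_tool(bus, top_k=5),
]
```

Here `observations` is what would be fed back to the LLM: three short strings, regardless of whether the catalog holds 500 items or 5 million. The final recommendations are read from `bus.items` only once the funnel has finished.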

Second, plan-first execution replaces step-by-step ReAct. Instead of generating one action at a time, the LLM generates the entire tool-call sequence at once from the user's intent, and the sequence is then executed in order. This reduces LLM inference cost (one planning call instead of N) and lowers error rates because the LLM reasons globally about the sequence. A separate "critic" LLM evaluates the execution and triggers reflection if the results are unsatisfactory.
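
A rough sketch of the plan-then-execute loop with a critic follows. Everything here is an assumption for illustration: `ToolCall`, `run_plan_first`, `stub_planner`, `stub_critic`, and `TOOLS` are hypothetical names, and the planner and critic are plain Python stubs standing in for LLM calls. The source does not specify how reflection feeds back into planning; passing the critic's feedback to the next planning call is one plausible reading.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]


# Stand-ins for LLM calls (hypothetical signatures):
# a planner maps (request, feedback) to a complete plan;
# a critic returns "" when satisfied, otherwise feedback text.
Planner = Callable[[str, str], list[ToolCall]]
Critic = Callable[[str, list[str]], str]


def run_plan_first(request: str, planner: Planner,
                   tools: dict[str, Callable[..., str]],
                   critic: Critic, max_rounds: int = 2) -> list[str]:
    feedback = ""
    observations: list[str] = []
    for _ in range(max_rounds):
        # One planning call produces the whole tool-call sequence.
        plan = planner(request, feedback)
        # Execute the plan in order with no LLM call between steps.
        observations = [tools[step.tool](**step.args) for step in plan]
        # The critic inspects the run; non-empty feedback triggers a re-plan.
        feedback = critic(request, observations)
        if not feedback:
            break
    return observations


def stub_planner(request: str, feedback: str) -> list[ToolCall]:
    # Toy behaviour: ask for more items if the critic complained.
    top_k = 10 if feedback else 3
    return [
        ToolCall("query", {"text": request}),
        ToolCall("filter", {"max_price": 100.0}),
        ToolCall("rank", {"top_k": top_k}),
    ]


def stub_critic(request: str, observations: list[str]) -> str:
    # Toy rule: accept only a run that ends with ten ranked items.
    ok = bool(observations) and observations[-1] == "ranked 10 items"
    return "" if ok else "return more items"


TOOLS: dict[str, Callable[..., str]] = {
    "query": lambda text: f"query ok: {text}",
    "filter": lambda max_price: f"filtered under {max_price}",
    "rank": lambda top_k: f"ranked {top_k} items",
}
```

Calling `run_plan_first("sports games", stub_planner, TOOLS, stub_critic)` makes at most `max_rounds` planning calls in total, rather than one LLM call per tool step as in ReAct.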

The framework also distinguishes hard conditions ("popular sports games under $100", handled by SQL queries) from soft conditions ("similar to Call of Duty", handled by item-to-item embedding matching) and routes each through the appropriate tool. Long-term and short-term user profiles, maintained outside the LLM's context window, enable lifelong conversations without context overflow.
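
The hard/soft split can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the framework's implementation: the table schema, the 3-dimensional embeddings, the prices and popularity scores, the placeholder titles, and the function names `build_db`, `hard_filter`, `cosine`, and `soft_rank` are all invented for the example. Hard conditions become a parameterized SQL query; soft conditions become a cosine-similarity ranking against a seed item.

```python
import math
import sqlite3

# Toy catalog: (id, title, genre, price, popularity, embedding).
# All numbers, including the embeddings, are invented for illustration.
TOY_GAMES = [
    (1, "Call of Duty", "shooter", 60.0, 0.95, (0.9, 0.1, 0.0)),
    (2, "Shooter B", "shooter", 50.0, 0.80, (0.8, 0.2, 0.1)),
    (3, "Sports A", "sports", 70.0, 0.90, (0.1, 0.9, 0.1)),
    (4, "Sports B", "sports", 120.0, 0.85, (0.1, 0.8, 0.2)),
    (5, "Puzzle A", "puzzle", 10.0, 0.70, (0.0, 0.1, 0.9)),
]
EMBEDDINGS = {row[0]: row[5] for row in TOY_GAMES}


def build_db() -> sqlite3.Connection:
    """Load the toy catalog into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT, "
        "genre TEXT, price REAL, popularity REAL)"
    )
    conn.executemany(
        "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
        [row[:5] for row in TOY_GAMES],
    )
    return conn


def hard_filter(conn: sqlite3.Connection, genre: str,
                max_price: float, min_popularity: float) -> list[int]:
    """Hard conditions: exact constraints expressed as a SQL query."""
    rows = conn.execute(
        "SELECT id FROM games "
        "WHERE genre = ? AND price < ? AND popularity >= ? ORDER BY id",
        (genre, max_price, min_popularity),
    ).fetchall()
    return [r[0] for r in rows]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def soft_rank(candidate_ids: list[int], seed_id: int, top_k: int) -> list[int]:
    """Soft conditions: rank candidates by embedding similarity to a seed item."""
    seed = EMBEDDINGS[seed_id]
    others = [i for i in candidate_ids if i != seed_id]
    others.sort(key=lambda i: cosine(EMBEDDINGS[i], seed), reverse=True)
    return others[:top_k]
```

With this toy data, "popular sports games under $100" would map to something like `hard_filter(conn, "sports", 100.0, 0.8)`, and "similar to Call of Duty" to `soft_rank(ids, seed_id=1, top_k=k)`. The popularity threshold of 0.8 is an arbitrary choice for "popular".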

The general principle: when LLM agent patterns from research (ReAct, step-by-step CoT) hit production constraints (large candidate sets, long conversations, latency), the answer isn't a smarter prompt but architectural changes that move state out of the prompt entirely.

Inquiring lines that use this note as a source 1

What components must wrap an LLM to build a working CRS?

Related concepts in this collection 4

How should LLM-based recommenders retrieve from massive item corpora? When conversational recommenders need to search millions of items, the LLM cannot memorize the corpus. What retrieval strategies work best under different constraints, and how do they trade off latency, sample efficiency, and scalability?
complements: candidate-bus is the architectural complement to retrieval-strategy choice — the bus carries what the chosen strategy returns
Why do protocol-based tool integrations fail in production workflows? Explores whether standardized tool protocols like MCP introduce non-determinism that undermines agent reliability, and what causes ambiguous tool selection in production systems.
complements: plan-first execution is the deterministic-call pattern in a recommender setting; same anti-ReAct lesson
Does structured artifact sharing outperform conversational coordination? Explores whether agents coordinating through standardized documents rather than natural language messages achieve better collaboration outcomes. Matters because it challenges the default conversational paradigm in multi-agent system design.
complements: candidate bus is a standardized artifact between tools — same SOP-over-natural-language coordination lesson at smaller scale
Can we automatically optimize both prompts and agent coordination? This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
extends: candidate bus + plan are graph-orchestration primitives — InteRecAgent is one graph instantiation

Original note title

LLM-as-recommender requires plan-first execution and a candidate bus to overcome step-by-step ReAct limitations on long item lists
