SYNTHESIS NOTE
Agentic Systems and Tool Use

How can LLM agents handle huge candidate lists without breaking?

ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?

Synthesis note · 2026-05-03 · sourced from Recommenders Conversational

The standard pattern for tool-using LLM agents is ReAct: at each step, the LLM reasons, takes an action via a tool call, observes the result, and reasons again. This works when tool outputs are small. In recommender settings, retrieval tools return hundreds or thousands of candidate items. That is too many to fit in an observation prompt, and placing the item names in the prompt degrades LLM performance.

InteRecAgent introduces two architectural fixes. The first is a Candidate Bus: a separate memory, accessible to all tools, that holds the current candidate set without putting it in the prompt. Tools read candidates from the bus, filter them, and write the filtered set back. Items flow through the tools as a streaming funnel (the query tool sets the initial candidates, the retrieval tool narrows them, and the ranker tool orders the survivors) without any step's output bloating the LLM's context window.
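The funnel described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not InteRecAgent's actual code: the CandidateBus class, the three tool functions, the toy CATALOG, and the summary strings are all hypothetical names chosen for this example. The point it shows is that each tool returns only a short summary for the LLM to observe, while the candidate ids stay on the bus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Item:
    item_id: int
    title: str
    genre: str
    price: float
    popularity: int


# Hypothetical toy catalog; a real one would hold thousands of rows.
CATALOG: Dict[int, Item] = {
    item.item_id: item
    for item in [
        Item(1, "Street Hoops", "sports", 30.0, 80),
        Item(2, "Rally Kings", "sports", 120.0, 95),
        Item(3, "Dungeon Tale", "rpg", 40.0, 90),
        Item(4, "Goal Line", "sports", 20.0, 60),
    ]
}


@dataclass
class CandidateBus:
    """Shared memory for candidate ids, kept outside the LLM prompt."""
    candidates: List[int] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def read(self) -> List[int]:
        return list(self.candidates)

    def write(self, tool: str, ids: List[int]) -> str:
        self.candidates = list(ids)
        summary = f"{tool}: {len(ids)} candidates on bus"
        self.log.append(summary)
        return summary  # the only text the LLM would observe


def query_tool(bus: CandidateBus) -> str:
    # Sets the initial candidate set (here: the whole catalog).
    return bus.write("query", sorted(CATALOG))


def retrieval_tool(bus: CandidateBus, keep: Callable[[Item], bool]) -> str:
    # Narrows the candidates already on the bus.
    kept = [i for i in bus.read() if keep(CATALOG[i])]
    return bus.write("retrieval", kept)


def ranker_tool(bus: CandidateBus, top_k: int) -> str:
    # Orders the survivors by popularity and keeps the top_k.
    ranked = sorted(bus.read(), key=lambda i: CATALOG[i].popularity, reverse=True)
    return bus.write("ranker", ranked[:top_k])


bus = CandidateBus()
observations = [
    query_tool(bus),
    retrieval_tool(bus, lambda it: it.genre == "sports" and it.price < 100),
    ranker_tool(bus, top_k=2),
]
```

Here the prompt-side observations are three short strings regardless of catalog size; only the final contents of bus.candidates would need to be rendered for the user.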

Second, plan-first execution replaces step-by-step ReAct. Instead of generating one action at a time, the LLM generates the entire tool-call sequence at once from the user's intent, and the tools are then executed in that order. This reduces LLM inference cost (one planning call instead of N) and reduces error rates, because the LLM reasons globally about the sequence. A separate "critic" LLM evaluates the execution result and triggers reflection if it is unsatisfactory.
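A rough sketch of plan-first execution with a critic follows. Everything in it is an assumption made for illustration: stub_planner and stub_critic stand in for the two LLM calls, and the tool names, the PRICES table, and the feedback string are hypothetical. The stub planner deliberately omits the ranking step on its first attempt so that the reflection path is exercised.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, dict]  # (tool name, keyword arguments)

# Hypothetical item -> price table standing in for the item store.
PRICES: Dict[str, int] = {"A": 30, "B": 120, "C": 45, "D": 80}


def filter_price(cands: List[str], max_price: int) -> List[str]:
    return [c for c in cands if PRICES[c] <= max_price]


def rank_cheapest(cands: List[str], top_k: int) -> List[str]:
    return sorted(cands, key=lambda c: PRICES[c])[:top_k]


TOOLS: Dict[str, Callable[..., List[str]]] = {
    "filter_price": filter_price,
    "rank_cheapest": rank_cheapest,
}


def stub_planner(intent: str, feedback: Optional[str] = None) -> List[Step]:
    """Stands in for ONE LLM call that emits the whole tool sequence."""
    plan: List[Step] = [("filter_price", {"max_price": 100})]
    if feedback:  # a re-plan after critic feedback adds the missing step
        plan.append(("rank_cheapest", {"top_k": 2}))
    return plan


def execute(plan: List[Step]) -> List[str]:
    # Run the planned steps in order; no LLM call happens between steps.
    cands = list(PRICES)
    for name, kwargs in plan:
        cands = TOOLS[name](cands, **kwargs)
    return cands


def stub_critic(intent: str, result: List[str]) -> Optional[str]:
    """Stands in for the critic LLM; returns feedback, or None if satisfied."""
    if len(result) > 2:
        return "too many items; add a ranking step"
    return None


def run(intent: str, max_rounds: int = 2) -> Tuple[List[str], int]:
    feedback: Optional[str] = None
    result: List[str] = []
    planner_calls = 0
    for _ in range(max_rounds):
        plan = stub_planner(intent, feedback)
        planner_calls += 1
        result = execute(plan)
        feedback = stub_critic(intent, result)
        if feedback is None:
            break
    return result, planner_calls


result, planner_calls = run("two cheapest games under $100")
```

In this sketch the planner is called once per round rather than once per tool step, which is the cost argument made above; the max_rounds cap is an added safeguard, not something the source specifies.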

The framework also distinguishes hard conditions ("popular sports games under $100" — handled by SQL queries) from soft conditions ("similar to Call of Duty" — handled by item-to-item embedding match), routing each through the appropriate tool. Long-term and short-term user profiles maintained outside the LLM's context window enable lifelong conversations without context overflow.
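The hard/soft split can be illustrated with Python's standard sqlite3 module for the hard condition and a cosine-similarity match for the soft one. This is a sketch under assumptions, not the framework's implementation: the games table schema, the titles other than Call of Duty, the prices, and the three-dimensional EMB vectors are invented toy values, and the function names hard_filter and soft_match are hypothetical.

```python
import math
import sqlite3
from typing import Dict, List, Tuple

# Hard conditions: structured attributes live in a SQL table (toy data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT, genre TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?)",
    [
        (1, "Call of Duty", "shooter", 60.0),
        (2, "Hypothetical Shooter", "shooter", 50.0),
        (3, "Hypothetical Soccer", "sports", 70.0),
        (4, "Hypothetical Farm Sim", "sim", 15.0),
    ],
)

# Soft conditions: made-up item embeddings, one vector per item id.
EMB: Dict[int, Tuple[float, ...]] = {
    1: (0.9, 0.1, 0.0),
    2: (0.8, 0.2, 0.1),
    3: (0.1, 0.9, 0.2),
    4: (0.0, 0.2, 0.9),
}


def hard_filter(max_price: float) -> List[int]:
    """Exact constraint, answered by a parameterized SQL query."""
    rows = conn.execute(
        "SELECT id FROM games WHERE price < ? ORDER BY id", (max_price,)
    )
    return [row[0] for row in rows]


def cosine(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def soft_match(candidates: List[int], anchor_id: int, top_k: int) -> List[int]:
    """Fuzzy constraint, answered by item-to-item embedding similarity."""
    others = [c for c in candidates if c != anchor_id]
    others.sort(key=lambda c: cosine(EMB[c], EMB[anchor_id]), reverse=True)
    return others[:top_k]


# "Games under $65 similar to Call of Duty": hard condition first, then soft.
affordable = hard_filter(65.0)
similar = soft_match(affordable, anchor_id=1, top_k=1)
```

Running the hard filter first shrinks the set the similarity search has to score, which matches the funnel ordering described earlier; the reverse order would also work but would score items that the SQL constraint later discards.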

The general principle: when LLM agent patterns from research (ReAct, step-by-step CoT) hit production constraints (large candidate sets, long conversations, latency), the answer isn't a smarter prompt but architectural changes that move state out of the prompt entirely.

Original note title

LLM-as-recommender requires plan-first execution and a candidate bus to overcome step-by-step ReAct limitations on long item lists