Can LLMs recommend products without ever seeing the catalog?
Explores whether language models can learn to generate effective search queries for recommendation systems without direct access to inventory data. This challenges the intuition that good recommendations require knowing what items exist.
This is a counterintuitive empirical finding from Rec-R1's product search experiments. The trained LLM never sees the downstream item catalog: it receives a user query and generates a rewritten query without knowing what products exist in the recommender's database. By the intuition that "good recommendation requires knowing what's available," this should not work. It does, consistently, across domains.
The mechanism becomes clear once you compare it to human search behavior. People rarely know the exact contents of a platform's inventory. They refine queries iteratively based on vague goals and system feedback: they search, see results, adjust the query based on what came back, and search again. The catalog enters the loop indirectly through the system's response, not directly through advance knowledge.
Rec-R1, trained in a closed loop with the recommender, learns this refinement process via reinforcement learning. The LLM's reward depends on whether its generated query yields good ranking metrics from the recommender. Over training, the model acquires implicit catalog awareness, that is, a sense of which query forms produce good rankings on this specific recommender, without ever being shown the catalog explicitly.
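The reward signal described above can be sketched in a few lines. This is a minimal illustration, not Rec-R1's implementation: the note only says the reward comes from "ranking metrics", so NDCG@k is chosen here as one common example, and `toy_recommender`, `_HIDDEN_CATALOG`, `ndcg_at_k`, and `query_reward` are hypothetical names. The policy-gradient update that would consume the reward is omitted.

```python
import math
from typing import Callable, List, Set

# Hypothetical catalog owned by the recommender. The query-generating side
# never reads this dict; it only calls toy_recommender(query).
_HIDDEN_CATALOG = {
    "p1": "leather office shoes",
    "p2": "waterproof trail running shoes",
    "p3": "waterproof hiking jacket",
}


def toy_recommender(query: str) -> List[str]:
    """Rank items by keyword overlap with the query (stand-in for a real system)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), pid)
              for pid, text in _HIDDEN_CATALOG.items()]
    matches = [(score, pid) for score, pid in scored if score > 0]
    # Highest overlap first; ties broken by item id for determinism.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in matches]


def ndcg_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, pid in enumerate(ranked_ids[:k]) if pid in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def query_reward(rewritten_query: str,
                 recommender: Callable[[str], List[str]],
                 relevant_ids: Set[str],
                 k: int = 10) -> float:
    """Reward for one rewritten query: the ranking metric of what came back."""
    return ndcg_at_k(recommender(rewritten_query), relevant_ids, k)


if __name__ == "__main__":
    relevant = {"p2"}  # assumed ground-truth item for this user
    print(round(query_reward("shoes for rainy runs", toy_recommender, relevant), 3))            # 0.631
    print(round(query_reward("waterproof trail running shoes", toy_recommender, relevant), 3))  # 1.0
```

In this toy setup the second query earns a higher reward purely because of how the recommender responded to it. The code that scores the query only ever sees ranked item ids, which is the sense in which the catalog enters the loop indirectly.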
The deployment consequence is significant for production systems with proprietary or constantly changing catalogs. The LLM does not need access to the inventory database, refresh cycles when the catalog changes, or synchronization protocols. As long as it can interact with the live recommender, it can stay aligned with evolving content trends. Rec-R1 is also compatible with real-time feedback: it can be trained via online interactions with a live recommender in which the LLM receives immediate performance signals (engagement rates, conversions).
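The interface boundary this implies can be sketched as follows. It is an illustrative assumption rather than anything specified by Rec-R1: `Recommender`, `InMemoryRecommender`, `upsert`, `rewrite_and_search`, and `placeholder_rewrite` are all hypothetical names, and the rewrite function is a fixed stand-in for the trained LLM. The sketch only shows that the query side depends on a `search` call and nothing else; it does not demonstrate that a trained model stays aligned as the catalog drifts.

```python
from typing import Callable, Dict, List, Protocol


class Recommender(Protocol):
    """The only surface the query-rewriting side is assumed to need."""

    def search(self, query: str) -> List[str]: ...


class InMemoryRecommender:
    """Hypothetical stand-in for a live recommender that owns its catalog."""

    def __init__(self, catalog: Dict[str, str]) -> None:
        self._catalog = dict(catalog)  # private: never exposed to the rewriter

    def upsert(self, item_id: str, text: str) -> None:
        """Catalog change made on the recommender side only."""
        self._catalog[item_id] = text

    def search(self, query: str) -> List[str]:
        terms = set(query.lower().split())
        hits = [(len(terms & set(text.lower().split())), item_id)
                for item_id, text in self._catalog.items()]
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return [item_id for score, item_id in hits if score > 0]


def rewrite_and_search(rewrite: Callable[[str], str],
                       recommender: Recommender,
                       user_query: str) -> List[str]:
    """Rewrite the user query, then ask the recommender; no catalog access."""
    return recommender.search(rewrite(user_query))


def placeholder_rewrite(user_query: str) -> str:
    """Fixed stand-in for the trained LLM's query rewrite."""
    return user_query + " waterproof"


if __name__ == "__main__":
    rec = InMemoryRecommender({"a": "waterproof boots"})
    print(rewrite_and_search(placeholder_rewrite, rec, "boots"))  # ['a']
    rec.upsert("b", "insulated waterproof boots")  # catalog changes elsewhere
    print(rewrite_and_search(placeholder_rewrite, rec, "boots"))  # ['a', 'b']
```

Nothing on the rewriting side was refreshed or synchronized between the two calls; the new item shows up because the recommender's own response changed.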
The broader observation: closed-loop training can substitute for the access patterns we assume systems need. What looks like "the LLM needs to know the catalog" is often "the LLM needs to produce queries that work for this catalog," and the second can be learned from feedback without the first.
Inquiring lines that use this note as a source 29
This note is a source for these synthesized inquiries.
- Does universal approximation guarantee help with finite recommendation data?
- Which LLM recommender paradigm actually performs best empirically?
- Can semantic tokens bridge embeddings and direct recommendation?
- How do cost-efficient LLM models compare to high-performance ones in recommendation?
- How does collaborative filtering integrate into LLM-based recommendation systems?
- How does LLM-PKG compare to mining product relations directly from interaction data?
- How does pretraining corpus popularity bias affect LLM recommendation behavior?
- Which deployment domains favor LLM recommenders over traditional collaborative approaches?
- What happens when multiple recommendation objectives compete without explicit modeling?
- Why do real-world platforms need inductive learning for streaming recommendation systems?
- How do search API lookups enable LLM recommenders over proprietary or dynamic corpora?
- Can concept-based search bridge the vocabulary mismatch between conversation and item index?
- Why does pure numeric ID indexing force models to learn from scratch?
- How much context length can sequential recommenders handle before steering degrades?
- Does input augmentation outperform direct language-based recommendation systems?
- What efficiency costs does unified language modeling impose versus specialized recommenders?
- How do large pretrained language models scale the unified recommendation paradigm?
- What makes recommendation a small-data problem despite large scale?
- Do weight changes in recommender systems produce faster producer adaptation when content is automated?
- Do other recommendation domains suffer from similar shortcut learning in their benchmarks?
- Why do transductive recommenders fail where inductive learning succeeds?
- Can models retrieve the right tool without relying on vector similarity?
- Can cyclic aggregation between users and items enable fully inductive recommendation?
- How do recommender metrics drive LLM query refinement in closed-loop training?
- Why doesn't catalog synchronization matter for LLMs trained on live recommender feedback?
- What implicit knowledge about catalogs do LLMs learn from ranking signals alone?
- Can LLMs recommend items without seeing the product catalog?
- What role does vague intent play in realistic search evaluation?
- Can better prompting techniques overcome weak personalization in recommender systems?
Related concepts in this collection 2
- Can recommendation metrics train language models directly?
  Explores whether LLMs can be optimized through closed-loop reinforcement learning using real recommendation system outputs as rewards, rather than relying on expensive proprietary model distillation.
  Relation: same paper, the architectural enabler
- How can LLM agents handle huge candidate lists without breaking?
  ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?
  Relation: adjacent, a different architectural pattern for LLM-recommendation integration
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation
- A Multi-facet Paradigm to Bridge Large Language Model and Recommendation
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- Large Language Models as Zero-Shot Conversational Recommenders
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching
- Leveraging Large Language Models in Conversational Recommender Systems
Original note title
LLMs trained via closed-loop RL with recommendation feedback can recommend without seeing the catalog — they learn iterative query refinement from system metrics alone