Do simulated training interactions transfer to real conversations?

Most conversational recommender systems train on simulated entity-level exchanges, not natural dialogue. The question is whether models built this way actually work when deployed with real users who speak naturally and deviate from expected patterns.

Synthesis note · 2026-05-03 · sourced from Recommenders Conversational

The conversational recommender system (CRS) literature splits into two strands that are incompatible in practice. Standard CRS research uses simulated interactions in which turns exchange entity-level information (item names, attribute values) rather than natural-language utterances. The user model is a programmatic simulator that emits "I like attribute X" rather than "you know, I'm in the mood for something fun but not silly." Simulation makes training tractable and benchmarks reproducible, but it bypasses the actual problems of language understanding, response generation, topic planning, and knowledge engagement.
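
As a rough illustration of what an entity-level exchange looks like, the following is a minimal sketch of a programmatic user simulator. The class name, the dictionary-based turn format, and the action labels ("ask", "recommend") are assumptions made for this example and are not taken from any particular CRS benchmark.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntityLevelUserSimulator:
    """Hypothetical programmatic user that speaks only in structured turns."""

    target_item: str            # the item the simulated user "wants"
    liked_attributes: set[str]  # attribute values the simulated user accepts

    def respond(self, system_turn: dict) -> dict:
        """Map a structured system action to a structured user reply."""
        action = system_turn["action"]
        if action == "ask":
            # Equivalent to "I like attribute X" / "I don't like attribute X".
            attribute = system_turn["attribute"]
            return {
                "action": "answer",
                "attribute": attribute,
                "likes": attribute in self.liked_attributes,
            }
        if action == "recommend":
            # Accept only if the target item is among the recommended items.
            return {
                "action": "feedback",
                "accepted": self.target_item in system_turn["items"],
            }
        # Anything outside the scripted action set is simply unsupported.
        raise ValueError(f"unsupported system action: {action!r}")


# Placeholder item and attribute names, chosen only for the example.
user = EntityLevelUserSimulator("Item A", {"comedy", "family"})
print(user.respond({"action": "ask", "attribute": "comedy"}))
print(user.respond({"action": "recommend", "items": ["Item B"]}))
```

Nothing in this interface can express an utterance such as "something fun but not silly": the simulated user answers only the structured question it is asked, which is why a system trained against it never has to interpret free-form language.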

Holistic CRS, in contrast, trains on conversational data collected from real-world scenarios. The system must handle imperfect intent understanding, unexpected dialogue turns, and the social dynamics of a recommendation conversation (encouragement, hedging, explanation). Structurally, holistic CRS approaches combine three components: a backbone language model, optional external knowledge, and optional external guidance.
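
One way to picture the three-component structure is the sketch below. It is an assumption-laden illustration: the note does not say how the components are wired together, so the class name, the callable signatures, and the prompt layout are all hypothetical, and the backbone is a stand-in function rather than a real language model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HolisticCRS:
    """Hypothetical wiring of the three components; not any specific system."""

    backbone: Callable[[str], str]                          # prompt -> response text
    knowledge: Optional[Callable[[str], list[str]]] = None  # last utterance -> facts
    guidance: Optional[Callable[[list[str]], str]] = None   # dialogue history -> hint

    def reply(self, history: list[str]) -> str:
        """Build a prompt from whichever optional components are present."""
        sections = []
        if self.knowledge is not None:
            facts = self.knowledge(history[-1])
            if facts:
                sections.append("Facts: " + "; ".join(facts))
        if self.guidance is not None:
            sections.append("Guidance: " + self.guidance(history))
        sections.append("Dialogue:\n" + "\n".join(history))
        return self.backbone("\n".join(sections))


def echo_backbone(prompt: str) -> str:
    """Stand-in for a language model: returns the prompt it was given."""
    return prompt


crs = HolisticCRS(
    backbone=echo_backbone,
    guidance=lambda history: "offer one suggestion and explain it",
)
print(crs.reply(["Whatever, I'm open to any suggestion"]))
```

Making knowledge and guidance optional fields mirrors the description above: only the backbone is required, and the other two components are consulted when they are supplied.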

The dichotomy matters because conclusions from standard CRS evaluation do not transfer to deployed systems. Models that win on simulated benchmarks may collapse in real conversation, where users say "Whatever, I'm open to any suggestion" because they don't have a specific preference yet, or where the conversation drifts off-topic and then returns. Real human conversation includes content that simulators don't generate, so systems trained only on simulated data have had no exposure to this distribution.

The practical consequence is that CRS research has spent a decade building methodology for a problem (entity-level dialogue) that no production system actually faces. Holistic CRS is under-explored because its data is harder to collect, but it is the only setting that maps to real applications.

Related concepts in this collection 5

Why do LLM user simulators fail to track their own goals? LLM-based user simulators drift away from assigned goals during multi-turn conversations, producing unreliable reward signals for agent training. Understanding this goal misalignment problem is critical because it undermines the entire RL training pipeline.
extends: the simulator-reality gap that UGST documents at the goal-tracking level is the same failure that CRS holism names at the dialogue level

Can controlled latent variables make LLM user simulators realistic? Can session-level and turn-level latent variables steer LLM-based user simulators toward realistic dialogue while maintaining measurable diversity and ground truth labels for training conversational systems?
tension with: holistic CRS argues simulators don't transfer; latent-variable simulators argue controllability grounds realism. The question is what counts as realism.

Do conversational recommender benchmarks actually measure recommendation skill? Conversational recommender systems are evaluated against ground-truth items mentioned later in conversations. But does this metric distinguish between genuinely recommending new items and simply repeating items users already discussed?
extends: another way the entity-level CRS evaluation paradigm produces false progress signals

Do recommendation strategies beyond preference questions work better? What role do sociable conversational moves (opinion sharing, encouragement, credibility signals) play in successful human recommendations, compared to simply asking what someone likes?
grounds: the empirical evidence that real human CRS dialogues use sociable strategies that entity-level simulators cannot generate

Can language models simulate belief change in people? Current LLM social simulators treat behavior as input-output mappings without modeling internal belief formation or revision. Can they be redesigned to actually track how people think and change their minds?
complements: same critique generalized. Entity-exchange simulation is the CRS-specific instance of the broader behaviorist-simulation failure.

Original note title

CRS models trained on simulated entity-level interactions do not generalize to real human conversation — the holistic CRS gap