Why do current large language models fail to entrain with users?
This note explores why LLMs don't adapt to users the way human conversation partners do: mirroring word choice, building shared conventions, and adjusting over the course of a conversation. The corpus suggests the failure is baked into what training rewards rather than reflecting a gap in raw capability.
This reads "entrain" in its conversational sense: the way human partners gradually sync up — borrowing each other's words, repairing misunderstandings, building shared ground as the exchange unfolds. The corpus points to a single root cause underneath several surface symptoms: models are trained to predict and deliver information, not to do the relational work that entrainment actually is.
The most direct evidence is that models do not mirror their users' vocabulary. Lexical entrainment, the drift toward a partner's word choices, is central to human rapport and clarity, yet current response generation models largely fail to do it. It can be partly taught back in through preference training (DPO) on word-choice conventions, which suggests the behavior is learnable but is not something default training produces (see "Why don't conversational AI systems mirror their users' word choices?"). The deeper framing is that conversation maintenance (reference repair, topic hand-off, the small moves that keep an exchange smooth) is *social action*, not information transfer. Models don't develop these skills because the training signal rewards predicting the next informative token, not sustaining a relationship (see "Why don't language models develop conversation maintenance skills?").
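To make "mirroring word choice" concrete, here is a minimal sketch of one way lexical reuse could be scored and turned into a preference pair. Everything in it is an assumption for illustration: the `STOPWORDS` list, the `reuse_rate` and `make_preference_pair` helpers, and the word-overlap heuristic itself, which stands in for the coreference-based preference identification the source note describes. The `prompt`/`chosen`/`rejected` keys follow a common convention for preference datasets, not a format confirmed by the source.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "it", "in", "on",
             "my", "your", "i", "you", "can", "for", "that", "this"}


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def reuse_rate(user_turn: str, reply: str) -> float:
    """Share of the reply's content words that the user already used."""
    reply_words = content_words(reply)
    if not reply_words:
        return 0.0
    return len(reply_words & content_words(user_turn)) / len(reply_words)


def make_preference_pair(user_turn: str, reply_a: str, reply_b: str) -> dict[str, str]:
    """Mark the reply that reuses more of the user's words as 'chosen'."""
    rate_a = reuse_rate(user_turn, reply_a)
    rate_b = reuse_rate(user_turn, reply_b)
    chosen, rejected = (reply_a, reply_b) if rate_a >= rate_b else (reply_b, reply_a)
    return {"prompt": user_turn, "chosen": chosen, "rejected": rejected}


# Hypothetical turns: the user says "laptop", one reply keeps that word.
user = "my laptop keeps freezing"
mirrored = "your laptop keeps freezing because of heat"
paraphrased = "the notebook computer hangs because of heat"
pair = make_preference_pair(user, paraphrased, mirrored)
```

In this toy case the mirrored reply reuses three of its five content words and the paraphrased one reuses none, so the mirrored reply ends up as `pair["chosen"]`. A real pipeline would need referent-level matching rather than raw token overlap, since overlap alone rewards parroting.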
That same reward structure shows up as a cluster of multi-turn failures. Across more than 200,000 conversations, models lock onto premature guesses early and fail to recover as the user reveals more, with an average performance drop of 39% in multi-turn settings (see "Why do language models fail in gradually revealed conversations?"). The cause isn't lost capability. RLHF rewards confident, immediate answers over asking for clarification, which creates a pragmatic mismatch with how any individual user actually talks (see "Why do language models lose performance in longer conversations?"). Optimizing for next-turn helpfulness actively discourages the clarifying questions that entrainment depends on; when the reward instead estimates the long-term value of an interaction, models start discovering intent rather than guessing it (see "Why do language models respond passively instead of asking clarifying questions?").
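The contrast between next-turn and long-term reward can be shown with a toy calculation. This is a sketch under stated assumptions, not CollabLLM's actual estimator: the function names, the discount factor `gamma`, and the per-turn reward numbers are all made up for illustration.

```python
def next_turn_reward(turn_rewards: list[float]) -> float:
    """Score a candidate response by the immediate turn only."""
    return turn_rewards[0]


def multiturn_reward(turn_rewards: list[float], gamma: float = 0.9) -> float:
    """Discounted sum of rewards over the (simulated) rest of the conversation."""
    return sum(r * gamma**t for t, r in enumerate(turn_rewards))


# Hypothetical per-turn rewards for two candidate first responses.
guess_now = [0.6, 0.2, 0.2]  # confident answer, later turns spent correcting it
ask_first = [0.3, 0.9, 0.9]  # clarifying question, then well-targeted help

prefers_guess_next_turn = next_turn_reward(guess_now) > next_turn_reward(ask_first)
prefers_ask_multiturn = multiturn_reward(ask_first) > multiturn_reward(guess_now)
```

With these illustrative numbers the next-turn objective prefers the immediate guess, while the discounted multi-turn objective prefers asking first. The same model and the same candidates get ranked differently depending only on what the reward looks at.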
The interesting twist is that this is a *training-signal* gap, not a capacity wall, and the corpus keeps confirming that from different angles. Models follow "what to do" instructions but were never taught "what to ignore," so they chase conversational distractors; fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience (see "Why do language models engage with conversational distractors?"). There's also a stubborn pull in the opposite direction: when a user's input conflicts with strong patterns from pretraining, the model's parametric priors override what's actually in front of it. Entrainment requires bending toward the user, but the weights bend back toward the training distribution (see "Why do language models ignore information in their context?").
What you might not expect: part of why a model never settles into a stable, entrained groove is that it isn't a fixed interlocutor at all. It holds a superposition of possible characters and samples one at generation time, so regenerating the same turn can yield a different persona that is still consistent with the conversation so far (see "Do large language models actually commit to a single character?"). Entrainment assumes two parties who persist and converge; a system with no committed self has nothing to converge *from*. So the failure isn't that LLMs can't adapt. Adaptation was never the objective, and in places the architecture and the priors quietly push the other way.
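The 20-questions behavior behind the "superposition" claim can be mimicked with a toy simulation. This is only an analogy under assumptions: the `CANDIDATES` table and the `consistent` and `reveal` helpers are hypothetical, and a real model samples tokens from a learned distribution rather than picking from an explicit candidate list.

```python
import random

# Hypothetical candidate objects with yes/no attributes (illustrative only).
CANDIDATES = {
    "cat": {"alive": True, "bigger_than_breadbox": False},
    "oak": {"alive": True, "bigger_than_breadbox": True},
    "horse": {"alive": True, "bigger_than_breadbox": True},
    "spoon": {"alive": False, "bigger_than_breadbox": False},
}


def consistent(answers: dict[str, bool]) -> list[str]:
    """Objects that agree with every answer given so far."""
    return sorted(
        name for name, attrs in CANDIDATES.items()
        if all(attrs[question] == answer for question, answer in answers.items())
    )


def reveal(answers: dict[str, bool], seed: int) -> str:
    """Sample one consistent object; a new seed stands in for a regeneration."""
    return random.Random(seed).choice(consistent(answers))


answers = {"alive": True, "bigger_than_breadbox": True}
reveals = {reveal(answers, seed) for seed in range(20)}
```

Every reveal agrees with the answers already given, yet nothing in the setup fixes which object was "the" object before the reveal is sampled. That is the sense in which there is no committed character for a user to converge with.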
Sources (8 notes)
Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
LLMs degrade in multi-turn settings because RLHF training rewards premature answers over clarification-seeking, creating a pragmatic mismatch with individual user behaviors. A Mediator-Assistant architecture that explicitly parses user intent before execution recovers lost performance without retraining.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, which indicates that no fixed commitment exists.