Why do current large language models fail to entrain with users?

This explores why LLMs don't adapt to users the way human conversation partners do — mirroring word choice, building shared conventions, and adjusting over a conversation — and the corpus suggests the failure is baked into what training rewards, not a gap in raw capability.

This reads "entrain" in its conversational sense: the way human partners gradually sync up — borrowing each other's words, repairing misunderstandings, building shared ground as the exchange unfolds. The corpus points to a single root cause underneath several surface symptoms: models are trained to predict and deliver information, not to do the relational work that entrainment actually is.

The most direct evidence is that models simply don't mirror their users' vocabulary. Lexical entrainment, the drift toward a partner's word choices, is central to human rapport and clarity, yet current conversational AI lacks it almost entirely. It can be partly taught back in through preference training on word-choice conventions, which tells you it was never there by default (see "Why don't conversational AI systems mirror their users' word choices?"). The deeper framing is that conversation maintenance (reference repair, topic hand-off, the small moves that keep an exchange smooth) is *social action*, not information transfer. Models don't develop these skills because the training signal rewards predicting the next informative token, not sustaining a relationship (see "Why don't language models develop conversation maintenance skills?").
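
To make "mirroring word choice" concrete, here is a minimal sketch of one way it could be measured: the share of a user's content words that a reply reuses. This is an illustrative proxy, not the metric used in the cited notes; the stopword list, the function names, and the example sentences are all assumptions made for this sketch.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "it", "i", "my", "you",
             "your", "in", "on", "for", "that", "this", "can", "do", "how", "with"}

def content_words(text: str) -> set[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def entrainment_score(user_turn: str, reply: str) -> float:
    """Fraction of the user's content words that the reply reuses (0.0 to 1.0)."""
    user_vocab = content_words(user_turn)
    if not user_vocab:
        return 0.0
    return len(user_vocab & content_words(reply)) / len(user_vocab)

# Hypothetical example turns.
user = "My laptop keeps freezing when I open the spreadsheet"
mirroring = "Does the laptop keep freezing only with that spreadsheet open?"
paraphrasing = "Does the computer hang only while the workbook is loaded?"

print(entrainment_score(user, mirroring))
print(entrainment_score(user, paraphrasing))
```

The mirroring reply scores higher because it keeps "laptop", "freezing", "open", and "spreadsheet", while the paraphrase conveys the same information with none of the user's words. Exact-match overlap is deliberately crude: it misses inflections ("keeps" versus "keep") and says nothing about whether a convention persists over later turns.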

That same reward structure shows up as a cluster of multi-turn failures. Across more than 200,000 conversations, models lock onto premature guesses early and can't recover as the user reveals more, for an average performance drop of 39% (see "Why do language models fail in gradually revealed conversations?"). The cause isn't lost capability: RLHF rewards confident, immediate answers over asking for clarification, creating a pragmatic mismatch with how any individual user actually talks (see "Why do language models lose performance in longer conversations?"). Optimizing for next-turn helpfulness actively discourages the clarifying questions that entrainment depends on; reward the long-term value of an interaction instead, and models start discovering intent rather than guessing it (see "Why do language models respond passively instead of asking clarifying questions?").
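
The contrast between next-turn helpfulness and long-term interaction value can be sketched with a toy two-turn decision. This is not the reward used by CollabLLM or any cited system; the function names, the probabilities, and the discount factor are invented for illustration, and the sketch assumes a wrong early guess is never recovered.

```python
def next_turn_reward(action: str, p_guess: float) -> float:
    """Reward seen by a next-turn objective: only the immediate turn counts."""
    return p_guess if action == "answer" else 0.0

def long_horizon_value(action: str, p_guess: float, p_after: float, gamma: float) -> float:
    """Expected discounted value over a two-turn horizon."""
    if action == "answer":
        return p_guess       # assumption: a wrong early guess is not recovered
    return gamma * p_after   # clarify now (no immediate reward), answer next turn

def best_action(score) -> str:
    """Pick whichever of the two actions the given scoring function prefers."""
    return max(("answer", "clarify"), key=score)

# Hypothetical numbers: a 40% chance of guessing the intent cold,
# 95% after one clarifying question, and a discount of 0.9 per turn.
P_GUESS, P_AFTER, GAMMA = 0.4, 0.95, 0.9

myopic = best_action(lambda a: next_turn_reward(a, P_GUESS))
far_sighted = best_action(lambda a: long_horizon_value(a, P_GUESS, P_AFTER, GAMMA))
print(myopic, far_sighted)
```

Under these assumed numbers the next-turn objective prefers answering immediately, because a clarifying question earns nothing in the turn it is asked, while the two-turn objective prefers clarifying. The ordering flips only because the horizon changed, not because the model became more capable.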

The interesting twist is that this is a *training-signal* gap, not a capacity wall, and the corpus keeps confirming that from different angles. Models follow "what to do" instructions but were never taught "what to ignore," so they chase conversational distractors; fine-tuning on just 1,080 synthetic dialogues significantly improves their topic resilience (see "Why do language models engage with conversational distractors?"). There's also a stubborn pull in the opposite direction: when a user's input conflicts with strong patterns from pretraining, the model's parametric priors override what's actually in front of it. Entrainment requires bending toward the user, but the weights bend back toward the training distribution (see "Why do language models ignore information in their context?").

What you might not expect: part of why a model never settles into a stable, entrained groove is that it isn't a fixed interlocutor at all. It holds a superposition of possible characters and samples one at generation time; regenerate the same turn and you get a different persona, each one consistent with the conversation so far (see "Do large language models actually commit to a single character?"). Entrainment assumes two parties who persist and converge; a system with no committed self has nothing to converge *from*. So the failure isn't that LLMs can't adapt. It's that adaptation was never the objective, and in places the architecture and the priors quietly push the other way.
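
The "superposition" idea can be illustrated with a toy version of the 20-questions setup. This is an analogy for the behavior described above, not a model of how an LLM works internally; the object table, attribute names, and function names are hypothetical.

```python
import random

# Hypothetical toy world: each object is described by yes/no attributes.
OBJECTS = {
    "cat":    {"alive": True,  "flies": False, "metal": False},
    "eagle":  {"alive": True,  "flies": True,  "metal": False},
    "drone":  {"alive": False, "flies": True,  "metal": True},
    "hammer": {"alive": False, "flies": False, "metal": True},
}

def consistent(answers: dict[str, bool]) -> list[str]:
    """Objects still compatible with every answer given so far."""
    return sorted(name for name, attrs in OBJECTS.items()
                  if all(attrs[q] == a for q, a in answers.items()))

def reveal(answers: dict[str, bool], seed: int) -> str:
    """Sample one compatible object, like regenerating the final 'reveal' turn."""
    return random.Random(seed).choice(consistent(answers))

history = {"metal": True}  # answers committed to so far
candidates = consistent(history)
reveals = {reveal(history, seed) for seed in range(50)}
print(candidates, reveals)
```

Every reveal is compatible with the answers already given, yet nothing in the history pins down which object it will be. Different seeds may reveal different objects. The answerer never chose one object up front; it only ever narrowed a distribution.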

Sources 8 notes

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain the relationship rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings because they lock into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.

Why do language models lose performance in longer conversations?

LLMs degrade in multi-turn settings because RLHF training rewards premature answers over clarification-seeking, creating a pragmatic mismatch with individual user behaviors. A Mediator-Assistant architecture that explicitly parses user intent before execution recovers lost performance without retraining.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why do language models engage with conversational distractors?

Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversational AI researcher re-testing claims about LLM entrainment—the ability to sync vocabulary, repair understanding, and maintain coherent shared ground across turns. A curated library spanning 2023–2026 surfaced these findings; your job is to judge which still hold.

What a curated library found — and when (dated claims, not current truth):
• Lexical entrainment (mirroring user word choice) is nearly absent by default in current models, but can be partly recovered via preference training (~2024–2025, arXiv:2404.03820).
• Models lock onto premature hypotheses early in multi-turn exchanges and cannot recover as users reveal more context (~2025, arXiv:2505.06120).
• RLHF reward structure favors confident immediate answers over clarifying questions, inverting the entrainment signal (~2026, arXiv:2602.07338).
• Models sample different stable personas across regenerations of the same turn, preventing consistent identity-based convergence (~2024, arXiv:2410.12405).
• Parametric priors from pretraining override in-context user signals, pulling models back toward training distribution rather than bending toward the user (~2025, arXiv:2505.22907).

Anchor papers (verify; mind their dates):
• arXiv:2404.03820 (2024-04): CantTalkAboutThis — topic adherence as instruction-tuning gap
• arXiv:2505.06120 (2025-05): LLMs Get Lost In Multi-Turn Conversation — premature assumption locking
• arXiv:2602.07338 (2026-02): Intent Mismatch — RLHF misalignment in multi-turn
• arXiv:2505.22907 (2025-05): Conversational Alignment — prior-override dynamics

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding, judge whether newer model scales, instruction-tuning variants (multi-turn-aware objectives, in-context persona pinning), or orchestration (prompt caching, explicit conversation state tracking, explicit collaborative framing) have RELAXED or overturned it. Separate the durable question (entrainment as a training-signal problem) from perishable claims (which specific reward fails, whether lexical entrainment is recoverable). State plainly where constraints still appear to hold.
(2) Surface the strongest work from the last ~6 months that CONTRADICTS the library's consensus that entrainment is trainable but absent. Look for papers claiming models DO entrain, or that entrainment requires architectural change, not just new signals.
(3) Propose 2 research questions that assume the regime may have moved: e.g., "Can multi-turn-aware RL fully eliminate premature locking?" or "Does persona-pinning via system prompt outperform fine-tuning for sustained entrainment?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
