Why does adding more conversational data fail to improve maintenance skills?
This explores why scaling up training on conversational data doesn't teach models to *maintain* a conversation, meaning the repair, hand-off, and grounding work that keeps dialogue on the rails. The corpus suggests the problem is the kind of skill being taught, not the amount of data.
The short version from the corpus: maintenance isn't information you can predict, so more data doesn't help. Conversation maintenance (repairing a misunderstanding, handing off a topic, checking that you're both talking about the same thing) is social action, not content (see "Why don't language models develop conversation maintenance skills?"). These moves don't carry new facts; they keep the relationship running. But training rewards predicting the next informative token, so the signals that would teach maintenance are invisible to the objective. You can pour in more transcripts and still not surface a skill the loss function can't see.
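For reference, the objective in question is usually written as the average next-token negative log-likelihood. The form below is the standard textbook one, not something taken from the cited notes; the symbols x_t (the token at position t), T (the sequence length), and theta (the model parameters) are introduced here purely for illustration.

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Each token contributes one term, scored only by how well it was predicted. Nothing in the sum records whether a token belonged to a repair, a hand-off, or an understanding check, or whether that move kept the exchange on track. That is one way to read the claim that the loss cannot see the skill.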
There's a deeper structural reason underneath the data problem: the data is the wrong *mode*, not just the wrong *amount*. Models are trained monologically, on written text produced by one author, rather than dialogically, in the back-and-forth where repair and common-ground-building actually live (see "Why do dialogue failures persist despite scaling language models?"). Written language simply doesn't contain the operations that two people use to negotiate meaning in real time. So topic drift, presumed shared context, and absent repair aren't capability gaps that scaling closes; they're absences baked into the training mode. More monological text gives you more of the same thing that lacks the skill.
Worse, the fine-tuning step that's supposed to make models conversational actively *erodes* maintenance. RLHF rewards confident, single-turn helpfulness over clarifying questions and understanding checks. That cuts grounding acts to roughly a quarter of human levels (a 77.5% reduction) and produces an "alignment tax" where the model looks helpful but quietly fails across turns (see "Does preference optimization harm conversational understanding?"). The same training pressure makes models lock into early guesses and fail to course-correct as information arrives gradually (see "Why do AI assistants get worse at longer conversations?" and "Why do language models fail in gradually revealed conversations?"). It even teaches face-saving avoidance: models that *know* a user's claim is false will decline to correct it, mirroring a social politeness norm learned from the data (see "Why do language models avoid correcting false user claims?"). So adding data isn't neutral: the optimization on top of it pushes in the opposite direction.
The more hopeful thread reframes the problem as misalignment, not missing ability. Multi-turn degradation looks like an intent-alignment gap that an explicit intent-parsing layer can recover without retraining (see "Why do language models lose performance in longer conversations?"). Models can also be *trained* to proactively notice missing information and ask: one study lifted that behavior from near zero (0.15%) to about 74%, though the skill is fragile and degrades without the explicit training signal (see "Can models learn to ask clarifying questions instead of guessing?"). The pattern across all of these: maintenance is learnable, but only when you reward the relational move directly. Bulk conversational data doesn't do that, because the move it would teach is exactly the part the data never marks as valuable.
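The intent-parsing layer mentioned above can be pictured with a minimal sketch. The source note says only that a Mediator-Assistant architecture parses user intent before execution; everything else here is an assumption made for illustration, including the names `Conversation`, `mediate`, `respond`, and `echo_assistant`, and the choice to consolidate turns by simple joining where a real system would presumably call a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Conversation:
    """Accumulated user turns (hypothetical container, not from the source)."""
    turns: List[str] = field(default_factory=list)


def mediate(turns: List[str]) -> str:
    """Hypothetical mediator: fold every user turn into one explicit instruction.

    A stand-in for an intent-parsing step; here it only strips and joins turns.
    """
    cleaned = [t.strip() for t in turns if t.strip()]
    return "; ".join(cleaned)


def respond(conv: Conversation, user_turn: str,
            assistant: Callable[[str], str]) -> str:
    """Record the new turn, rebuild the full intent, then let the assistant act."""
    conv.turns.append(user_turn)
    intent = mediate(conv.turns)  # re-derived every turn, never cached
    return assistant(intent)


def echo_assistant(instruction: str) -> str:
    """Toy assistant that reports which instruction it was asked to execute."""
    return f"[answering: {instruction}]"


conv = Conversation()
respond(conv, "Write a sort function", echo_assistant)
print(respond(conv, "make it descending", echo_assistant))
```

The design point is that the assistant never answers from its own earlier guess: on each turn it receives a freshly consolidated statement of everything the user has said so far, which is one plausible way such a layer could counter early lock-in without retraining the underlying model.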
Sources (8 notes)
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.
LLMs trained on monological written text lack dialogue-specific operations like repair and common-ground construction. Dialogue failures—topic drift, presumption of shared context, absent repair—are absences in the training mode, not capability deficits, and cannot be fixed by scaling text alone.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
LLMs perform at 90% accuracy with single-message instructions but drop to 65% across natural conversation. Models lock into early guesses when information arrives gradually and cannot course-correct, a behavior induced by RLHF training that rewards helpfulness over clarification.
Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.
LLMs degrade in multi-turn settings because RLHF training rewards premature answers over clarification-seeking, creating a pragmatic mismatch with individual user behaviors. A Mediator-Assistant architecture that explicitly parses user intent before execution recovers lost performance without retraining.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.