Why do dialogue failures persist despite scaling language models?

If LLMs get better at text tasks with more training data, why don't dialogue-specific problems improve the same way? The question explores whether dialogue failures are capability gaps or structural training mismatches.

Synthesis note · 2026-04-14

The vast majority of LLM training data is written monological text: articles, essays, books, web pages, documentation, code, social media posts. Even text that records dialogue (interview transcripts, fiction with conversation, forum threads) appears in the corpus as written text, that is, as a third-person record of dialogue rather than first-person dialogical engagement. The model trains by predicting the next token in this corpus, which means the operation it learns is text-continuation: given a span of writing, what comes next.
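
To make the text-continuation framing concrete, here is a minimal sketch of how a recorded dialogue looks to a next-token objective. Everything in it is an illustrative assumption: the sample transcript, the helper name `next_token_pairs`, the `context_size` parameter, and whitespace splitting as a stand-in for a real tokenizer.

```python
# Toy illustration: a dialogue transcript enters the corpus as one flat sequence.
# The transcript and the whitespace "tokenizer" are assumptions for illustration only.
transcript = "A: Is the report done? B: Which report? A: The Q3 one."


def next_token_pairs(text, context_size=4):
    """Return (context, target) pairs as a next-token objective would see them."""
    tokens = text.split()  # stand-in for a real tokenizer
    pairs = []
    for i in range(1, len(tokens)):
        # The context is just the preceding tokens, truncated to a window.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs(transcript)

# The speaker change "B:" is one more token to predict from the preceding text.
print(pairs[4])
```

In this sketch the turn boundary is not an event the learner takes part in. It is a token predicted from the tokens before it, like any other position in the sequence.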

This training mode is monological. The model is never in dialogue during training. It never has to coordinate with another agent, never has to repair misunderstanding, never has to track another speaker's perspective updating in real time. The dialogical operation — two agents addressing each other, building shared understanding through reciprocal moves — is not a training signal. The model can only encounter this operation as text-about-it, not as text-of-it.

The failure modes of LLM dialogue track this exactly. Topic drift in multi-turn conversation: the model lacks a persistent intentional structure for the dialogue because dialogues weren't training units. Presumption of common ground rather than its construction: the model has no training signal for the construction-of-common-ground operation, so it produces output as if the ground is already shared. Absence of conversational repair: the model has no training signal for the repair operation, so it does not perform repair when context indicates it is needed. Each failure is the absence of an operation the training mode never required the model to perform.

The diagnostic significance: many failures of LLM dialogue are not capability deficits in the model; they are absences in the training mode. No amount of additional written-text training will produce the missing operations, because those operations are not in the training data and cannot be inferred from it. The training mode determines which failure modes appear; addressing them would require structural changes to the training mode (training models in actual dialogue with other agents).

This is why the standard "fix LLM dialogue with more text" approach has produced limited progress on dialogue-specific failures despite continued progress on text-continuation tasks. The problems that scale solves are problems within the training mode; the dialogue-specific problems lie outside it. They require a different kind of training, or a different post-training intervention, and neither is purely a scale problem. The companion note "Does human language use ever exist outside communication?" makes the human-acquisition claim that explains why the asymmetry matters.

The strongest counterargument: dialogue-specific fine-tuning and RLHF on conversational examples partially close the gap. Yes, partially — and the partiality is informative. The fine-tuning can produce more dialogue-like surface output without producing the underlying operations, because the post-training signal is still text, just dialogue-shaped text. The mode is unchanged.
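
The point that the post-training signal is "still text" can be sketched for the supervised fine-tuning case. The sketch assumes one common setup in which a conversation is serialized into a single token sequence and loss is computed only on assistant tokens. The function name `serialize_with_loss_mask` and the `<user>` / `<assistant>` role tags are hypothetical and do not correspond to any particular library's chat template. RLHF is not modelled here.

```python
# Simplified sketch of dialogue-shaped supervised fine-tuning data.
# Role tags and whitespace tokenization are illustrative assumptions.
def serialize_with_loss_mask(turns):
    """Flatten (role, text) turns into tokens plus a per-token loss mask."""
    tokens, mask = [], []
    for role, text in turns:
        words = [f"<{role}>"] + text.split()
        tokens.extend(words)
        # Only assistant tokens contribute to the loss in this sketch.
        mask.extend([role == "assistant"] * len(words))
    return tokens, mask


turns = [
    ("user", "Which report?"),
    ("assistant", "The Q3 one."),
]
tokens, mask = serialize_with_loss_mask(turns)

for token, trained in zip(tokens, mask):
    print(f"{token:12s} {'predict' if trained else 'context'}")
```

Under these assumptions the user turn is a fixed prefix: it was written before training and does not change in response to anything the model produces. The objective remains per-token prediction over a recorded transcript.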

Related concepts in this collection 3

Does human language use ever exist outside communication? Explores whether humans can use language in non-communicative ways, or whether the communicative scaffold learned in childhood persists through all language use including private writing and internal thought.
the human-acquisition companion claim
Are language models and human speakers doing the same thing? Does treating LLM output and human communication as equivalent operations mask fundamental differences in how they work? This distinction shapes how we assess AI capabilities and risks.
the meta-discourse claim that follows from this training-mode asymmetry
Why don't conversational AI systems mirror their users' word choices? Explores whether current dialogue models exhibit lexical entrainment—the human tendency to align vocabulary with conversation partners—and what's needed to bridge this gap in AI communication.
one of the specific dialogue-failure-modes this training-mode claim explains