SYNTHESIS NOTE
Conversational AI and Personalization · Language, Text, and Discourse · Psychology, Society, and Alignment

Why do dialogue failures persist despite scaling language models?

If LLMs get better at text tasks with more training data, why don't dialogue-specific problems improve the same way? The question explores whether dialogue failures are capability gaps or structural training mismatches.

Synthesis note · 2026-04-14
What kind of thing is an LLM really?

The vast majority of LLM training data is written monological text: articles, essays, books, web pages, documentation, code, social media posts. Even text that records dialogue (interview transcripts, fiction with conversation, forum threads) appears in the corpus as written text: a third-person record of dialogue, not first-person dialogical engagement. The model trains by predicting the next token in this corpus, so the operation it learns is text continuation: given a span of writing, what comes next.
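To make the text-continuation point concrete, here is a minimal sketch of how a recorded exchange might become next-token training pairs. The transcript, the whitespace tokenizer, and the function name `next_token_pairs` are illustrative assumptions, not part of any real training pipeline; production models use subword tokenizers and far longer contexts.

```python
# Hypothetical two-speaker exchange, stored the way a corpus stores it:
# as one flat string. The speaker labels are just more characters.
transcript = "A: Is the build green? B: Which branch? A: Main."


def tokenize(text):
    """Toy tokenizer (assumption): split on whitespace."""
    return text.split()


def next_token_pairs(tokens):
    """Build (context, target) pairs: predict token i from tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = tokenize(transcript)
pairs = next_token_pairs(tokens)

# The turn boundary "B:" is a prediction target like any other token.
# Nothing in the objective marks it as another agent taking the floor.
for context, target in pairs[3:6]:
    print(" ".join(context), "->", target)
```

In this sketch the change of speaker is only one more token to continue. The objective has no slot for an interlocutor whose understanding has to be tracked or repaired.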

This training mode is monological. The model is never in dialogue during training. It never has to coordinate with another agent, never has to repair misunderstanding, never has to track another speaker's perspective updating in real time. The dialogical operation — two agents addressing each other, building shared understanding through reciprocal moves — is not a training signal. The model can only encounter this operation as text-about-it, not as text-of-it.

The failure modes of LLM dialogue track this exactly. Topic drift in multi-turn conversation: the model lacks a persistent intentional structure for the dialogue because dialogues weren't training units. Presumption of common ground rather than its construction: the model has no training signal for the construction-of-common-ground operation, so it produces output as if the ground is already shared. Absence of conversational repair: the model has no training signal for the repair operation, so it does not perform repair when context indicates it is needed. Each failure is the absence of an operation the training mode never required the model to perform.

The diagnostic significance: many failures of LLM dialogue are not capability deficits in the model; they are absences in the training mode. No amount of additional written-text training will produce the missing operations, because the operations are not in the training data and cannot be inferred from it. The training mode determines which failure modes appear; addressing them would require structural changes to the training mode, such as training models in actual dialogue with other agents.

This is why the standard "fix LLM dialogue with more text" approach has produced limited progress on dialogue-specific failures despite continued progress on text-continuation tasks. The problems that scale solves lie within the training mode; the dialogue-specific problems lie outside it. They require a different kind of training or a different post-training intervention, neither of which is purely a matter of scale. The companion note "Does human language use ever exist outside communication?" makes the human-acquisition claim that explains why the asymmetry matters.

The strongest counterargument: dialogue-specific fine-tuning and RLHF on conversational examples partially close the gap. They do, but only partially, and the partiality is informative. Fine-tuning can produce more dialogue-like surface output without producing the underlying operations, because the post-training signal is still text, just dialogue-shaped text. The mode is unchanged.
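The claim that the post-training signal is "still text" can be illustrated with a rough sketch of supervised fine-tuning on a conversational example. The role tags, the whitespace tokenizer, and the convention of computing loss only on assistant tokens are assumptions made for illustration. The sketch covers supervised fine-tuning only and does not model RLHF.

```python
# Hypothetical conversational fine-tuning example: a fixed, pre-written exchange.
turns = [
    ("user", "Is the build green?"),
    ("assistant", "Which branch do you mean?"),
]


def flatten_with_mask(turns):
    """Flatten turns into one token list plus a per-token loss mask.

    Assumption: loss is computed only on assistant tokens, a common
    convention, not something the note specifies.
    """
    tokens, mask = [], []
    for role, text in turns:
        words = ["<" + role + ">"] + text.split()
        tokens.extend(words)
        mask.extend([role == "assistant"] * len(words))
    return tokens, mask


tokens, mask = flatten_with_mask(turns)

# Supervised positions are still (context, next token) pairs over a fixed string.
supervised = [(tokens[:i], tokens[i]) for i in range(1, len(tokens)) if mask[i]]

for context, target in supervised:
    print(len(context), "context tokens ->", target)
```

The user turn here is a recorded string, not a participant. It cannot respond to what the model produces, so nothing in this signal requires the model to build common ground or to repair a misunderstanding.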

Original note title

LLMs are trained monologically on written language not dialogically in conversation — training mode determines failure mode