How do coreference chains preserve coherence across dialogue turns?
This reads the question as: how does keeping track of who and what 'it', 'she', or 'that' refers to across turns hold a conversation together — and what the corpus says actually does (and doesn't) keep dialogue coherent.
Coreference, the chain of pronouns and other references that point back to earlier mentions, looks like the obvious mechanism for staying coherent across dialogue turns. The honest answer the corpus gives is a useful surprise: coreference shows up less as a thing that *preserves* coherence and more as a place where coherence visibly *breaks*, and the deeper work of holding a conversation together turns out to live somewhere else entirely.
The most direct doorway is research on how dialogue coherence fails. Using Abstract Meaning Representation (AMR), one line of work identifies four distinct ways dialogue comes apart (contradiction, coreference inconsistency, irrelevancy, and decreased engagement) and shows that classifiers trained on meaning structure catch these failures while surface-text checks miss them (What semantic failures break dialogue coherence most realistically?). So coreference inconsistency is a named failure mode: when the chain breaks, the conversation reads as incoherent. Coreference also turns out to be a *signal* you can train on. Work on lexical entrainment uses coreference-identified preferences in DPO post-training to teach models to converge on shared vocabulary the way human partners do, suggesting that the same machinery that tracks referents can drive the convention-forming that makes dialogue feel mutual (Why don't conversational AI systems mirror their users' word choices?).
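To make "coreference inconsistency" concrete, here is a minimal sketch of a chain check. It is not the AMR-based classifier from the cited work: the `Mention` record, its hand-assigned `gender` and `number` features, and the `find_breaks` helper are all hypothetical names introduced for illustration, and a real system would get these features from a coreference resolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One mention in a coreference chain (hypothetical toy record)."""
    turn: int     # dialogue turn index
    text: str     # surface form
    gender: str   # "f", "m", "n", or "?" when unknown
    number: str   # "sg" or "pl"


def _compatible(a: str, b: str) -> bool:
    """Two feature values agree if they match or either is unknown."""
    return a == b or "?" in (a, b)


def find_breaks(chain: list[Mention]) -> list[tuple[Mention, Mention]]:
    """Return (head, mention) pairs where a later mention disagrees
    with the first mention of the chain on gender or number."""
    if not chain:
        return []
    head = chain[0]
    return [
        (head, m)
        for m in chain[1:]
        if not (_compatible(head.gender, m.gender)
                and _compatible(head.number, m.number))
    ]


# Assumed example: a chain that drifts from "she" to "he" at turn 2.
chain = [
    Mention(0, "my sister", "f", "sg"),
    Mention(1, "she", "f", "sg"),
    Mention(2, "he", "m", "sg"),
]
breaks = find_breaks(chain)
for head, m in breaks:
    print(f"turn {m.turn}: '{m.text}' conflicts with '{head.text}' from turn {head.turn}")
```

A feature-agreement check like this only catches the crudest breaks. The point of the AMR result above is that realistic failures live in meaning structure, which surface checks of this kind would miss.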
But here's the lateral turn. The corpus suggests coherence across turns isn't really maintained by resolving pronouns. It's maintained by *grounding*: the running, jointly built agreement about what has been established. And that's exactly where current LLMs are weak. One striking finding is that LLMs treat the opening prompt as a fixed frame and can't symmetrically update common ground: when a user pivots or contradicts an earlier framing, the model can't absorb the revision, leaving the human as the sole keeper of the conversational scoreboard (Can LLMs truly update shared conversational common ground?). Worse, the training itself erodes the repair behaviors that fix broken reference: preference optimization rewards confident single-turn answers over clarifying questions, leaving grounding acts 77.5% below human levels (Does preference optimization harm conversational understanding?). And these maintenance moves, such as reference repair and topic hand-off, are social actions rather than information encoding, which is why prediction-trained models never develop them (Why don't language models develop conversation maintenance skills?).
There's a still-deeper instability underneath coreference. For a reference chain to hold, the *thing being referred to* has to stay stable, but LLMs don't commit to a single character or object. Shanahan's 20-questions regeneration test shows that the model holds a superposition of consistent referents and samples one at generation time, so re-running the same turn yields different but locally coherent answers (Do large language models actually commit to a single character?). If the referent itself was never fixed, a coreference chain has nothing solid to anchor to. This connects to why models get lost over many turns: they lock into premature assumptions early and can't recover, producing a 39% average performance drop in multi-turn settings (Why do language models fail in gradually revealed conversations?).
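The regeneration behavior can be illustrated with a toy simulation. This is only a sketch of the idea, not Shanahan's experimental setup: the `CANDIDATES` table, the attribute names, and the `reveal` function are assumptions made up for the example, and a seeded `random.Random` stands in for regenerating a model response.

```python
import random

# Hypothetical candidate objects with yes/no attributes.
CANDIDATES = {
    "cat":    {"alive": True,  "bigger_than_breadbox": False},
    "dog":    {"alive": True,  "bigger_than_breadbox": True},
    "horse":  {"alive": True,  "bigger_than_breadbox": True},
    "teapot": {"alive": False, "bigger_than_breadbox": False},
}


def consistent(history):
    """Names of candidates that agree with every (attribute, answer) so far."""
    return sorted(
        name
        for name, attrs in CANDIDATES.items()
        if all(attrs[question] == answer for question, answer in history)
    )


def reveal(history, rng):
    """Sample one surviving candidate, as a player that never committed would."""
    return rng.choice(consistent(history))


# Answers already given in the game (assumed for illustration).
history = [("alive", True), ("bigger_than_breadbox", True)]

# Each seed plays the role of one regeneration of the same final turn.
reveals = {reveal(history, random.Random(seed)) for seed in range(20)}

print("still consistent:", consistent(history))
print("revealed across regenerations:", sorted(reveals))
```

Every reveal is drawn from the set that fits the earlier answers, so each run looks coherent on its own. Nothing pins the choice down between runs, which is the sense in which a later "it" has no fixed referent to point back to.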
If you want the constructive alternative the corpus points to, it's the information-theoretic framing that token-level systems lack: Collaborative Rational Speech Acts (CRSA) extends pragmatic reasoning to track *both* speakers' beliefs across turns, modeling the progression from partial to shared understanding (Can dialogue systems track both speakers' beliefs across turns?). The takeaway you didn't know you wanted: coreference chains don't preserve coherence on their own. They're a readout of whether the harder thing, jointly maintained common ground, is intact. Fix the grounding and the chains hold; resolve pronouns without grounding and you get fluent text that quietly loses the thread.
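For readers unfamiliar with the Rational Speech Acts (RSA) model that CRSA builds on, the sketch below shows the standard single-utterance recursion on a two-object reference game. It is plain RSA with a uniform prior, not CRSA: the rate-distortion component and the turn-by-turn tracking of both speakers' beliefs are not reproduced here, and the referent names, the `TRUE_OF` lexicon, and the function names are assumptions for illustration.

```python
REFERENTS = ["glasses_only", "glasses_and_hat"]
UTTERANCES = ["glasses", "hat"]

# TRUE_OF[u][r] is 1.0 when utterance u literally applies to referent r.
TRUE_OF = {
    "glasses": {"glasses_only": 1.0, "glasses_and_hat": 1.0},
    "hat":     {"glasses_only": 0.0, "glasses_and_hat": 1.0},
}


def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance):
    """L0: spread belief evenly over referents the utterance is true of."""
    return normalize({r: TRUE_OF[utterance][r] for r in REFERENTS})


def speaker(referent, alpha=1.0):
    """S1: prefer utterances that lead L0 to the intended referent.
    L0 ** alpha equals exp(alpha * log L0), the usual soft-max form."""
    return normalize(
        {u: literal_listener(u)[referent] ** alpha for u in UTTERANCES}
    )


def pragmatic_listener(utterance, alpha=1.0):
    """L1: invert the speaker model under a uniform prior over referents."""
    return normalize({r: speaker(r, alpha)[utterance] for r in REFERENTS})


posterior = pragmatic_listener("glasses")
print(posterior)
```

With `alpha=1.0`, hearing "glasses" shifts the pragmatic listener toward the referent without a hat (probability 0.75 versus 0.25), because a speaker who meant the other one would more likely have said "hat". As described above, CRSA carries this kind of belief update across turns and for both participants.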
Sources (8 notes)
Research using Abstract Meaning Representation identified four distinct incoherence types: contradiction, coreference inconsistency, irrelevancy, and decreased engagement. AMR-trained classifiers detect these semantic failures while text-level manipulations alone cannot.
Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.
LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.
Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.