How do discourse relation types improve dialogue beyond sentence-level semantic matching?
This explores what dialogue systems gain by modeling the *relationships between utterances* — causal links, temporal order, repair moves, topic hand-offs — rather than just matching the meaning of one sentence against another. The corpus doesn't have a single paper that uses the phrase "discourse relation types," but several notes circle the same territory under different names, and together they make a sharp case: most of what holds a conversation together lives *between* sentences, not inside them.
Start with the raw building blocks. LLMs are noticeably better at causal relations than temporal ones, and the reason is telling: causal connectives ("because," "so," "therefore") are explicit and frequent in training text, while temporal order is usually left implicit and has to be inferred (see "Why do LLMs handle causal reasoning better than temporal reasoning?"). That's the discourse-relation lesson in miniature: when the relation between two utterances is marked on the surface, models handle it well; when it's only carried by the structure of the exchange, they stumble. Sentence-level semantic matching never sees that structure at all.
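To make the surface-marked versus implicit distinction concrete, here is a minimal sketch. It assumes a tiny hand-written connective lexicon (`CONNECTIVES` and `explicit_relation` are hypothetical names, not taken from any of the notes) and only illustrates that an explicit connective can be read off the text, while an implicit temporal sequence leaves nothing on the surface to find.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny lexicon for illustration only.
# A realistic connective inventory would be much larger and ambiguity-aware
# (for example, "so" is not always causal).
CONNECTIVES = {
    "causal": ["because", "so", "therefore"],
    "temporal": ["before", "after", "then", "afterwards"],
}

def explicit_relation(utterance: str) -> Optional[str]:
    """Return the relation type signalled by a surface connective, or None."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    for relation, markers in CONNECTIVES.items():
        if any(token in markers for token in tokens):
            return relation
    # No connective found: any relation here is implicit and must be inferred.
    return None

marked = explicit_relation("I missed the train because the alarm failed.")
unmarked = explicit_relation("I brewed coffee. I opened the laptop.")
```

In this sketch `marked` comes back as `"causal"`, while `unmarked` is `None` even though a reader infers that the coffee came first. That gap is the part that has to be recovered from the structure of the exchange.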
The deeper payoff is pragmatic rather than semantic. One line of work reframes dialogue understanding as *command generation* instead of intent classification: a turn is interpreted by what it is trying to do in context, not by what it literally says, which removes the need for intent annotation and handles context naturally (see "Can command generation replace intent classification in dialogue systems?"). A complementary note argues that the glue of conversation, such as reference repair and topic hand-off, is *social action* rather than information encoding, and that models never learn it because training rewards predicting content, not doing relational work (see "Why don't language models develop conversation maintenance skills?"). Both point past semantics: the relation a turn bears to what came before is the thing that matters.
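The contrast with intent classification can be sketched as follows. This is an illustrative toy, not Rasa's actual API: the `StartFlow` and `SetSlot` command types, the `DialogueState` class, and `apply_commands` are all hypothetical names, and the command lists are written by hand where a real system would have a model generate them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

@dataclass(frozen=True)
class StartFlow:
    flow: str  # hypothetical command: begin a named task

@dataclass(frozen=True)
class SetSlot:
    name: str  # hypothetical command: write a value into dialogue state
    value: str

Command = Union[StartFlow, SetSlot]

@dataclass
class DialogueState:
    active_flow: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)

def apply_commands(state: DialogueState, commands: List[Command]) -> DialogueState:
    """Apply each command to the shared state, in order."""
    for command in commands:
        if isinstance(command, StartFlow):
            state.active_flow = command.flow
        elif isinstance(command, SetSlot):
            # Overwriting an existing slot is how a repair turn takes effect.
            state.slots[command.name] = command.value
    return state

state = DialogueState()
# Turn 1: "Book me a table for Thursday."
apply_commands(state, [StartFlow("book_table"), SetSlot("date", "Thursday")])
# Turn 2: "Actually, make it Friday." Meaningful only relative to turn 1.
apply_commands(state, [SetSlot("date", "Friday")])
```

An intent label such as "inform_date" would describe the second turn in isolation. The command, by contrast, is defined by what it does to the state the first turn created, which is the relational reading the paragraph above describes.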
This is also where today's models visibly break. LLMs treat the opening prompt as a fixed frame and can't jointly update common ground: when a user pivots or contradicts an earlier framing, the model can't absorb the revision, so the human ends up as the sole scorekeeper (see "Can LLMs truly update shared conversational common ground?"). The proposed fixes are explicitly relational. Collaborative Rational Speech Acts (CRSA) adds an information-theoretic layer for tracking *both* speakers' beliefs as understanding moves from partial to shared (see "Can dialogue systems track both speakers' beliefs across turns?"), and multi-turn-aware reward shaping trains models to ask clarifying questions and discover intent over a whole exchange rather than maximizing the next single reply (see "Why do language models respond passively instead of asking clarifying questions?").
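The difference between next-reply and whole-exchange rewards can be shown with a small arithmetic sketch. It is not CollabLLM's actual procedure: the function names, the discount factor, and every reward number below are made-up assumptions chosen only to show how the two criteria can rank the same candidate replies differently.

```python
from typing import Dict, List

def discounted_return(turn_rewards: List[float], discount: float = 0.9) -> float:
    """Discounted sum of per-turn rewards along one simulated continuation."""
    return sum(r * discount ** t for t, r in enumerate(turn_rewards))

def multiturn_value(rollouts: List[List[float]], discount: float = 0.9) -> float:
    """Average discounted return over several simulated continuations."""
    return sum(discounted_return(r, discount) for r in rollouts) / len(rollouts)

# Illustrative, invented numbers: each inner list is one imagined continuation,
# with the first entry being the reward for the immediate reply.
ROLLOUTS: Dict[str, List[List[float]]] = {
    "answer_now": [[0.6, 0.1, 0.1], [0.6, 0.0, 0.2]],
    "ask_clarifying": [[0.2, 0.9, 0.9], [0.2, 0.8, 1.0]],
}

# Single-turn criterion: look only at the immediate reward.
best_single = max(ROLLOUTS, key=lambda reply: ROLLOUTS[reply][0][0])
# Multi-turn criterion: look at the estimated value of the whole exchange.
best_multi = max(ROLLOUTS, key=lambda reply: multiturn_value(ROLLOUTS[reply]))
```

Under these assumed numbers the single-turn criterion prefers `answer_now`, while the multi-turn criterion prefers `ask_clarifying`, because the clarifying question pays off in later turns.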
The less obvious takeaway: the corpus suggests discourse relations aren't a feature you bolt onto a semantic matcher; they call for a *different training objective entirely*. Sentence matching optimizes for "what does this turn mean," while everything above optimizes for "what does this turn do to the shared state between us." That second question is what makes a conversation feel coherent, and it's the one that next-token prediction, which rewards predicting content, leaves unrewarded.
Sources (6 notes)
ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.
Rasa's dialogue understanding architecture generates domain-specific commands instead of classifying intents, eliminating annotation requirements, handling context naturally, and scaling without degradation. It treats understanding as pragmatics rather than semantics.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, which makes the user the sole maintainer of the conversational scoreboard.
CRSA (Collaborative Rational Speech Acts) integrates rate-distortion theory with the Rational Speech Acts (RSA) framework to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.