Why does ChatGPT fail at implicit discourse relations?

ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?

Synthesis note · 2026-02-21 · sourced from Discourses

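The discourse relations paper (ChatGPT on temporal, causal, and discourse relations) found a dramatic asymmetry in ChatGPT's discourse understanding: performance is strong when explicit connectives are present and falls to 24.54% accuracy on implicit relations, where they are absent.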

This is not a small gap. An accuracy of 24.54% on implicit discourse relations is very low for an 11-class task, where uniform random guessing would already score about 9%. ChatGPT "cannot understand the abstract sense of each discourse relation and the features from the text" when the surface connectives are absent.

The explanation is transparent: LLMs have access to massive training data where connectives are pervasive and reliable signals. When you see "therefore" or "because," the discourse relation is explicit in the surface form. Learning to respond to these signals is straightforward statistical learning. Inferring the same relations without surface signals requires understanding what the two clauses actually mean and what logical relationship holds between them.

This asymmetry shows that what LLMs have learned for discourse relation detection is largely cue-based — they respond to surface signals, not to structural meaning. When the surface cue is removed, the competence collapses.
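The cue-based behaviour described above can be illustrated with a deliberately naive sketch. This is a toy lookup classifier, not ChatGPT's mechanism or the paper's method. The cue table, the PDTB-style label names, and the function name `classify_by_cue` are all assumptions made for illustration. It succeeds whenever a connective appears in the surface form and has nothing to fall back on when the connective is removed.

```python
import re
from typing import Optional

# Hypothetical cue table (illustrative only): connective -> coarse relation label.
CUE_TO_RELATION = {
    "because": "Contingency.Cause",
    "therefore": "Contingency.Cause",
    "however": "Comparison.Contrast",
    "but": "Comparison.Contrast",
    "then": "Temporal.Asynchronous",
    "for example": "Expansion.Instantiation",
}


def classify_by_cue(arg1: str, arg2: str) -> Optional[str]:
    """Return a relation label if a known connective appears, else None."""
    text = f"{arg1} {arg2}".lower()
    for cue, relation in CUE_TO_RELATION.items():
        # Whole-word match so that "the" does not trigger the cue "then".
        if re.search(rf"\b{re.escape(cue)}\b", text):
            return relation
    # No surface cue: a purely cue-based system has no basis for a decision.
    return None


# Same underlying causal relation, with and without the connective.
explicit = classify_by_cue("The road was icy,", "therefore the bus was late.")
implicit = classify_by_cue("The road was icy.", "The bus was late.")
print(explicit)
print(implicit)
```

Both sentence pairs express the same causal relation, but only the first carries a surface signal. Recovering the relation in the second pair requires modelling what icy roads do to buses, which a lookup over connectives cannot supply.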

This connects directly to the note "What three layers must discourse systems actually track?": implicit discourse relation detection requires exactly the intentional structure that the linguistic structure alone does not carry.

A concrete instance beyond discourse relations: The same explicit/implicit asymmetry surfaces in metaphor extraction. LLMs can identify explicit source-target domain mappings (where the analogy's terms are stated) but fail on the implicit elements human readers routinely infer — e.g., the unstated target concept that completes a proportional analogy where only three of four terms are given. The failure is not specific to discourse-connective tasks; it is the general pattern wherever meaning depends on what is not said.

The literary analysis implication: Poetry and literary prose operate primarily through implicit relations. The connections between images in a poem, the causal logic of a narrative, and the thematic resonance between scenes are rarely marked by explicit connectives. A poet does not write "the rose symbolizes mortality because..." The reader must infer the relation. This means the 24% implicit accuracy rate is not a peripheral limitation for literary analysis; it is a central one. As argued in the note "Can LLMs truly understand literary meaning or just mechanics?", the discourse competence asymmetry is one of four converging mechanisms that explain why LLMs can parse literary texts mechanically but cannot interpret them meaningfully.

Original note title

llm discourse competence is asymmetric: explicit connectives enable performance but implicit relations cause systematic failure