What three layers must discourse systems actually track?
Grosz and Sidner's 1986 framework proposes that discourse requires simultaneously tracking linguistic segments, speaker purposes, and salient objects. Understanding why all three are necessary helps explain where current AI systems structurally fail.
Grosz and Sidner (1986) identified three separate but interrelated structural layers in any discourse:
- Linguistic structure — how utterances naturally aggregate into segments
- Intentional structure — the purposes expressed in each segment and the relationships among those purposes
- Attentional state — a dynamic record of which objects, properties, and relations are currently salient
The key claim is that these three are necessary and jointly sufficient. You cannot explain cue phrases, referring expressions, or interruptions using only one or two of the components. Each handles distinct phenomena; their coordination handles the rest.
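As a rough illustration, the three components can be held side by side in one data structure. The sketch below is a minimal toy, not an implementation from the paper: the class and method names (`Segment`, `FocusSpace`, `DiscourseState`, `open_segment`, and so on) are hypothetical, and segment boundaries and purposes are supplied by hand rather than inferred. It does follow Grosz and Sidner in modelling attentional state as a stack of focus spaces that is pushed when a segment opens and popped when it closes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Linguistic structure: a run of utterances that belong together."""
    name: str
    utterances: list[str] = field(default_factory=list)


@dataclass
class FocusSpace:
    """One entry of the attentional state: what is salient in one segment."""
    segment: str
    purpose: str
    entities: set[str] = field(default_factory=set)


@dataclass
class DiscourseState:
    # Linguistic structure: segment name -> segment
    segments: dict[str, Segment] = field(default_factory=dict)
    # Intentional structure: segment name -> purpose, plus (parent, child) links
    purposes: dict[str, str] = field(default_factory=dict)
    dominates: set[tuple[str, str]] = field(default_factory=set)
    # Attentional state: stack of focus spaces, last item is the most salient
    focus_stack: list[FocusSpace] = field(default_factory=list)

    def open_segment(self, name, purpose, parent=None):
        """Start a segment: record its purpose and push a new focus space."""
        self.segments[name] = Segment(name)
        self.purposes[name] = purpose
        if parent is not None:
            self.dominates.add((parent, name))
        self.focus_stack.append(FocusSpace(name, purpose))

    def add_utterance(self, text, entities=()):
        """Attach an utterance and its entities to the currently open segment."""
        top = self.focus_stack[-1]
        self.segments[top.segment].utterances.append(text)
        top.entities.update(entities)

    def close_segment(self):
        """End the current segment: its focus space stops being salient."""
        return self.focus_stack.pop()

    def salient_entities(self):
        """Entities per focus space, most salient space first."""
        return [sorted(space.entities) for space in reversed(self.focus_stack)]


# Hand-annotated toy discourse (assumed example, not from the source)
state = DiscourseState()
state.open_segment("DS0", "plan the trip")
state.add_utterance("Let's book the flight.", {"flight"})
state.open_segment("DS1", "check the budget", parent="DS0")
state.add_utterance("How much is left on the card?", {"card"})
closed = state.close_segment()
```

After the embedded segment closes, `card` is still recorded in the linguistic and intentional layers but is no longer on the focus stack. That separation is the point of keeping three records rather than one.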
The practical implication for AI discourse: systems that track surface segments without tracking purposes — or that track purposes without tracking what is currently salient — will systematically fail on exactly the phenomena that require the missing layer. Coreference resolution across long contexts, for instance, requires attentional state, not just linguistic bracketing.
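The coreference point can be made concrete with a toy comparison. The sketch below is illustrative only: the function names, the event format, and the example discourse are assumptions, and real resolvers use far richer features. It contrasts a resolver that sees only a flat mention history with one that consults a focus stack, using a discourse in which an aside about an invoice interrupts talk about a report and then closes before the speaker says "it".

```python
def replay(events):
    """Build a flat mention history and a focus stack from annotated events."""
    mentions = []  # every entity mentioned, in order (no segment information)
    stack = []     # one list of entities per currently open segment
    for kind, value in events:
        if kind == "open":
            stack.append([])
        elif kind == "mention":
            mentions.append(value)
            stack[-1].append(value)
        elif kind == "close":
            stack.pop()
    return mentions, stack


def resolve_by_recency(mentions, candidates):
    """Baseline: pick the most recently mentioned compatible entity."""
    for entity in reversed(mentions):
        if entity in candidates:
            return entity
    return None


def resolve_by_focus(stack, candidates):
    """Pick a compatible entity from the topmost focus space that has one."""
    for space in reversed(stack):
        for entity in reversed(space):
            if entity in candidates:
                return entity
    return None


# "I'm revising the report. (By the way, did the invoice go out? Yes.)
#  OK, I'll send it to you tonight."
events = [
    ("open", "main"),
    ("mention", "report"),
    ("open", "aside"),
    ("mention", "invoice"),
    ("close", "aside"),
]
mentions, stack = replay(events)
candidates = {"report", "invoice"}  # both fit the pronoun "it"

recency_answer = resolve_by_recency(mentions, candidates)
focus_answer = resolve_by_focus(stack, candidates)
```

In this example the recency baseline returns `invoice`, while the focus-stack resolver returns `report`, because closing the aside removed the invoice from the set of salient entities. The two resolvers differ only in whether they have access to attentional state.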
This is foundational theory for understanding where LLMs structurally underperform. They may approximate the linguistic layer (segment detection) and partially the intentional layer (intent inference), but the attentional state — what is currently foregrounded vs. backgrounded — is the hardest to represent in static context windows.
Related concepts in this collection
- How do readers track segments, purposes, and salience together?
  Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
  the processing consequence of this structural claim
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  explicit connectives are a surface proxy for the intentional and attentional layers LLMs miss
- Can tracking dialogue dimensions simultaneously reveal hidden conversation patterns?
  Does encoding linguistic complexity, emotion, topics, and relevance as parallel temporal streams expose emergent patterns that traditional statistical analysis misses? This matters because conversation success may depend on interactions between dimensions, not individual features alone.
  Conversational DNA operationalizes the three-component theory: linguistic complexity maps to linguistic structure, emotional trajectories capture intentional structure, and topic coherence together with conversational relevance tracks attentional state, which makes the theoretical claim computationally concrete
- What semantic failures break dialogue coherence most realistically?
  Can we distinguish distinct types of incoherence by manipulating semantic structure rather than surface text? This matters because text-level evaluations miss the semantic failures that actually occur in dialogue systems.
  DEAM's four failure modes map onto the three components: contradiction/coreference involve attentional state, irrelevancy involves intentional structure, decreased engagement spans all three
- What makes explanations work in real conversation?
  Does explanation quality depend on how dialogue partners interact (testing understanding, adjusting based on feedback, and coordinating their communicative moves) rather than on information content alone?
  parallel three-component structure: explanation has topic relation (maps to linguistic structure), dialogue act (maps to intentional structure), and explanation move (maps to attentional state); the structural parallel suggests a deeper pattern where discourse activities require tracking three irreducible dimensions
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Attention, Intentions, And The Structure Of Discourse
- Dialogue Transformers
- Conversational Alignment with Artificial Intelligence in Context
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- Conversational Semantic Parsing for Dialog State Tracking
- Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
- Sequence Organization in Interaction: A Primer in Conversation Analysis
- “Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations
Original note title
discourse structure has three irreducible components: linguistic, intentional, and attentional