SYNTHESIS NOTE
Language, Text, and Discourse

What three layers must discourse systems actually track?

Grosz and Sidner's 1986 framework proposes that discourse requires simultaneously tracking linguistic segments, speaker purposes, and salient objects. Understanding why all three are necessary helps explain where current AI systems structurally fail.

Synthesis note · 2026-02-21 · sourced from Discourses

Grosz and Sidner (1986) identified three separate but interrelated structural layers in any discourse:

  1. Linguistic structure — how utterances naturally aggregate into segments
  2. Intentional structure — the purposes expressed in each segment and the relationships among those purposes
  3. Attentional state — a dynamic record of which objects, properties, and relations are currently salient
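To make the separation concrete, here is a minimal Python sketch of the three layers kept side by side. It assumes a deliberately simplified reading of the model:

- Each segment expresses exactly one purpose.
- The intentional layer is reduced to a dominance map that records which purpose a segment's purpose serves. Satisfaction-precedence is omitted.
- Attentional state is a stack of focus spaces, which is how Grosz and Sidner model it. Spaces are pushed and popped so that the stack follows the segment nesting.

The class names, the `open_segment` rule, and the pump-repair utterances are hypothetical illustrations, not the paper's formalism.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """Linguistic layer: utterances grouped into one discourse segment."""
    seg_id: str
    utterances: list[str]
    purpose: str                  # the purpose this segment expresses
    parent: Optional[str] = None  # id of the enclosing segment, if any


@dataclass
class FocusSpace:
    """One entry of the attentional state: what is salient in a segment."""
    seg_id: str
    purpose: str
    salient: list[str]


@dataclass
class Discourse:
    segments: dict[str, Segment] = field(default_factory=dict)
    # Intentional layer (simplified): purpose -> purpose that dominates it.
    dominates: dict[str, Optional[str]] = field(default_factory=dict)
    # Attentional layer: stack of focus spaces, top = most salient.
    focus_stack: list[FocusSpace] = field(default_factory=list)

    def open_segment(self, seg: Segment, salient: list[str]) -> None:
        """Record a new segment in all three layers (hypothetical update rule)."""
        self.segments[seg.seg_id] = seg
        parent_purpose = self.segments[seg.parent].purpose if seg.parent else None
        self.dominates[seg.purpose] = parent_purpose
        # Pop focus spaces until the parent segment's space is on top, so the
        # stack mirrors the intentional nesting; then push the new space.
        while self.focus_stack and self.focus_stack[-1].seg_id != seg.parent:
            self.focus_stack.pop()
        self.focus_stack.append(FocusSpace(seg.seg_id, seg.purpose, salient))

    def currently_salient(self) -> list[str]:
        """Entities reachable from the top of the stack downward, most salient first."""
        seen: list[str] = []
        for space in reversed(self.focus_stack):
            for entity in space.salient:
                if entity not in seen:
                    seen.append(entity)
        return seen


# Illustrative usage with invented segments.
d = Discourse()
d.open_segment(Segment("S1", ["We need to fix the pump."], "fix-pump"), ["pump"])
d.open_segment(Segment("S2", ["First take off the cover."], "remove-cover", parent="S1"), ["cover"])
print(d.dominates)            # {'fix-pump': None, 'remove-cover': 'fix-pump'}
print(d.currently_salient())  # ['cover', 'pump']
d.open_segment(Segment("S3", ["Now check the belt."], "check-belt", parent="S1"), ["belt"])
print(d.currently_salient())  # ['belt', 'pump']
```

Opening S3 as a sibling of S2 pops S2's focus space, so `cover` drops out of salience while `pump` stays reachable. A flat list of segments records that S2 and S3 both exist. It does not record which entities are still foregrounded.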

The key claim is that these three are necessary and jointly sufficient. You cannot explain cue phrases, referring expressions, or interruptions using only one or two of the components. Each handles distinct phenomena; their coordination handles the rest.

The practical implication for AI discourse systems is direct. A system that tracks surface segments but not purposes, or purposes but not what is currently salient, will systematically fail on exactly the phenomena that require the missing layer. Coreference resolution across long contexts, for instance, requires attentional state, not just linguistic bracketing.
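A toy example shows why the coreference case needs the attentional layer. Below, a short hand-annotated dialogue contains an interruption, and the pronoun after it is resolved in two ways:

- One resolver uses flat recency over the whole text.
- The other keeps a stack of focus spaces. It pushes a space when the interruption opens and pops it when the cue phrase "Anyway" closes it.

The dialogue, the push/pop annotations, and both functions are hypothetical. This is not a real coreference algorithm, and in practice detecting the segment boundaries is itself the hard part.

```python
from typing import Optional

# Each utterance: (text, entities mentioned, segment operation).
# "push" opens an embedded segment; "pop" closes the current segment
# before the utterance is processed. Annotations are hand-made, hypothetical.
Utterance = tuple[str, list[str], Optional[str]]

DIALOGUE: list[Utterance] = [
    ("Let's finish the quarterly report.", ["report"], None),
    ("Oh, by the way, the printer is jammed again.", ["printer"], "push"),
    ("Someone should call facilities about the printer.", ["printer"], None),
    ("Anyway, is it ready to send?", [], "pop"),  # intended referent: report
]


def resolve_by_recency(dialogue: list[Utterance], upto: int) -> Optional[str]:
    """Flat heuristic: the most recently mentioned entity, ignoring segments."""
    for _, entities, _ in reversed(dialogue[:upto]):
        if entities:
            return entities[-1]
    return None


def resolve_by_focus_stack(dialogue: list[Utterance], upto: int) -> Optional[str]:
    """Attentional heuristic: search focus spaces from the top of the stack."""
    stack: list[list[str]] = [[]]
    for i, (_, entities, op) in enumerate(dialogue[: upto + 1]):
        if op == "push":
            stack.append([])
        elif op == "pop" and len(stack) > 1:
            stack.pop()
        if i < upto:  # the pronoun's own utterance adds no antecedents
            stack[-1].extend(entities)
    for space in reversed(stack):
        if space:
            return space[-1]
    return None


print(resolve_by_recency(DIALOGUE, 3))      # printer
print(resolve_by_focus_stack(DIALOGUE, 3))  # report
```

Both resolvers see identical text. Only the one that models which segment is currently open recovers the intended referent.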

This is foundational theory for understanding where LLMs structurally underperform. They may approximate the linguistic layer (segment detection) and, partially, the intentional layer (intent inference). The attentional state, meaning what is currently foregrounded versus backgrounded, is the hardest of the three to represent in a static context window.

Original note title

discourse structure has three irreducible components: linguistic, intentional, and attentional