How do readers track segments, purposes, and salience together?
Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
Discourse processing, according to Grosz & Sidner, requires three recognition tasks happening in parallel:
- How utterances aggregate into linguistic segments
- What intentions are expressed in each segment and how those intentions relate to each other
- What is currently salient (objects, properties, relations) as the discourse unfolds
The important point is that these tasks are not sequential. You cannot recognize segments first, then extract intentions, then update salience — they constrain each other during processing. An intention shift often marks a segment boundary; a reference resolves only against the current attentional state.
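The mutual constraint described above can be sketched as a single update step that touches all three layers at once. This is a minimal illustration, not an implementation from Grosz & Sidner: the class names `FocusSpace` and `DiscourseState`, the purpose labels, and the "most recent mention wins" resolution rule are all assumptions made for the example. The only idea borrowed from the framework is that salience is kept per open segment on a stack, so an intention shift changes what a pronoun can refer to.

```python
from dataclasses import dataclass, field


@dataclass
class FocusSpace:
    """One open segment: its purpose, its utterances, its salient entities."""
    purpose: str
    utterances: list = field(default_factory=list)
    entities: list = field(default_factory=list)


@dataclass
class DiscourseState:
    """Hypothetical joint state for segments, purposes, and salience."""
    stack: list = field(default_factory=list)   # open segments, innermost last
    closed: list = field(default_factory=list)  # finished segments

    def update(self, utterance, purpose, mentions=()):
        open_purposes = [fs.purpose for fs in self.stack]
        if purpose in open_purposes:
            # Returning to an open purpose closes every segment nested inside it.
            while self.stack[-1].purpose != purpose:
                self.closed.append(self.stack.pop())
        else:
            # An intention shift marks a segment boundary and opens a focus space.
            self.stack.append(FocusSpace(purpose))
        top = self.stack[-1]
        top.utterances.append(utterance)
        top.entities.extend(mentions)

    def resolve(self):
        """Most recently mentioned entity, searching innermost segment first."""
        for fs in reversed(self.stack):
            if fs.entities:
                return fs.entities[-1]
        return None


state = DiscourseState()
state.update("Book me a flight to Oslo.", "book_flight", ["flight"])
state.update("By the way, did the report go out?", "check_report", ["report"])
print(state.resolve())  # report
state.update("OK, make it a morning one.", "book_flight")
print(state.resolve())  # flight
```

In the interrupted dialogue, the final "it" resolves to the flight only because returning to the earlier purpose closes the interruption and removes its entities from the attentional state. A pipeline that fixed segments first and resolved references afterwards would have no such signal.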
This creates a structural challenge for architectures that process language linearly. Even if each component is handled well in isolation, their coordination across a long context is what breaks down. When LLMs fail at tasks like understanding interrupted dialogues or resolving pronouns across far-apart segments, the failure is specifically in the joint tracking of all three layers.
The cleaner framing for AI evaluation: tests of discourse understanding should probe all three layers together, not in isolation. A model that passes coreference tests (attentional) may still fail at detecting intentional structure shifts, and vice versa.
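The difference between per-layer and joint evaluation can be made concrete with a small scoring sketch. The layer names, the result format, and the data below are invented for illustration and do not come from any benchmark. The point is only that per-layer accuracy can look acceptable while the share of items with all three layers right is much lower.

```python
LAYERS = ("segment", "intention", "salience")


def layer_accuracy(results):
    """Accuracy of each layer scored in isolation."""
    return {layer: sum(r[layer] for r in results) / len(results) for layer in LAYERS}


def joint_accuracy(results):
    """Share of items where all three layers are correct at once."""
    return sum(all(r[layer] for layer in LAYERS) for r in results) / len(results)


# Made-up results for four test items (True = the model got that layer right).
results = [
    {"segment": True, "intention": True, "salience": True},
    {"segment": True, "intention": False, "salience": True},
    {"segment": False, "intention": True, "salience": True},
    {"segment": True, "intention": True, "salience": False},
]

print(layer_accuracy(results))  # each layer: 0.75
print(joint_accuracy(results))  # 0.25
```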
Failure mode taxonomy via DEAM: The DEAM framework operationalizes discourse coherence failure through AMR (Abstract Meaning Representation) manipulation, identifying four distinct semantic-level failure modes: contradiction (conflicting propositions), coreference inconsistency (entity reference failures), irrelevancy (off-topic contributions), and decreased engagement (disengagement patterns). As the related note "What semantic failures break dialogue coherence most realistically?" describes, each failure mode maps to a specific breakdown in the Grosz & Sidner layers: contradiction and coreference inconsistency affect the attentional state, irrelevancy disrupts intentional structure, and decreased engagement signals segment-level disengagement.
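The mapping in this paragraph can be written down as a lookup table and used to tally annotated failures by layer. This is a sketch under stated assumptions: the identifier spellings and the example labels are hypothetical, and placing decreased engagement under the linguistic (segment) layer follows this note's reading rather than anything DEAM itself specifies.

```python
from collections import Counter

# Failure mode -> discourse layer, following the mapping stated in this note.
FAILURE_TO_LAYER = {
    "contradiction": "attentional",
    "coreference_inconsistency": "attentional",
    "irrelevancy": "intentional",
    "decreased_engagement": "linguistic",  # segment level, per this note
}


def failures_by_layer(labels):
    """Count annotated failure labels per discourse layer."""
    return Counter(FAILURE_TO_LAYER[label] for label in labels)


# Hypothetical annotations for one dialogue.
labels = ["contradiction", "irrelevancy", "coreference_inconsistency", "contradiction"]
counts = failures_by_layer(labels)
print(counts["attentional"], counts["intentional"], counts["linguistic"])  # 3 1 0
```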
Operationalization via Conversational DNA: The Conversational DNA project provides a concrete visualization method for tracking this multi-dimensional coherence. Linguistic complexity (sentence length, syntactic depth, vocabulary diversity), emotional valence (VADER + RoBERTa), topic coherence (LDA with sliding window), and conversational relevance (semantic similarity + discourse markers + pronoun resolution) are processed as simultaneous parallel streams. This moves the Grosz & Sidner framework from theoretical claim to operational tool — emergent patterns in the interaction between these temporal streams reveal conversational dynamics invisible to traditional statistical analysis (Can tracking dialogue dimensions simultaneously reveal hidden conversation patterns?).
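The idea of parallel per-turn streams can be illustrated with standard-library stand-ins. This is not the Conversational DNA pipeline: the note describes VADER, RoBERTa, and LDA, whereas the sketch below substitutes token counts, a type-token ratio, and Jaccard word overlap with a sliding window of previous turns, and it omits emotional valence entirely. The function names and the example turns are assumptions.

```python
import re


def tokens(text):
    """Lowercased word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a, b):
    """Word-set overlap between two token lists, 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def streams(turns, window=2):
    """One row per turn, with each dimension computed side by side."""
    rows = []
    for i, turn in enumerate(turns):
        toks = tokens(turn)
        # Sliding window: tokens from up to `window` preceding turns.
        context = [t for prev in turns[max(0, i - window):i] for t in tokens(prev)]
        rows.append({
            "length": len(toks),                                          # complexity proxy
            "vocab_diversity": len(set(toks)) / len(toks) if toks else 0.0,
            "topic_overlap": jaccard(toks, context),                      # stand-in for LDA coherence
        })
    return rows


turns = ["The flight leaves at nine.", "Is the flight direct?", "I like pasta."]
for row in streams(turns):
    print(row)
```

Reading the rows together rather than one column at a time is the point: the third turn is unremarkable in length and vocabulary, and only the drop in overlap with the preceding window flags it as a topic break.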
Inquiring lines that use this note as a source (25)
This note is a source for these synthesized inquiries.
- What makes human discourse fundamentally temporal in structure?
- Can we develop competent reading practices for disembodied orality?
- How do readers selectively hold frame-related words in mind?
- How do humans maintain separate mental contexts during a single conversation?
- How does the location of causal passages differ between news and lectures?
- What signals beyond surface content indicate a passage caused a user's reaction?
- What makes intentional structure shifts different from segment boundaries?
- How do dialogue coherence failures map onto the three discourse components?
- How do the four discourse relations differ in their connection to anxiety?
- Can discourse communities collectively detect disruptions individual readers miss?
- How do humans detect which words belong to the same frame together?
- What role does joint attention play in how humans learn language meaning?
- Why do discourse failures cluster in attention and intentional layers rather than linguistics?
- How does temporal event structure scaffold coherence in dialogue?
- Can adding more words to a passage actually interfere with meaning?
- Why do different readers extract different meanings from identical text?
- Can multimodal telemetry operationalize the attentional component of discourse?
- What distinguishes local coherence from global coherence in dialogue?
- Why do posters acknowledge multiple viewpoints without integrating them into coherent judgments?
- What role does accommodation play in making discourse coherent?
- How does the Question Under Discussion shape what content projects?
- How do readers project author identity from textual cues during interpretation?
- Why does joint attention matter for acquiring linguistic meaning?
- Can readers detect meaning through resonance patterns alone without knowing authorial intent?
- Where does the meaning actually originate in reader-detected resonance across language?
Related concepts in this collection (5)
- What three layers must discourse systems actually track?
  Grosz and Sidner's 1986 framework proposes that discourse requires simultaneously tracking linguistic segments, speaker purposes, and salient objects. Understanding why all three are necessary helps explain where current AI systems structurally fail.
  Relation: the structural claim of which this note is the processing consequence
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  Relation: implicit relations require all three layers; explicit connectives only require the linguistic layer
- Can tracking dialogue dimensions simultaneously reveal hidden conversation patterns?
  Does encoding linguistic complexity, emotion, topics, and relevance as parallel temporal streams expose emergent patterns that traditional statistical analysis misses? This matters because conversation success may depend on interactions between dimensions, not individual features alone.
  Relation: operationalizes multi-dimensional tracking as concrete visualization
- What semantic failures break dialogue coherence most realistically?
  Can we distinguish distinct types of incoherence by manipulating semantic structure rather than surface text? This matters because text-level evaluations miss the semantic failures that actually occur in dialogue systems.
  Relation: DEAM maps failure modes to the three discourse layers
- Can structured prompting improve cognitive distortion detection?
  This explores whether breaking distortion diagnosis into discrete stages, mirroring clinical CBT workflow, helps language models identify and classify thinking patterns more accurately than standard approaches.
  Relation: DoT's three-stage clinical prompting mirrors the three-component model: subjectivity assessment maps to linguistic structure (what was said), contrastive reasoning maps to intentional structure (what was meant), schema analysis maps to attentional structure (what cognitive framework is salient)
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Attention, Intentions, And The Structure Of Discourse
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- Thought Anchors: Which LLM Reasoning Steps Matter?
- Conversational Semantic Parsing for Dialog State Tracking
- What does it mean to understand language?
- Implicit Chain of Thought Reasoning via Knowledge Distillation
- Dialogue Transformers
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
Original note title
discourse coherence requires simultaneously tracking segments, purposes, and salient objects