Why do language models skip the calibration step?
Current LLMs assume shared understanding rather than building it through dialogue. This note explores why that design choice persists and what breaks when it fails.
Two fundamentally different communicative modes (from Chandu et al. 2021, "Grounding 'Grounding' in NLP"):
Static grounding: Common ground is presumed or pseudo-automatically established. The sequence is: human queries → agent retrieves from data → agent responds. The "common ground" here is the database itself, treated as universal truth. No negotiation occurs. The agent succeeds by linking the query to the right data.
Dynamic grounding: Common ground is built through interaction. The sequence involves clarification requests, acknowledgments, confirmations, and corrections — looping until mutual understanding is established. Only then does response delivery proceed. The interaction IS the grounding process.
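The two sequences can be contrasted in a minimal sketch. Everything here is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the function names, and the `ask_user` callback that stands in for the human side of the dialogue. It is not the formalization from Chandu et al., only one way to make the control flow concrete.

```python
from typing import Callable

# Hypothetical toy data: each entry maps a query term to its possible readings.
KNOWLEDGE_BASE = {
    "python": {
        "language": "Python is a programming language.",
        "snake": "Pythons are non-venomous constricting snakes.",
    },
    "rust": {
        "language": "Rust is a systems programming language.",
    },
}


def static_respond(query: str) -> str:
    """Static grounding: link the query to data and answer, with no negotiation."""
    readings = KNOWLEDGE_BASE.get(query.lower(), {})
    if not readings:
        return "No answer found."
    # The first stored reading is silently taken to be the intended one.
    return next(iter(readings.values()))


def dynamic_respond(query: str, ask_user: Callable[[str], str], max_turns: int = 3) -> str:
    """Dynamic grounding: clarify and confirm before delivering the response."""
    readings = KNOWLEDGE_BASE.get(query.lower(), {})
    if not readings:
        return "No answer found."
    # With a single reading there is nothing to clarify, but it is still confirmed.
    chosen = next(iter(readings)) if len(readings) == 1 else None
    for _ in range(max_turns):
        if chosen is None:
            # Clarification request: more than one reading is live.
            options = ", ".join(sorted(readings))
            reply = ask_user(f"Do you mean {query} as in: {options}?").strip().lower()
            chosen = reply if reply in readings else None
            continue
        # Confirmation: check the chosen reading before answering.
        reply = ask_user(f"So you are asking about {query} the {chosen}?").strip().lower()
        if reply in ("yes", "y"):
            return readings[chosen]  # acknowledged, so the response is delivered
        chosen = None  # correction: drop the reading and ask again
    return "I could not establish what you meant."
```

Calling `static_respond("python")` always returns the programming-language reading, whatever the user had in mind. `dynamic_respond` only answers once the user has picked a reading and confirmed it, and gives up explicitly after `max_turns` instead of guessing.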
Static grounding is the dominant mode in current LLM deployment. A user asks a question; the model generates an answer presuming shared understanding. This works when the query is unambiguous and the data retrieval is correct. It fails — silently — when the user's intent diverges from the model's interpretation, because there is no mechanism to detect or repair the divergence.
Dynamic grounding is what human dialogue depends on. Effective training domains (emotional support, conflict resolution, teaching) require dynamic grounding: the agent must detect when understanding has broken down and initiate repair. As the note "Does preference optimization damage conversational grounding in large language models?" argues, the training that makes LLMs better at static grounding appears to make them worse at dynamic grounding.
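The detect-and-repair requirement can be sketched as a small state machine. This is an illustrative assumption, not a method from the cited notes: the `Dialogue` class, the `REPAIR_CUES` phrases, and the substring matching are all hypothetical and deliberately crude. The point is only that the agent keeps track of which interpretation it acted on, so a breakdown signal from the user can revise that interpretation instead of being ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical cue phrases signalling that the previous answer missed the point.
REPAIR_CUES = ("no,", "i meant", "not what i", "that's not")


@dataclass
class Dialogue:
    readings: Dict[str, str]  # reading name -> answer text
    rejected: Set[str] = field(default_factory=set)
    current: Optional[str] = None  # the reading the last answer was based on

    def respond(self) -> str:
        """Answer with the first reading the user has not rejected."""
        for name, answer in self.readings.items():
            if name not in self.rejected:
                self.current = name
                return answer
        self.current = None
        return "I am out of interpretations. Could you rephrase?"

    def on_user_turn(self, utterance: str) -> str:
        """Detect a breakdown signal and repair; otherwise acknowledge."""
        text = utterance.lower()
        if any(cue in text for cue in REPAIR_CUES):
            if self.current is not None:
                self.rejected.add(self.current)
            # Prefer a reading the user names explicitly in the correction.
            for name in self.readings:
                if name in text and name not in self.rejected:
                    self.current = name
                    return self.readings[name]
            return self.respond()
        return "Understood."
```

For example, a `Dialogue` with readings `language` and `snake` answers with the `language` reading first. A follow-up turn such as "No, I meant the snake" marks `language` as rejected and returns the `snake` answer without restarting the conversation. A real system would need a far more robust breakdown detector than cue-phrase matching.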
Static grounding is the technical version of false punditry. The static/dynamic distinction names at the technical level what false punditry names at the social-media level: skipping the calibration step in which speakers verify shared understanding before treating claims as commonly accepted. Both presume the shared ground and proceed. One is a design pattern in dialogue systems, the other is a genre pattern in AI-generated commentary, but the structural move is the same: omitting the calibration step that would expose whether the ground is actually shared. Seen this way, false punditry is not a stylistic quirk of AI posts; it is static grounding transposed into public-facing genres, where the absent calibration is not merely a conversational limitation but a legitimacy problem.
The distinction maps onto a structural asymmetry: static grounding is a retrieval problem; dynamic grounding is an intersubjectivity problem. LLMs are trained for the former.
Inquiring lines that use this note as a source 21
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Why does removing language from its context destroy what makes it work?
- What alignment artifacts suppress critical knowledge in LLM-generated explanations?
- Why do LLM explanations feel authoritative even when alignment with the model fails?
- What components must wrap an LLM to build a working CRS?
- What causes LLMs to ignore unstated constraints they know about?
- Why do LLMs presume common ground instead of building it carefully?
- What interaction design changes would help LLMs handle underspecified requests?
- Why do LLMs presume common ground instead of building it?
- Do LLMs build common ground or assume it already exists?
- Can LLMs build shared understanding through dynamic grounding rather than presuming it?
- How does preference optimization weaken conversational grounding in LLMs?
- What training data barriers prevent LLMs from learning real Socratic dialogue?
- How does preference optimization reduce LLM grounding and clarification behavior?
- Why do models skip steps that would make reasoning clearer?
- Why do LLMs choose incorrect edits despite understanding the task?
- Can we use LLM language without adopting LLM assumptions?
- Why do LLMs lack the communicative scaffold that humans learn?
- Does the alignment frame mislead us about what LLM problems actually are?
- Can LLMs reliably audit other language models for errors?
- What unique perspective do designers bring to LLM adaptation that engineers might miss?
- How does linguistic calibration differ from token probability calibration?
Related concepts in this collection 6
- Why do speakers need to actively calibrate shared reference?
  Explores whether using the same words guarantees speakers mean the same thing. Investigates how referential grounding differs across people and what collaborative work is needed to establish true understanding.
  why dynamic grounding is needed: referential grounding differs across speakers
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  the conversational consequence: LLMs default to static mode
- Does preference optimization damage conversational grounding in large language models?
  Exploring whether RLHF and preference optimization actively reduce the communicative acts (clarifications, acknowledgments, confirmations) that build shared understanding in dialogue. This matters for high-stakes applications like medical and emotional support.
  RLHF makes the static/dynamic gap worse
- How do readers track segments, purposes, and salience together?
  Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
  what dynamic grounding requires structurally
- When should AI agents ask users instead of just searching?
  Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.
  insert-expansions are the pre-emptive mechanism of dynamic grounding: probing the user to clarify intent before committing to a response is exactly the clarification loop that distinguishes dynamic from static grounding
- Can AI systems detect and correct misunderstandings after responding?
  How do conversational systems recognize when their previous response was based on a misunderstanding, and what mechanism allows them to correct it retroactively rather than restart?
  third-position repair (TPR) is the reactive mechanism of dynamic grounding: correcting misunderstanding after it has been acted on; together with insert-expansions, it covers the full repair lifecycle that dynamic grounding requires
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
- Grounding Gaps in Language Model Generations
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
- Task-Oriented Dialogue with In-Context Learning
- Cognitive Architectures for Language Agents
- Conversational Alignment with Artificial Intelligence in Context
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Original note title
static grounding presumes common ground while dynamic grounding builds it through clarification and repair