How does the Question Under Discussion shape what counts as presupposed?

This explores how the Question Under Discussion (QUD) — the implicit question a conversation is currently trying to answer — decides which parts of a sentence get treated as taken-for-granted background rather than as the actual claim being made.

The corpus gives a clean answer: what counts as presupposed is fixed not by the words themselves but by what the conversation is currently asking. The sharpest evidence comes from work showing that projection is gradient, not binary. Across 19 English expressions, the same trigger word projects (that is, survives as a background assumption) more or less strongly depending on whether its content addresses the live question, not on any fixed property of the word (see 'Does projection strength vary by context or by word type?'). Content that is not 'at-issue' for the current QUD is exactly the content that gets quietly presupposed. So the QUD is the switch: it sorts each piece of a sentence into 'this is the point' versus 'this is assumed.'

That reframes presupposition from a dictionary fact into a conversational move. One note makes this dual nature explicit: presuppositions have two origins. Some are baked into trigger words lexically, while others arise through accommodation, where listeners silently update the shared context so that a mismatched utterance makes sense (see 'Do language models miss presuppositions that arise from context?'). Accommodation only works because there is a QUD to resolve against; you absorb the assumption because rejecting it would derail the question on the table. This is also why presupposition is a quietly powerful persuasion tool: presenting a new claim as presupposed background lets it bypass the scrutiny an open assertion would invite, since the QUD is not pointed at it (see 'Why are presuppositions more persuasive than direct assertions?').

The most revealing material, though, comes from watching machines fail at this. Language models treat presupposition triggers as surface cues rather than computing what they mean against the discourse: embedding verbs and triggers become 'blinds' that systematically corrupt their entailment predictions (see 'Why do embedding contexts confuse LLM entailment predictions?'). More tellingly, models accommodate false presuppositions even when they demonstrably know the facts are wrong. On the FLEX Benchmark, GPT-4 rejects them only 84% of the time and Mistral only 2.44% of the time; on the (QA)2 benchmark, zero-shot performance roughly halves on questions carrying false or unverifiable assumptions (see 'Why do language models accept false assumptions they know are wrong?' and 'Why do language models struggle with questions containing false assumptions?'). The diagnosis is that they miss conversationally derived presuppositions by design: pattern-matching on trigger words cannot substitute for tracking the question under discussion (see 'Do language models miss presuppositions that arise from context?').
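The two quantities in play here, how often a model rejects a false presupposition and how often it accommodates one despite knowing the relevant fact, can be kept apart with a small scoring sketch. This is only an illustration: the `Item` schema, the function names, and the toy records are assumptions made up for this example, not the FLEX Benchmark's actual data format or scoring code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One evaluation item (hypothetical schema, not the FLEX format)."""
    knows_fact: bool               # answered the direct knowledge question correctly
    rejected_presupposition: bool  # pushed back on the false presupposition


def rejection_rate(items):
    """Share of all items where the false presupposition was rejected."""
    if not items:
        raise ValueError("no items to score")
    return sum(i.rejected_presupposition for i in items) / len(items)


def accommodation_despite_knowledge(items):
    """Among items where the model knows the fact, share it still accommodated."""
    known = [i for i in items if i.knows_fact]
    if not known:
        raise ValueError("no items with known facts")
    return sum(not i.rejected_presupposition for i in known) / len(known)


# Toy records, invented purely for illustration (not benchmark results).
toy = [
    Item(knows_fact=True, rejected_presupposition=True),
    Item(knows_fact=True, rejected_presupposition=False),
    Item(knows_fact=True, rejected_presupposition=False),
    Item(knows_fact=False, rejected_presupposition=False),
]
print(rejection_rate(toy))
print(accommodation_despite_knowledge(toy))
```

Separating the two measures matters because a low rejection rate alone could reflect missing knowledge, whereas the second measure isolates the case the notes describe: the model has the fact and accommodates anyway.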

What you might not expect to learn here: this has the same shape as the frame problem. Models stumble not from lacking world knowledge but from failing to bring the right unstated background conditions forward as relevant, and prompting that forces explicit enumeration of those preconditions raises accuracy from 30% to 85% (see 'Do language models fail at identifying unstated preconditions?'). 'Which background conditions matter right now?' and 'what counts as presupposed right now?' turn out to be the same question, and the QUD is what answers both. Presupposition is not a property of sentences sitting in isolation; it is a property of sentences relative to what is being asked.
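As a rough sketch of what 'forcing explicit enumeration of preconditions' might look like in practice, the helper below wraps a question in a fixed sequence of steps. The function name `build_precondition_prompt`, its `max_preconditions` parameter, the step wording, and the example question are all assumptions for illustration; they are not the prompt used in the work the note summarizes, and nothing here implies this exact wording would reproduce the reported accuracy gain.

```python
def build_precondition_prompt(question: str, max_preconditions: int = 5) -> str:
    """Wrap a question so the model must list preconditions before answering.

    The wording is an illustrative assumption, not a prompt from the cited work.
    """
    if not question.strip():
        raise ValueError("question must be non-empty")
    if max_preconditions < 1:
        raise ValueError("max_preconditions must be at least 1")
    steps = [
        f"Question: {question.strip()}",
        "",
        f"Step 1. List up to {max_preconditions} unstated preconditions that "
        "must hold for this question to have an answer.",
        "Step 2. Mark each precondition as TRUE, FALSE, or UNVERIFIABLE.",
        "Step 3. If any precondition is FALSE or UNVERIFIABLE, say so and do "
        "not answer as if it held.",
        "Step 4. Otherwise, answer the question.",
    ]
    return "\n".join(steps)


# Hypothetical loaded question: it presupposes that a collapse happened.
prompt = build_precondition_prompt("Why did the bridge collapse in 2019?")
print(prompt)
```

The design choice is to make the background conditions an explicit output the model must produce before answering, which is one way of pointing the question under discussion at content that would otherwise be silently accommodated.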

Sources (7 notes)

Does projection strength vary by context or by word type?

Across 19 English expressions, projectivity varies continuously based on whether content addresses the Question Under Discussion. The same presupposition trigger projects more or less depending on context, not on fixed lexical properties.

Do language models miss presuppositions that arise from context?

LLMs learn statistical associations between trigger words and inferences, but presuppositions also arise through accommodation—updating context to resolve discourse mismatches. Models miss these because they require tracking questions under discussion, not pattern matching.

Why are presuppositions more persuasive than direct assertions?

Experimental evidence shows presuppositions with additive, iterative, and factive triggers persuade audiences more than assertions, especially for discourse-new content. The mechanism: presuppositions bypass evaluative scrutiny by presenting claims as already-accepted background.

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Why do language models struggle with questions containing false assumptions?

The (QA)2 benchmark found that zero-shot LLMs halve their performance when questions contain false or unverifiable assumptions compared to valid questions. Even top models reached only 56% acceptability, and the gap persists despite model scaling, suggesting false presuppositions embedded in plausible language are systematically difficult to reject.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversational semanticist tracking what counts as presupposed in LLM reasoning. The question: does the Question Under Discussion (QUD)—the implicit conversational goal—truly control presupposition projection in language models, or do newer capabilities/training regimes now allow models to compute presupposition against discourse context rather than surface triggers alone?

What a curated library found — and when (2019–2025, spanning early persuasion work through recent grounding studies):
• Presupposition projection is gradient, not binary: content projects to the degree it doesn't address the live QUD, not by fixed trigger-word properties (~2023).
• LLMs treat presupposition triggers as surface cues ('blinds') and fail to compute them against discourse; GPT-4 rejects false presuppositions only ~84% of the time, and zero-shot performance roughly halves on questions carrying false or unverifiable assumptions (~2025).
• Models systematically accommodate false presuppositions even when they possess the facts, because they pattern-match triggers rather than track conversational context (~2023–2025).
• This mirrors the frame problem: forcing explicit enumeration of unstated preconditions/background conditions jumps accuracy from ~30% to ~85% (~2024).
• Recent work suggests in-context reasoning and skills-in-context prompting may unlock compositionality; grounding and abstention studies hint at emerging sensitivity to unanswerable/loaded questions (~2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14785 (2023-05): "Simple Linguistic Inferences of LLMs: Blind Spots and Blinds"
• arXiv:2505.22354 (2025-05): "LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High"
• arXiv:2506.09038 (2025-06): "AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions"
• arXiv:2508.12863 (2025-08): "Word Meanings in Transformer Language Models"

Your task:
(1) RE-TEST: For each constraint above—trigger-blindness, false-presupposition accommodation, performance collapse under loaded questions—check whether chain-of-thought, retrieval-augmented generation, explicit QUD annotation in prompts, or multi-turn clarification now let models track conversational context and suppress accommodation. Separate what's still true (e.g., raw surface pattern-matching) from what newer orchestration may have relaxed. Cite what resolved it.
(2) Surface the strongest CONTRADICTING work from ~6 months prior to now. Does any recent paper show models *do* compute presupposition against discourse, or that trigger-blindness was an artifact of evaluation setup?
(3) Propose two questions that assume the regime shifted: (a) If models can now track QUD under the right prompting, what's the minimal cognitive overhead? (b) Does grounding-on-unanswerable questions require presupposition-rejection, or are they separate failures?

Cite arXiv IDs; flag anything you cannot ground.
