Can language models adapt implicature to conversational context?

Do large language models flexibly modulate scalar implicatures based on information structure, face-threatening situations, and explicit instructions, as humans do? The question probes whether their pragmatic computation is truly context-sensitive or merely literal.

Synthesis note · 2026-02-21 · sourced from Linguistics, NLP, NLU

Scalar implicatures are a core pragmatic phenomenon: when someone says "some," it typically implies "not all." This is not semantically entailed but pragmatically inferred from the maxim of quantity: if "all" were true, a cooperative speaker would have said "all." Human computation of these implicatures varies with communicative context in well-documented ways.
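The contrast between the semantic and the pragmatically strengthened reading of "some" can be made concrete with a minimal sketch. The function names `literal_some` and `pragmatic_some` are illustrative assumptions, not taken from the cited study, and the sketch treats the implicature as all-or-nothing, which real listeners do not.

```python
def literal_some(n_true: int, n_total: int) -> bool:
    """Semantic reading: 'some' is true if at least one item qualifies.

    This reading is compatible with 'all'.
    """
    return 1 <= n_true <= n_total


def pragmatic_some(n_true: int, n_total: int) -> bool:
    """Strengthened reading: 'some but not all'.

    Quantity reasoning: had 'all' been true, the speaker would have said so.
    """
    return 1 <= n_true < n_total


# Situation: 5 of 5 readers loved the poem.
print(literal_some(5, 5))    # literally true
print(pragmatic_some(5, 5))  # false once the implicature is computed
```

The two readings only come apart in the "all" situation, which is why experiments on scalar implicature typically probe exactly that case.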

Three experiments from Pragmatic Implicature Processing in ChatGPT (Ruytenbeek et al. 2024) tested whether ChatGPT shows human-like context-sensitivity in implicature. In all three, ChatGPT failed to reproduce the human pattern:

Generalized conversational implicatures: Humans can inhibit implicature computation when explicitly instructed to interpret utterances literally. ChatGPT showed no such distinction: it did not switch between pragmatic and semantic interpretation when instructed to.

Information structure sensitivity: For scalar implicatures, humans compute more "some but not all" inferences when the scalar term is in the information focus (the direct answer to an explicit question) than when it is in the background. ChatGPT showed no sensitivity to information structure.

Face context: Human scalar implicature rates differ between face-threatening and face-boosting contexts. If a poem is being evaluated and someone says "some people loved it," the implicature ("not all loved it") is more prominent in face-boosting contexts. ChatGPT showed no differential response to face context.
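Each of the three experiments above comes down to comparing implicature rates between two conditions (literal vs. default instruction, focus vs. background, face-threatening vs. face-boosting). A minimal sketch of that comparison follows; the helper names and the toy trial data are invented for illustration and are not results from the study.

```python
from collections import defaultdict


def implicature_rates(trials):
    """Return the share of 'some but not all' responses per condition.

    trials: iterable of (condition, derived_implicature) pairs, where
    derived_implicature is True when the response reflects the implicature.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [n_pragmatic, n_total]
    for condition, pragmatic in trials:
        counts[condition][0] += int(pragmatic)
        counts[condition][1] += 1
    return {cond: k / n for cond, (k, n) in counts.items()}


def context_effect(rates, cond_a, cond_b):
    """Difference in implicature rate between two conditions."""
    return rates[cond_a] - rates[cond_b]


# Hypothetical data, made up purely to exercise the functions.
toy_trials = [
    ("focus", True), ("focus", True), ("focus", True), ("focus", False),
    ("background", True), ("background", False),
    ("background", False), ("background", False),
]

rates = implicature_rates(toy_trials)
effect = context_effect(rates, "focus", "background")
print(rates, effect)
```

A context-sensitive responder produces a clearly non-zero effect, as in the toy data; the pattern reported for ChatGPT corresponds to an effect near zero in every manipulation.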

These are not exotic phenomena. They are the basic flexibility that allows human conversation to be more than literal string exchange. Pragmatic competence requires tracking the communicative context (who is asking, why, and what stakes are involved) and modulating interpretation accordingly. ChatGPT's failure is not confined to edge cases; it extends to routine context-modulation effects that appear in any human conversation.

A complementary finding concerns non-literal language: GPT-4o overestimates how likely emojis are to be used ironically relative to human perception (median irony scores significantly higher, W = 918.5, p < .001). Asked to rate the likelihood that specific emojis are used ironically, GPT-4o rates the same emojis as more irony-prone than humans do, possibly because ironic emoji usage is disproportionately represented in its training data. Adding demographic information to the prompt does not substantially change GPT-4o's irony classification. This parallels the implicature failure: the model does not calibrate to actual human pragmatic norms for non-literal communication, whether the signal is scalar implicature or visual irony.
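The note does not say which Wilcoxon variant produced W = 918.5. As a sketch under the assumption that model and human ratings were compared as independent samples, the rank-sum form of the statistic can be computed as below. The function names and toy ratings are hypothetical; a p-value would additionally need the null distribution, which this sketch omits.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_w(sample_a, sample_b):
    """Rank-sum statistic for sample_a: its rank sum minus the minimum possible."""
    n_a = len(sample_a)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2


# Hypothetical irony ratings, invented for illustration only.
model_ratings = [5, 6, 7]
human_ratings = [1, 2, 3]
w = rank_sum_w(model_ratings, human_ratings)
print(w)
```

When every model rating exceeds every human rating, the statistic reaches its maximum of `len(sample_a) * len(sample_b)`. Fractional values such as the reported 918.5 arise only when some ratings are tied.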

Related concepts in this collection

Why does ChatGPT fail at implicit discourse relations? ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
Link: scalar implicatures are implicit inferences; extends this insight

Why do language models fail at communicative optimization? LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
Link: implicature computation is a communicative optimization principle not captured by distribution

Why do speakers need to actively calibrate shared reference? Explores whether using the same words guarantees speakers mean the same thing. Investigates how referential grounding differs across people and what collaborative work is needed to establish true understanding.
Link: context-sensitivity in implicature is part of the calibration that LLMs skip

Original note title

llm scalar implicature computation fails to adapt to communicative context
