SYNTHESIS NOTE

Can language models adapt implicature to conversational context?

Do large language models flexibly modulate scalar implicatures based on information structure, face-threatening situations, and explicit instructions, as humans do? The answer indicates whether their pragmatic computation is truly context-sensitive or merely literal.

Synthesis note · 2026-02-21 · sourced from Linguistics, NLP, NLU

Scalar implicatures are a core pragmatic phenomenon: when someone says "some," it typically implies "not all." This is not semantically entailed but pragmatically inferred based on the maxim of quantity — if all were true, the speaker would have said "all." Human computation of these implicatures is sensitive to communicative context in documented ways.
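
The quantity-based inference described above can be written out as a minimal sketch. The two-term scale, the function name `scalar_implicatures`, and the `literal` flag are all assumptions made for illustration; the flag stands in for an instruction to read the utterance literally and does not model how any human or system actually computes the inference.

```python
# Hypothetical Horn scale, ordered from weakest to strongest term.
SCALE = ("some", "all")

def scalar_implicatures(term, scale=SCALE, literal=False):
    """Return the inferences a hearer draws from hearing `term`.

    Quantity reasoning: had a stronger term been true, the speaker would
    have used it, so each stronger alternative is inferred to be false.
    With literal=True the inference is suppressed, leaving only the
    semantic meaning (which is compatible with "all").
    """
    if literal:
        return []
    position = scale.index(term)
    return [f"not {stronger}" for stronger in scale[position + 1:]]

print(scalar_implicatures("some"))                # ['not all']
print(scalar_implicatures("all"))                 # []
print(scalar_implicatures("some", literal=True))  # []
```

The sketch treats the inference as all-or-nothing, whereas the human pattern is graded: how often the "not all" reading is drawn varies with context.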

Three experiments from Pragmatic Implicature Processing in ChatGPT (Ruytenbeek et al. 2024) tested whether ChatGPT shows human-like context-sensitivity in implicature. In all three, ChatGPT failed to reproduce the human pattern:

Generalized conversational implicatures: Humans can inhibit implicature computation when explicitly instructed to interpret utterances literally. ChatGPT showed no such difference between instructed and uninstructed interpretation, which suggests it does not switch between pragmatic and semantic processing modes.

Information structure sensitivity: For scalar implicatures, humans compute more "some but not all" inferences when the scalar term is in the information focus (the direct answer to an explicit question) than when it is in the background. ChatGPT showed no sensitivity to information structure.

Face context: Human scalar implicature rates differ between face-threatening and face-boosting contexts. If a poem is being evaluated and someone says "some people loved it," a face-boosting statement, the implicature ("not all loved it") is more prominent than it would be in a face-threatening context. ChatGPT showed no differential response to face context.

These are not exotic phenomena. They are the basic flexibility that allows human conversation to be more than literal string exchange. Pragmatic competence requires tracking the communicative context — who is asking, why, what stakes are involved — and modulating interpretation accordingly. ChatGPT's failure is not isolated to edge cases; it extends to routine context-modulation effects that appear in any human conversation.
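
Each context-modulation effect above amounts to a difference in implicature rate between two conditions. A minimal sketch of that comparison follows; the function names, the condition labels, and the toy responses are all made up for illustration and are not data or code from the study.

```python
from collections import defaultdict

def implicature_rates(trials):
    """Map each condition to the share of trials with a pragmatic reading.

    `trials` is a list of (condition, pragmatic) pairs, where `pragmatic`
    is True when the response reflects "some but not all".
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [pragmatic, total]
    for condition, pragmatic in trials:
        counts[condition][0] += int(pragmatic)
        counts[condition][1] += 1
    return {cond: hits / total for cond, (hits, total) in counts.items()}

def context_effect(trials, cond_a, cond_b):
    """Rate difference between two conditions; near 0 means no sensitivity."""
    rates = implicature_rates(trials)
    return rates[cond_a] - rates[cond_b]

# Made-up toy responses showing a human-like focus effect.
toy = [
    ("focus", True), ("focus", True), ("focus", False), ("focus", True),
    ("background", True), ("background", False),
    ("background", False), ("background", False),
]

print(implicature_rates(toy))                      # {'focus': 0.75, 'background': 0.25}
print(context_effect(toy, "focus", "background"))  # 0.5
```

On this framing, context-insensitivity shows up as an effect close to zero for every pair of conditions; a real analysis would also need a significance test, which the sketch omits.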

A complementary finding concerns non-literal language: GPT-4o significantly overestimates how likely emojis are to be ironic compared with human perception (median irony scores significantly higher, W = 918.5, p < .001). When prompted to rate the likelihood that specific emojis are used ironically, GPT-4o rates the same emojis as more likely to express irony than humans do, possibly because ironic emoji usage is disproportionately represented in its training data. Demographic information in prompts does not substantially change GPT-4o's irony classification. This parallels the implicature failure: whether the signal is a scalar term or an ironic emoji, the model fails to calibrate to actual human pragmatic norms for non-literal communication.

Original note title

llm scalar implicature computation fails to adapt to communicative context