Why do AI systems miss jokes and wordplay so consistently?
This note explores whether AI's literal reading of language stems from how transformers process tokens: in parallel, rather than through the selective frame-activation that humans use. Understanding this gap could reveal which cognitive operations current architectures lack.
The transformer architecture processes a sequence of tokens through attention layers that compute relations across all token pairs. Information about the words is integrated, but the integration is parallel and additive: every token influences every other in proportion to the attention weights. There is no dedicated operation that suppresses some attention paths in order to surface the frame that holds a subset of tokens together. The mechanism performs weighted aggregation, not selective resonance.
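To make "weighted aggregation" concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the two-dimensional embeddings are invented for illustration, the same vectors serve as queries, keys, and values (real models apply learned projections), and causal masking is omitted. What it shows is only that softmax gives every token a strictly positive weight, so every token contributes something to every output.

```python
import math

def softmax(scores):
    """Turn raw scores into strictly positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Relate this query to every key: one score per token pair.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of all value vectors; no token is excluded.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings; the numbers are invented for illustration.
tokens = ["bullseye", "dot", "cover", "arrow"]
embeddings = [[1.0, 0.2], [0.8, 0.1], [-0.5, 1.0], [0.9, 0.3]]

outputs, weights = attend(embeddings, embeddings, embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row sums to 1 and contains no zeros: attention can down-weight a token, but in this basic form it never removes one from the mix.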
This explains a recurring AI failure pattern. Given material in which some words activate a frame and others do not, AI tends to read the material literally, taking each word at its compositional value rather than catching the frame the subset activates. The bullseye example illustrates this: given "bullseye" applied to a design with a dot, a cover, and an arrow through it, AI reads "bullseye" as a complimentary metaphor and misses the archery frame that three of the four words activate. The miss is structural, not a knowledge gap. AI knows what "bullseye" is, what "arrow" is, and what "dot" is. What it does not do is select these three for frame-activation while suppressing "cover."
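The contrast between down-weighting and suppressing can be pictured numerically. The sketch below is purely illustrative: the frame scores are invented, and `hard_select` with its hand-set threshold is a hypothetical stand-in for "suppression", not a model of how human readers select words and not a claim that such a step would close the gap. It only shows the arithmetic difference between a soft weighting, where "cover" keeps a nonzero share, and a hard selection, where it drops out entirely.

```python
import math

def softmax(scores):
    """Turn raw scores into strictly positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hard_select(scores, threshold):
    """Hypothetical suppression: zero out items below threshold, renormalize the rest."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    weights = [0.0] * len(scores)
    if not kept:  # nothing passes the threshold, so nothing is selected
        return weights
    kept_weights = softmax([scores[i] for i in kept])
    for i, w in zip(kept, kept_weights):
        weights[i] = w
    return weights

tokens = ["bullseye", "dot", "cover", "arrow"]
# Invented scores for how strongly each word fits an archery frame.
frame_scores = [2.0, 1.0, -1.0, 2.0]

soft = dict(zip(tokens, softmax(frame_scores)))
hard = dict(zip(tokens, hard_select(frame_scores, threshold=0.0)))

print("soft:", {t: round(w, 3) for t, w in soft.items()})
print("hard:", {t: round(w, 3) for t, w in hard.items()})
```

In the soft case "cover" still contributes a small share to the result; in the hard case its share is exactly zero and the remaining three words split the weight among themselves.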
This generalizes beyond wordplay. The same mechanism underlies AI difficulties with jokes (the punchline activates a frame that recontextualizes the setup), with poetry (image clusters activate frames the literal words do not), and with rhetoric (a frame is built from selective material across a passage). Each of these depends on selective resonance, the operation transformers do not perform. The miss is not "AI lacks world knowledge"; it is "AI lacks the selective-suppression operation that frame-activation requires."
The note "Does the mind selectively activate frames from only some words?" is the human-side companion to this one. Together the two claims locate the difference precisely: it is not that AI lacks data or context, but that the cognitive operation human meaning-making relies on is not the operation transformers perform.
The strongest counterargument is that better attention mechanisms, finer-grained attention heads, and explicit frame-extraction layers could close the gap. That is possible but not yet evident. The gap appears even in the largest models with the most sophisticated attention, which suggests that the operation needed is not just better attention but a different operation. Selective frame-activation may require something architecturally distinct from attention as weighted aggregation.
Inquiring lines that use this note as a source (49)
This note is a source for these synthesized inquiries.
- How does the temporal structure of attention differ between humans and AI?
- Can better attention mechanisms close the gap between human and AI frame-activation?
- What is selective resonance and why do transformers not perform it?
- Does AI struggle with poetry for the same reason it misses jokes?
- Can AI detect sense-of-nonsense the way human readers do?
- Why does transformer attention architecture reinforce sycophancy and agreement?
- Does transformer attention architecture fundamentally prevent topic-aware memory?
- Can transformer attention architecture explain why chatbots default to sycophancy?
- How do anthropomimetic design features trigger System 1 cognitive traps?
- Can symbolic mechanisms improve transformer compositional abilities?
- Why does attention-based drift happen automatically during generation?
- Does transformer attention architecture systematically bias models toward sycophancy?
- How do multimodal AI architectures compare to human brain export pathways?
- Why do transformers weight early tokens more heavily than later ones?
- Why does mimicking human behavior differ from simulating human cognition?
- How does circuit complexity limit which grammatical structures transformers can acquire?
- Why does AI struggle with wordplay when it has access to word embeddings?
- How do humans detect which words belong to the same frame together?
- What specific signals would be needed for an AI system to acquire meaning?
- Why do transformer attention patterns show positional and sequential bias across tasks?
- How does the U-shaped attention distribution relate to transformer sycophancy?
- How does transformer attention amplify pressure from repeated false claims?
- Why do language models overestimate irony likelihood in emoji use?
- What role does Peirce's semiotic framework play in understanding AI meaning?
- What hidden computations happen inside transformer layers during reasoning?
- Do metaphors work by decoupling meaning from linguistic associations?
- Why do different readers extract different meanings from identical text?
- What specific cognitive failure prevents AI from detecting frame activation?
- Does transformer attention architecture inherently bias models toward sycophancy?
- Why do different brain and AI systems appear similar when compared via RSA?
- Why do AI systems skip repair sequences that humans use constantly?
- Can a system without an addressee ever truly tell a joke?
- Why does AI criticism fail where human literary analysis succeeds?
- How does human intuition about cognition mislead AI evaluation?
- Why does joint attention matter for acquiring linguistic meaning?
- Why does AI output lack the argumentative turbulence of human thinking?
- What explains the contextual variability of knowledge in transformers?
- How does oral transmission of knowledge resemble transformer generation?
- Why does context work differently in AI than in conventional software?
- How do attention patterns and circuits function as algorithmic representations?
- Does AI's atemporal processing explain its preference for linear plots?
- How does the task type change which linguistic features distinguish AI from humans?
- Why does AI writing sound human while failing lexical measurements?
- Can AI detection work without computational analysis of word distribution?
- Does next-token prediction actually explain how human thought works?
- How does transformer attention bias toward repeated and context-prominent content?
- How do agents parse HTML differently than human browsers render it?
- What structural biases does transformer attention have before training?
- Where does the meaning actually originate in reader-detected resonance across language?
Related concepts in this collection (3)
- Does the mind selectively activate frames from only some words?
  When we understand wordplay or jokes, do we activate a frame from a subset of available words while suppressing nearby but frame-unrelated words? This matters because it reveals how meaning-making differs from how AI processes language.
  companion human-side claim
- How do readers actually build meaning from words?
  Does meaning come from adding up word definitions, or from detecting which words activate the same mental frame together? This explores whether composition or resonance better describes how we make sense of language.
  the broader theoretical claim
- Why don't conversational AI systems mirror their users' word choices?
  Explores whether current dialogue models exhibit lexical entrainment (the human tendency to align vocabulary with conversation partners) and what's needed to bridge this gap in AI communication.
  adjacent failure mode in AI's handling of conversational meaning
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
- On the Reasoning Capacity of AI Models and How to Quantify It
- Evaluating Large Language Models in Theory of Mind Tasks
- Implicit Chain of Thought Reasoning via Knowledge Distillation
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Original note title
AI reads words literally one-at-a-time missing the frame that multiple words activate together