How do humans detect which words belong to the same frame together?

This explores how the human mind decides which words in a sentence belong together as one coherent unit of meaning (a 'frame') — and why that grouping isn't just about which words tend to appear near each other.

The corpus's answer is counterintuitive: humans don't group words by counting which ones show up near each other; they group by *resonance*. The mind holds frame-related words in tight mutual activation while actively suppressing words that are linguistically adjacent but frame-irrelevant (see "Does the mind selectively activate frames from only some words?"). The key move is selection plus suppression, not addition. Meaning, on this view, is the live detection of which subsets of words light up a shared frame: a selective, non-additive, non-monotonic operation rather than a sum of individual word meanings (see "How do readers actually build meaning from words?").

The sharpest way to see what this human ability *is* turns out to be looking at a system that lacks it. Transformers read words additively: they aggregate token information through weighted parallel attention, with no mechanism for selectively suppressing the irrelevant ones. That structural gap, rather than missing knowledge, is why AI consistently misses jokes, puns, and wordplay, where the whole effect depends on which two or three words are supposed to resonate while the rest fall away (see "Why do AI systems miss jokes and wordplay so consistently?"). So one answer to 'how do humans detect frame membership' is: by doing exactly the thing attention architectures don't, namely gating words in and out rather than averaging them all.
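The contrast between averaging and gating can be made concrete with a toy sketch. It is only an illustration of the claim above, not a model of a real transformer or of human cognition: the scores, values, threshold, and the function names `additive_mix` and `gated_mix` are all assumptions invented for this example. The point it shows is narrow: a softmax-weighted average gives every token a strictly positive weight, while a hard gate can set a token's contribution to exactly zero.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_mix(scores, values):
    """Attention-style readout: every token contributes with a nonzero weight."""
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values)), weights


def gated_mix(scores, values, threshold):
    """Hypothetical gate: tokens scoring below `threshold` are suppressed to zero.

    The surviving tokens are then softmax-weighted among themselves.
    """
    mask = [s >= threshold for s in scores]
    kept_scores = [s for s, keep in zip(scores, mask) if keep]
    if not kept_scores:
        return 0.0, [0.0] * len(scores)
    kept_weights = iter(softmax(kept_scores))
    weights = [next(kept_weights) if keep else 0.0 for keep in mask]
    return sum(w * v for w, v in zip(weights, values)), weights


# Made-up numbers: two frame-relevant tokens (value +1.0) and two
# adjacent but frame-irrelevant tokens (value -1.0).
scores = [2.0, 1.5, -0.5, -1.0]
values = [1.0, 1.0, -1.0, -1.0]

mixed, mixed_weights = additive_mix(scores, values)
gated, gated_weights = gated_mix(scores, values, threshold=0.0)

print("additive weights:", mixed_weights)
print("gated weights:   ", gated_weights)
print("additive readout:", mixed, "gated readout:", gated)
```

In the additive readout the irrelevant tokens are down-weighted but still pull the result away from the frame-relevant value; in the gated readout they drop out entirely. Whether a hard threshold is the right way to model suppression is an open design choice here, chosen only because it is the simplest form of gating.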

Frame detection also isn't a single operation happening in isolation. Discourse research suggests humans track three layers simultaneously while reading (the linguistic segments, the speaker's intentions, and what's currently most salient in attention), and these layers constrain each other in parallel, not in sequence (see "How do readers track segments, purposes, and salience together?"). Which words belong to a frame depends partly on what the reader judges the passage to be *doing*, so frame membership is shaped top-down by purpose and attention, not just bottom-up by the words on the page.

That top-down pressure is also why the 'same frame' can differ between readers. The corpus shows that interpretations of socially loaded sentences are irreducibly multiple: different readers genuinely activate different frames depending on social position, and this disagreement is signal, not annotation noise (see "Why do readers interpret the same sentence so differently?"). Relatedly, deliberately ambiguous text requires holding two frames at once. On the AMBIENT benchmark, humans disambiguate such cases correctly about 90% of the time, while GPT-4 manages 32% (see "Can language models recognize when text is deliberately ambiguous?"). Frame detection isn't just grouping the right words; it's sometimes recognizing that two valid groupings coexist.
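One way to treat reader disagreement as signal rather than noise is to keep the whole distribution of interpretations instead of collapsing it to a majority label. The sketch below is a minimal illustration under that assumption; the labels ("sincere", "sarcastic") and the vote counts are invented, and Shannon entropy is used here only as one convenient summary of how split the readers are, not as the measure the cited research uses.

```python
import math
from collections import Counter


def interpretation_distribution(votes):
    """Map each interpretation label to the share of readers who chose it."""
    counts = Counter(votes)
    total = len(votes)
    return {label: n / total for label, n in counts.items()}


def entropy_bits(dist):
    """Shannon entropy of the distribution, in bits (0 means full agreement)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical reader judgments for two sentences.
unanimous = ["sincere"] * 10
split = ["sincere"] * 5 + ["sarcastic"] * 5

for name, votes in [("unanimous", unanimous), ("split", split)]:
    dist = interpretation_distribution(votes)
    print(name, dist, "entropy:", entropy_bits(dist))
```

A majority vote would label both sentences "sincere" (or break the tie arbitrarily), discarding exactly the information that distinguishes a sentence with one frame from a sentence where two frames coexist.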

The doorway worth walking through: if you want a glimpse of the geometry underneath, LLM embeddings turn out to organize meaning along only about three human-like evaluation dimensions, where nudging one feature predictably drags aligned ones along. That hints that the 'frames' words fall into may sit in a low-dimensional, entangled structure rather than a clean dictionary of separate concepts (see "Do LLM semantic features organize along human evaluation dimensions?"). The human knack for frame detection may be less about knowing word meanings and more about navigating that resonance space in real time.
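The "nudging one feature drags aligned ones along" effect has a simple geometric reading: if two feature axes are not orthogonal, moving an embedding along one of them necessarily changes its projection onto the other, in proportion to the cosine between them. The sketch below illustrates only that geometric point. The three-dimensional vectors and the axis names `GOOD_BAD`, `WARM_COLD`, and `FAST_SLOW` are invented for the example and are not taken from the cited study, which works with 28 axes in real embedding spaces.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(u):
    """Scale a vector to length 1."""
    length = math.sqrt(dot(u, u))
    return [a / length for a in u]


def projection(embedding, axis):
    """Scalar position of an embedding along a feature axis."""
    return dot(embedding, unit(axis))


def nudge(embedding, axis, amount):
    """Shift an embedding by `amount` along the unit direction of `axis`."""
    direction = unit(axis)
    return [e + amount * d for e, d in zip(embedding, direction)]


# Hypothetical feature axes: WARM_COLD overlaps with GOOD_BAD (cosine 0.8),
# while FAST_SLOW is orthogonal to it (cosine 0.0).
GOOD_BAD = [1.0, 0.0, 0.0]
WARM_COLD = [0.8, 0.6, 0.0]
FAST_SLOW = [0.0, 0.0, 1.0]

word = [0.2, -0.1, 0.5]           # made-up embedding
moved = nudge(word, GOOD_BAD, 2.0)  # intervene on one feature only

for name, axis in [("good/bad", GOOD_BAD), ("warm/cold", WARM_COLD), ("fast/slow", FAST_SLOW)]:
    shift = projection(moved, axis) - projection(word, axis)
    print(f"{name}: shift = {shift:.3f}")
```

The shift along the aligned axis equals the nudge size times the cosine between the two axes, so it cannot be avoided without also changing the intervention itself; the orthogonal axis is untouched.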

Sources (7 notes)

Does the mind selectively activate frames from only some words?

Human meaning-making operates through selective frame activation: the mind holds frame-related words in tight resonance while ignoring linguistically adjacent but frame-unrelated words. This selectivity tracks frame-coherence, not co-occurrence frequency, and represents a cognitive operation that standard similarity computation cannot capture.

How do readers actually build meaning from words?

Meaning-making is the live detection of which word subsets activate shared frames, not compositional aggregation of individual word meanings. This operation is selective, non-additive, and non-monotonic, fundamentally different from how current AI processes language.

Why do AI systems miss jokes and wordplay so consistently?

Transformers integrate token information through weighted parallel aggregation rather than selective suppression of irrelevant words. This structural difference explains consistent failures with jokes, wordplay, and frame-dependent meaning—not knowledge gaps, but missing cognitive operations.

How do readers track segments, purposes, and salience together?

Discourse processing demands parallel recognition of linguistic segments, intentional structure, and attentional salience—not sequential processing. These three layers constrain each other during comprehension, and failures in any single layer disrupt overall understanding.

Why do readers interpret the same sentence so differently?

Interpretation Modeling research shows that disagreement on socially embedded sentences reflects valid differences in reader perspective, not annotation failure. Structured human disagreement in NLI benchmarks confirms that interpretation distributions carry meaningful information.

Can language models recognize when text is deliberately ambiguous?

The AMBIENT benchmark shows that GPT-4 correctly disambiguates only 32% of cases, versus 90% for humans. This failure spans lexical, structural, and scope ambiguity, revealing that LLMs cannot hold multiple interpretations simultaneously, a fundamental gap hidden by standard benchmarks.

Do LLM semantic features organize along human evaluation dimensions?

Twenty-eight semantic axes in LLM embeddings reduce to three principal components matching the human EPA (evaluation, potency, activity) structure. Intervening on one feature predictably shifts aligned features proportionally, creating unavoidable off-target effects that reflect how meaning is fundamentally organized.
