Why does frame-activation matter more than word-by-word composition?

This note explores why human meaning-making seems to work by activating whole interpretive frames rather than assembling meaning one word at a time, and what that implies for how language models, which are trained on form-to-form prediction, handle composition.

This reads the question as a contrast between two theories of how meaning gets built: a bottom-up, word-by-word composition model versus a frame-activation model in which the mind locks onto a coherent interpretive frame and lets that frame govern which words matter. The corpus comes down firmly on the side of frames. The most direct evidence is that the mind weights words not by how often they co-occur but by whether they belong to the frame already in play. The note "Does the mind selectively activate frames from only some words?" shows that the mind holds frame-related words in tight resonance while ignoring words that are linguistically adjacent but frame-irrelevant. That selectivity is the whole point: word-by-word composition would treat every nearby word as a contribution, whereas human meaning-making filters by coherence, an operation that plain similarity computation cannot reproduce.
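The contrast between weighting by co-occurrence and gating by frame membership can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the context words, the counts, the hand-written frame lexicon, and the function names are hypothetical, and a set-membership test is only a crude stand-in for whatever the mind actually does when it judges frame coherence.

```python
# Toy contrast: co-occurrence weighting vs. frame gating.
# All words, counts, and the frame lexicon below are made up for illustration.

def cooccurrence_weights(counts):
    """Weight each context word by its share of total co-occurrence."""
    total = sum(counts.values())
    if total == 0:
        return {word: 0.0 for word in counts}
    return {word: count / total for word, count in counts.items()}


def frame_gated_weights(counts, frame_lexicon):
    """Give equal weight to words in the active frame and zero to the rest."""
    in_frame = [word for word in counts if word in frame_lexicon]
    if not in_frame:
        return {word: 0.0 for word in counts}
    share = 1.0 / len(in_frame)
    return {word: (share if word in frame_lexicon else 0.0) for word in counts}


# Hypothetical counts of words seen near "order" in a restaurant scene.
context_counts = {"the": 40, "yesterday": 5, "menu": 3, "waiter": 2}
# Hypothetical hand-written lexicon for a RESTAURANT frame.
restaurant_frame = {"menu", "waiter", "bill"}

by_cooccurrence = cooccurrence_weights(context_counts)
by_frame = frame_gated_weights(context_counts, restaurant_frame)

for word in context_counts:
    print(
        f"{word:>10}  co-occurrence={by_cooccurrence[word]:.2f}  "
        f"frame-gated={by_frame[word]:.2f}"
    )
```

Under co-occurrence weighting the most frequent neighbour dominates regardless of relevance, while the gated version gives frequent but frame-irrelevant words no weight at all. The gate here is a binary lookup chosen for simplicity; a graded coherence score would fit the same shape.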

Why does this matter more than composition? Because composition assumes meaning accumulates additively, while frame activation says meaning is gated: context decides which words even get to count. The same gating logic appears in unexpected places. The note "Do language models sparsify their activations under difficult tasks?" finds that as a task gets harder, a model's hidden states get sparser in a localized, systematic way, acting as a selective filter rather than a breakdown. That hints that even form-trained systems lean toward frame-like selection under pressure rather than weighing everything equally.
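One simple way to put a number on "sparser activations" is the fraction of entries in a hidden state that sit at or near zero. The sketch below assumes that definition; the source note does not say which sparsity measure it uses, so the function name, the `eps` threshold, and both example vectors are hypothetical and are not measurements from any model.

```python
# Minimal sparsity measure: share of near-zero entries in one hidden state.
# The metric choice and the example vectors are illustrative assumptions.

def activation_sparsity(hidden_state, eps=1e-3):
    """Return the fraction of activations whose magnitude is at most eps."""
    if not hidden_state:
        raise ValueError("hidden_state must be non-empty")
    near_zero = sum(1 for value in hidden_state if abs(value) <= eps)
    return near_zero / len(hidden_state)


# Made-up hidden states: a dense one and a sparse one of the same width.
dense_state = [0.4, -0.2, 0.7, 0.1, -0.5, 0.3, 0.2, -0.1]
sparse_state = [0.0, 0.0, 1.3, 0.0, 0.0, -0.9, 0.0, 0.0]

print("dense :", activation_sparsity(dense_state))
print("sparse:", activation_sparsity(sparse_state))
```

Tracking this value per layer across easy and hard inputs would be one way to check for the pattern the note describes. Other measures, such as an L1/L2 ratio, would serve equally well for that comparison.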

The deeper stakes show up when you ask whether form alone can ever get you to frames. The note "Can language models learn meaning from text patterns alone?" presents Bender & Koller's argument that meaning lives in the relation between expressions and communicative intent, which pure form-to-form prediction never touches. The opposing note, "Can language models learn meaning without engaging the world?", counters that compressing relational structure from text is enough to reproduce situated discourse: the frame is latent in the relations between words and needs no external referent. The disagreement is really about whether frames can be recovered from form, or whether they require something form cannot carry.

There is also a failure mode that exposes the cost of getting frames wrong. The note "Why do language models fail in gradually revealed conversations?" shows that models lock into an early interpretive frame and cannot recover when later turns contradict it, producing a 39% average performance drop of which mitigations recover only 15–20%. That is frame activation gone rigid: once the wrong frame fires, word-by-word information arriving afterward cannot override it. The mirror image appears in "Why do language models ignore information in their context?", where strong parametric priors act as a pre-loaded frame that drowns out the actual context, and only intervening directly in the representations, not adding more words to the prompt, fixes it.
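To see how little the mitigations change, the two figures from the source note (a 39% average drop, with mitigations recovering 15–20% of that loss) can be combined in a short calculation. This sketch assumes "recover 15–20% of this loss" means that fraction of the 39% drop is won back; the function name is hypothetical.

```python
# Residual multi-turn drop after mitigation, under a literal reading of
# "mitigations recover 15-20% of the loss". The reading is an assumption.

def residual_drop(drop_pct, recovered_fraction):
    """Return the performance drop that remains after partial recovery."""
    if not 0.0 <= recovered_fraction <= 1.0:
        raise ValueError("recovered_fraction must be between 0 and 1")
    return drop_pct * (1.0 - recovered_fraction)


REPORTED_DROP_PCT = 39.0  # average drop stated in the source note

for fraction in (0.15, 0.20):
    remaining = residual_drop(REPORTED_DROP_PCT, fraction)
    print(f"recovering {fraction:.0%} leaves a drop of about {remaining:.1f}%")
```

On that reading, roughly 31 to 33 points of the original 39 remain after mitigation, which is why the failure looks like a locked-in frame rather than a gap that extra prompting can close.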

What you might not have expected: composition itself may be implemented frame-style under the hood. The note "Do neural networks naturally learn modular compositional structure?" finds that networks implement compositional subroutines in isolated subnetworks, and "Does depth matter more than width for tiny language models?" shows that depth wins because layers compose abstract concepts rather than spreading them across width. So even where composition happens, it looks less like adding words and more like activating and chaining the right structured units, which is frame activation by another name.

Sources 8 notes

Does the mind selectively activate frames from only some words?

Human meaning-making operates through selective frame activation: the mind holds frame-related words in tight resonance while ignoring linguistically adjacent but frame-unrelated words. This selectivity tracks frame-coherence, not co-occurrence frequency, and represents a cognitive operation that standard similarity computation cannot capture.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings because they lock into incorrect early guesses. Agent mitigations recover only 15–20% of this loss.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Does depth matter more than width for tiny language models?

MobileLLM shows deep-and-thin architectures yield 2.7–4.3% accuracy gains over balanced designs at 125M–350M scale by composing abstract concepts through layers rather than spreading parameters across width.
