INQUIRING LINE

Can frame semantics explain why context matters more than word similarity?

This reads 'frame semantics' as the idea that a word's meaning is fixed by the context/situation it sits in rather than its surface resemblance to other words — and asks whether that theory accounts for why context should beat word-level similarity. The corpus doesn't name frame semantics directly, but it maps the exact battleground: when do language models follow context versus surface form?


This explores whether the linguistic intuition behind frame semantics — meaning comes from the situation a word sits in, not from how textually similar it is to other words — is borne out in how language models actually behave. The honest answer from the corpus is a twist: it documents in detail *why context often loses to surface similarity*, which is the failure frame semantics is supposed to prevent. So rather than confirming the theory, the collection shows you the gap between what context-driven meaning should do and what models default to.

The starkest evidence runs against context. Models systematically prefer high-frequency phrasings over semantically identical rare paraphrases across math, translation, and reasoning: they are tracking statistical mass from pretraining, not the meaning a frame would assign (see: Do language models really understand meaning or just surface frequency?). When context conflicts with strong prior associations baked in during training, the priors win: prompting alone can't make a model honor what's in front of it, because parametric knowledge overrides the in-context signal (see: Why do language models ignore information in their context?). The priming that follows learning is even predictable from a keyword's probability before learning (see: Can we predict keyword priming before learning happens?). If frame semantics says context should dominate, these notes show the architecture often pulls the other way.
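A minimal sketch of the frequency preference described above, assuming a toy add-one-smoothed unigram scorer over a made-up corpus. The corpus, the function names `build_counts` and `unigram_logprob`, and the example sentences are illustrative assumptions, not taken from the cited notes. It only shows how two paraphrases with the same meaning can receive different scores purely because one word is more frequent in the training text; a real probe would compare log-probabilities from an actual language model.

```python
import math
from collections import Counter

# Hypothetical training text: "buy" is frequent, "purchase" is rare.
TOY_CORPUS = [
    "she will buy a car",
    "he will buy a house",
    "they will buy food",
    "she will purchase a car",
]

def build_counts(corpus):
    """Count word frequencies across all sentences."""
    return Counter(word for sentence in corpus for word in sentence.split())

def unigram_logprob(sentence, counts):
    """Add-one-smoothed unigram log-probability of a sentence."""
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in sentence.lower().split()
    )

counts = build_counts(TOY_CORPUS)
common = unigram_logprob("she will buy a car", counts)
rare = unigram_logprob("she will purchase a car", counts)

# Same meaning, different score: the gap comes only from word frequency.
print(f"common paraphrase: {common:.3f}")
print(f"rare paraphrase:   {rare:.3f}")
print("prefers common form:", common > rare)
```

A scorer like this has no notion of a frame at all, which is the point: any preference it shows between the two sentences is a fact about the corpus, not about meaning.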

Where the corpus comes closest to the frame-semantic mechanism is the most interesting place to go next. Models treat presupposition triggers and non-factive verbs as flat surface cues, failing to compute their opposite effects on what's entailed (see: Why do embedding contexts confuse LLM entailment predictions?). These are exactly the words whose contribution *flips* depending on the surrounding frame ('pretended to,' 'forgot that'). That's a clean demonstration that the models are doing word-similarity matching where a frame would force a structural reinterpretation. The breakdown isn't random either: reasoning fails at instance-novelty boundaries, because models fit patterns from similar examples rather than the generalizable structure a frame provides (see: Do language models fail at reasoning due to complexity or novelty?).
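The contrast between surface matching and structural reinterpretation can be sketched with a toy entailment labeler. Everything here is an assumption for illustration: the four-entry lexicon `EMBEDDING_SIGNATURES`, both function names, and the example sentences are hypothetical and heavily simplified, and none of it is the method of the cited note. The first function labels by word overlap alone; the second looks at which embedding verb wraps the complement.

```python
# Toy lexicon (an illustrative assumption): how each embedding verb
# affects the truth of the clause it wraps.
EMBEDDING_SIGNATURES = {
    "knew that": "entailed",          # factive: complement is true
    "forgot that": "entailed",        # factive: complement is presupposed true
    "pretended that": "contradicted", # complement is taken to be false
    "believed that": "unknown",       # non-factive: no commitment either way
}

def _normalize(text):
    """Lowercase and drop a trailing period."""
    return text.lower().rstrip(".")

def surface_overlap_label(premise, hypothesis):
    """Label by word overlap only, ignoring the embedding verb."""
    premise_words = set(_normalize(premise).split())
    hypothesis_words = set(_normalize(hypothesis).split())
    return "entailed" if hypothesis_words <= premise_words else "unknown"

def frame_aware_label(premise, hypothesis):
    """Label by the signature of the verb that embeds the hypothesis."""
    text = _normalize(premise)
    target = _normalize(hypothesis)
    for trigger, label in EMBEDDING_SIGNATURES.items():
        marker = f" {trigger} "
        if marker in text:
            complement = text.split(marker, 1)[1]
            if complement == target:
                return label
    return "unknown"

hypothesis = "The door was locked."
for verb in ("knew", "pretended", "believed"):
    premise = f"Sam {verb} that the door was locked."
    print(verb, surface_overlap_label(premise, hypothesis),
          frame_aware_label(premise, hypothesis))
```

The overlap labeler gives the same answer for all three premises because the hypothesis words are always present; only the version that consults the embedding verb separates them.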

The deeper theory layer reframes why this happens at all. One line argues LLMs operationalize Saussure's *langue*: meaning purely as relational position among words, learned by compressing text with no external referent (see: Can language models learn meaning without engaging the world?). That is, in a sense, word-similarity *as* meaning. Against it stands Bender and Koller's argument that form alone can't yield meaning, because meaning needs the relation between expressions and communicative intent (see: Can language models learn meaning from text patterns alone?). Countering that in turn is evidence that static embeddings already encode rich semantic content like valence and concreteness before attention even runs (see: Do transformer static embeddings actually encode semantic meaning?). Read together, these stake out whether context-meaning is something models construct or something they only approximate through relational similarity.

The thing worth walking away knowing: frame semantics predicts context *should* override surface similarity, but this collection's strongest finding is that current models frequently do the reverse — and the cases where they fail (embedding-blind verbs, novel instances, frequency preference) are precisely the cases a genuine frame would have caught. The theory explains what's missing more than it explains the models.


Sources (8 notes)

Do language models really understand meaning or just surface frequency?

LLMs show consistent preference for higher-frequency surface forms over semantically equivalent rare paraphrases across math, machine translation, commonsense reasoning, and tool calling. This suggests models track statistical mass from pretraining rather than meaning-recognition as their primary mechanism.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Do language models fail at reasoning due to complexity or novelty?

Large reasoning models (LRMs) don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Do transformer static embeddings actually encode semantic meaning?

Clustering analysis of RoBERTa embeddings reveals sensitivity to five psycholinguistic measures including valence, concreteness, iconicity, and taboo. This demonstrates that static embeddings function as genuine lexical entries containing semantic content before self-attention operates.
