How does surface salience compete with background knowledge in model inference?

This note explores the tug-of-war between surface salience, meaning what is prominent in the prompt itself (memorized phrasings, stated assumptions, familiar-sounding claims), and the background knowledge a model carries from training, and asks which one steers the answer when they conflict. The corpus is interesting precisely because it shows the competition cuts both ways, and neither side reliably wins.

In one direction, baked-in priors steamroll the surface. "Why do language models ignore information in their context?" shows models generating answers that contradict their own context whenever the training-time association is strong enough, and that plain prompting can't fix it; you have to intervene in the representations themselves. "Do large language models reason symbolically or semantically?" sharpens the why: when you decouple a task's semantics from its logic, performance collapses even with the correct rule sitting right there in the prompt. The model is leaning on familiar token associations, not the rule it was handed.
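One way to make "the prior beats the context" measurable is a counterfactual-context probe: give the model a passage that contradicts a well-known fact, then check which answer it returns. The sketch below is illustrative only and is not the method of the cited notes. The item format, the substring matching, and the `prior_bound_model` stub (a stand-in for a real model call) are all assumptions.

```python
from typing import Callable, Dict, List


def classify_answer(answer: str, context_answer: str, prior_answer: str) -> str:
    """Label an answer by the source it agrees with (case-insensitive substring match)."""
    text = answer.lower()
    if context_answer.lower() in text:
        return "context"
    if prior_answer.lower() in text:
        return "prior"
    return "other"


def conflict_rates(items: List[Dict[str, str]], model: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of answers that follow the context, the prior, or neither."""
    counts = {"context": 0, "prior": 0, "other": 0}
    for item in items:
        prompt = f"{item['context']}\nQuestion: {item['question']}\nAnswer:"
        label = classify_answer(model(prompt), item["context_answer"], item["prior_answer"])
        counts[label] += 1
    total = max(len(items), 1)  # avoid division by zero on an empty probe set
    return {label: n / total for label, n in counts.items()}


# Hypothetical probe item: the context deliberately contradicts a well-known fact.
ITEMS = [
    {
        "context": "In this story, the capital of France is Lyon.",
        "question": "What is the capital of France?",
        "context_answer": "Lyon",
        "prior_answer": "Paris",
    },
]


def prior_bound_model(prompt: str) -> str:
    """Stand-in for a real model (assumption): ignores the context, answers from memory."""
    return "Paris"


print(conflict_rates(ITEMS, prior_bound_model))
```

A high "prior" share on items like these would be the behavior the first note describes. Substring matching is crude, so a real probe would need a more careful answer check.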

But flip the framing and the surface wins instead. "Why do language models accept false assumptions they know are wrong?" is the cleanest case: ask a model directly and it knows the fact, but bury a false assumption in the phrasing of a question and it goes along with it. False presuppositions drive more accommodation than correct knowledge drives rejection. "Do LLMs predict entailment based on what they memorized?" is the same pattern in logic's clothing: a model will call something 'entailed' just because the conclusion looks like something it saw in training (it's 'attested'), even when the premise has been swapped for a random one. Here a salient, familiar-looking string overrides what the model actually knows about the relationship.
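The random-premise idea can be sketched as a small check: score the same hypotheses once with their real premises and once with mismatched ones, then compare entailment rates. This is a minimal sketch under stated assumptions, not the original experimental protocol. The rotation used to mismatch premises, the toy pairs, and both stub predictors are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)


def entailment_rate(pairs: List[Pair], predict: Callable[[str, str], bool]) -> float:
    """Share of pairs the predictor labels as entailed."""
    return sum(predict(p, h) for p, h in pairs) / len(pairs)


def random_premise_rates(
    pairs: List[Pair], predict: Callable[[str, str], bool], seed: int = 0
) -> Tuple[float, float]:
    """Entailment rate on the original pairs vs. pairs with mismatched premises."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to reassign premises")
    rng = random.Random(seed)
    offset = rng.randrange(1, len(pairs))  # non-zero rotation: no hypothesis keeps its premise
    premises = [p for p, _ in pairs]
    rotated = premises[offset:] + premises[:offset]
    mismatched = [(p, h) for p, (_, h) in zip(rotated, pairs)]
    return entailment_rate(pairs, predict), entailment_rate(mismatched, predict)


# Toy data and stub predictors (assumptions, standing in for a real model).
PAIRS = [
    ("Alice bought a car", "Alice owns a car"),
    ("Bob wrote a book", "Bob is an author"),
    ("Carol ran a marathon", "Carol is a runner"),
]
ATTESTED = {"Alice owns a car", "Bob is an author"}


def hypothesis_only(premise: str, hypothesis: str) -> bool:
    """Mimics attestation bias: looks only at whether the hypothesis is familiar."""
    return hypothesis in ATTESTED


def premise_sensitive(premise: str, hypothesis: str) -> bool:
    """Mimics a predictor that actually consults the premise."""
    return hypothesis.split()[0] in premise.split()


print(random_premise_rates(PAIRS, hypothesis_only))
print(random_premise_rates(PAIRS, premise_sensitive))
```

If the second rate stays close to the first, the predictor is responding to the hypothesis alone. A predictor that reads the premise should see the rate drop once premises are mismatched.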

So the real answer to 'who wins' is: whichever signal is more confident, not whichever is more correct. Strong parametric priors beat weak context; salient familiar phrasings beat weakly held knowledge. That reframes the competition as a calibration problem rather than a knowledge problem, which is why "Can models learn to ignore irrelevant prompt changes?" matters: it trains models to give the same answer to clean and cosmetically altered prompts, blunting surface salience's grip directly. The same reframing explains why surface-level prompt tricks have a hard ceiling: "Can prompt optimization teach models knowledge they lack?" shows prompting can only reorganize what's already in the weights, never supply what's missing.
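The output-level version of that consistency idea can be sketched as a data-construction step: pair each altered prompt with the model's own answer to the clean prompt, and use that answer as the training target. The sketch covers only building the pairs; the fine-tuning step is omitted. The dictionary format, `add_leading_opinion`, and `toy_generate` are hypothetical names introduced for illustration, not an API from the cited work.

```python
from typing import Callable, Dict, List


def build_consistency_pairs(
    clean_prompts: List[str],
    wrap: Callable[[str], str],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each wrapped prompt with the model's own response to the clean prompt."""
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)  # answer to the clean prompt becomes the target
        pairs.append({"prompt": wrap(prompt), "target": target})
    return pairs


def add_leading_opinion(prompt: str) -> str:
    """Hypothetical wrapper: prepends a user opinion that should not change the answer."""
    return "I'm fairly sure the answer is 5, but please check. " + prompt


def toy_generate(prompt: str) -> str:
    """Stand-in for sampling from the current model on a clean prompt (assumption)."""
    return "4" if "2 + 2" in prompt else "unknown"


pairs = build_consistency_pairs(["What is 2 + 2?"], add_leading_opinion, toy_generate)
print(pairs)
```

Because the targets come from the model itself rather than from a fixed labeled dataset, they track what the model currently answers on clean prompts, which is the property the note on consistency training highlights.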

The doorway worth walking through: the kind of knowledge changes how the contest plays out. "Does procedural knowledge drive reasoning more than factual retrieval?" finds that reasoning rides on broad, transferable procedural patterns while factual recall depends on narrow, document-specific memorization. That suggests surface salience hijacks the brittle, memorized facts most easily, the place where the model's 'knowledge' is really just a remembered string, while genuinely procedural competence is harder for a salient distractor to knock over.

Sources (7 notes)

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions unreliably, with rejection rates ranging from 84% (GPT-4) down to 2.44% (Mistral), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Can models learn to ignore irrelevant prompt changes?

Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall, which depends on narrow, document-specific memorization of target facts.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher auditing claims about surface salience vs. background knowledge in model inference. The question remains: when surface-level cues and parametric priors conflict, what determines which steers the model's output?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat them as perishable constraints:

• Models generate answers that contradict their own context when training-time associations are strong; plain prompting cannot fix this — representation-level intervention is needed (~2024).
• When task semantics are decoupled from logic, performance collapses even with the correct rule in the prompt; models lean on token associations, not symbolic reasoning (~2023).
• False presuppositions in phrasing drive more accommodation than correct background knowledge drives rejection; salient, familiar-looking strings override weakly-held knowledge (~2024–2025).
• Consistency training (teaching perturbation invariance) blunts surface salience's grip by training models to ignore cosmetic prompt changes (~2025).
• Procedural knowledge generalizes broadly across reasoning; factual recall is narrow and memorization-dependent, suggesting surface salience hijacks brittle, memorized facts more easily (~2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — semantic reasoning vs. symbolic
• arXiv:2411.12580 (2024) — procedural knowledge in pretraining
• arXiv:2510.27062 (2025) — consistency training and sycophancy
• arXiv:2506.08952 (2025) — grounding under loaded questions

Your task:

(1) RE-TEST EACH CONSTRAINT. For the five findings above, judge whether newer models (o1, Claude 4, GPT-5 reasoning), test-time scaling (chain-of-thought, scratchpad), or architectural advances (mixture-of-experts, retrieval-augmented generation) have since relaxed or resolved the competition. Separate the tensions that are still live (e.g., does the semantic/symbolic gap persist?) from those that are solved (e.g., has consistency training become standard?), and cite what resolved each one.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially anything showing surface salience no longer reliably hijacks models, or showing parametric priors are now more easily overridden by context.

(3) Propose 2 research questions that ASSUME the regime may have moved: one assuming surface salience's grip has loosened; one assuming procedural knowledge can now be reliably steered by context.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
