Why do language models substitute parametric knowledge over retrieved context mid-reasoning?

This explores why models fall back on knowledge baked in during training instead of the documents you hand them in context, even mid-reasoning. The corpus points to one blunt mechanism underneath: a model's pull toward what it already 'knows' is not a soft preference you can talk it out of; it's a structural default. When prior training associations are strong, in-context information loses. The research shows that textual prompting alone can't override that pull; only direct causal intervention in the model's internal representations reliably flips the behavior (Why do language models ignore information in their context?). So the substitution isn't the model being lazy; it's the parametric prior winning a tug-of-war the context was never weighted to win.
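
To make the tug-of-war concrete, here is a deliberately crude toy, assuming (purely for illustration) that an answer's score is an additive blend of a parametric prior and in-context evidence. Real transformers do not combine knowledge this way, and every name and number below is hypothetical. Raising `context_weight` is only a loose stand-in for changing the model's internals; it is not the intervention used in the cited research.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of candidate answers."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def answer_distribution(prior, context, context_weight=1.0):
    """Toy additive model: parametric prior plus weighted in-context evidence."""
    keys = prior.keys() | context.keys()
    return softmax(
        {k: prior.get(k, 0.0) + context_weight * context.get(k, 0.0) for k in keys}
    )


# Hypothetical scores: training strongly favors the memorized answer,
# while the retrieved document supports a different one.
prior = {"memorized": 6.0, "retrieved": 0.0}
context = {"memorized": 0.0, "retrieved": 3.0}

as_prompted = answer_distribution(prior, context)                 # prior wins: 6 vs 3
boosted = answer_distribution(prior, context, context_weight=3.0)  # context wins: 6 vs 9
```

In this toy, the context only wins once its effective weight exceeds the prior's margin; nothing about how the context is worded changes that weight.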

Why does the prior win so reliably? Because the model isn't reasoning over your context the way you assume. When the semantic content of a task is decoupled from its logical structure, LLM accuracy collapses even when the correct rules are sitting right there in the prompt: models lean on parametric commonsense and learned token associations rather than manipulating the symbols in front of them (Do large language models reason symbolically or semantically?). A sharper version of the same finding is attestation bias: models predict entailment based on whether a hypothesis looks attested in their training data, not on whether the premise you gave them actually supports it (Do LLMs predict entailment based on what they memorized?). The retrieved context is treated less as ground truth and more as a weak suggestion competing against memorized propositions.
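
The random-premise test behind the attestation-bias finding can be sketched as follows: keep each hypothesis the model says is entailed, swap in an unrelated premise, and count how often the "entails" prediction survives. A premise-sensitive model should mostly flip; an attestation-biased one should not. `biased_entails` and `ATTESTED` are hypothetical stand-ins for a real model and its training data, not anything from McKenna et al.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-in for propositions memorized during training.
ATTESTED = {"Obama was born in Hawaii", "Water boils at 100 C at sea level"}


def biased_entails(premise: str, hypothesis: str) -> bool:
    """Toy model with pure attestation bias: it ignores the premise entirely."""
    return hypothesis in ATTESTED


def random_premise_probe(
    pairs: List[Tuple[str, str]],
    predict: Callable[[str, str], bool],
    seed: int = 0,
) -> float:
    """Share of 'entails' predictions that survive swapping in an unrelated premise."""
    rng = random.Random(seed)
    premises = [p for p, _ in pairs]
    kept = total = 0
    for premise, hypothesis in pairs:
        if not predict(premise, hypothesis):
            continue  # only probe items the model originally called entailed
        total += 1
        unrelated = rng.choice([p for p in premises if p != premise])
        if predict(unrelated, hypothesis):
            kept += 1
    return kept / total if total else 0.0


pairs = [
    ("Obama was born in Honolulu", "Obama was born in Hawaii"),
    ("The kettle reached 100 C at sea level and bubbled", "Water boils at 100 C at sea level"),
    ("The cat sat on the mat", "A cat was on the mat"),
]
survival = random_premise_probe(pairs, biased_entails)  # 1.0: the premise never mattered
```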

The failure compounds when the context contains something the model has a strong opinion about. Even when a model demonstrably knows the correct fact, it will often quietly accept a false presupposition smuggled into the input rather than reject it; false presuppositions drive more accommodation than correct knowledge drives rejection (Why do language models accept false assumptions they know are wrong?). This is the mirror image of parametric override: here the framing in the input beats what the model knows. The common thread is that the model goes along with whichever signal dominates instead of checking one against the other, and when training and retrieval disagree, the trained prior usually dominates the retrieved snippet.
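
The key measurement here is conditional: among facts the model answers correctly when asked directly, how often does it still go along with a false presupposition? A minimal scoring sketch, assuming each probe records both outcomes; the `PresuppositionProbe` record and the sample results are hypothetical and are not FLEX data or FLEX's actual scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PresuppositionProbe:
    knows_fact: bool        # answered the direct knowledge question correctly
    rejected_premise: bool  # pushed back on the false presupposition in the prompt


def accommodation_despite_knowledge(probes: List[PresuppositionProbe]) -> float:
    """Among items the model demonstrably knows, the share where it still accommodates."""
    known = [p for p in probes if p.knows_fact]
    if not known:
        return 0.0
    accommodated = sum(not p.rejected_premise for p in known)
    return accommodated / len(known)


# Hypothetical probe results for illustration only.
probes = [
    PresuppositionProbe(knows_fact=True, rejected_premise=False),
    PresuppositionProbe(knows_fact=True, rejected_premise=False),
    PresuppositionProbe(knows_fact=True, rejected_premise=True),
    PresuppositionProbe(knows_fact=False, rejected_premise=False),  # excluded: fact unknown
]
gap = accommodation_despite_knowledge(probes)  # 2 of 3 known items accommodated
```

Conditioning on `knows_fact` is what separates "the model didn't know" from "the model knew and went along anyway".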

There's also a ceiling worth naming. You can't fix this by getting cleverer with prompts, because prompting only reorganizes knowledge already inside the training distribution; it can't inject what isn't there (Can prompt optimization teach models knowledge they lack?). If the context carries genuinely new information, the model has no internal anchor for it, which is exactly when it's most tempted to substitute the familiar-sounding parametric answer. The most promising counter the corpus offers isn't better prompting at all but making retrieval a learned decision. DeepRAG models each reasoning step as a Markov Decision Process in which the model chooses between retrieving and recalling, and trains it on when to trust which. It reports a 21.99% (roughly 22%) accuracy improvement, attributed to better-targeted retrieval and less noise from unnecessary external knowledge (When should language models retrieve external knowledge versus use internal knowledge?).
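
The per-step retrieve-versus-recall choice can be sketched as a simple loop. Two loud assumptions: DeepRAG learns its policy, whereas this sketch swaps in a fixed confidence threshold (a common adaptive-retrieval baseline, not the paper's method); and the subquestions are given up front, whereas in the MDP framing each step is generated conditioned on earlier answers. `answer_stepwise`, `PARAMETRIC`, `DOCS`, and "ExampleCorp" are all hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str, str]  # (subquestion, action, answer)


def answer_stepwise(
    subquestions: List[str],
    recall: Callable[[str], Tuple[str, float]],
    retrieve: Callable[[str], str],
    threshold: float = 0.8,
) -> List[Step]:
    """Decide retrieve-vs-recall once per reasoning step.

    The fixed threshold is a hypothetical stand-in for a learned policy.
    """
    trace: List[Step] = []
    for q in subquestions:
        guess, confidence = recall(q)  # parametric answer plus its confidence
        if confidence >= threshold:
            trace.append((q, "recall", guess))
        else:
            trace.append((q, "retrieve", retrieve(q)))  # defer to external documents
    return trace


# Hypothetical knowledge: one stable fact, one that changed after training.
PARAMETRIC = {
    "capital of France?": ("Paris", 0.97),
    "current CEO of ExampleCorp?": ("J. Doe", 0.35),
}
DOCS = {"current CEO of ExampleCorp?": "A. Smith"}

trace = answer_stepwise(
    list(PARAMETRIC),
    recall=lambda q: PARAMETRIC[q],
    retrieve=lambda q: DOCS.get(q, "unknown"),
)
```

The point of making the switch explicit is that the stale parametric answer ("J. Doe") never reaches the trace; a learned policy would replace the hand-set `threshold` with a decision trained on when retrieval actually helps.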

The thing you didn't know you wanted to know: the substitution may happen silently and early. In models trained to emit filler tokens in place of visible chain-of-thought, logit-lens analysis finds the correct answer computed in the first few layers and then suppressed before the visible output (Do transformers hide reasoning before producing filler tokens?). If something similar happens with retrieved context, then 'mid-reasoning' the contest between context and prior may already be settled beneath the tokens you ever see, which would explain why arguing with the model in the prompt so rarely moves it.
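
The logit lens itself is simple: decode each layer's hidden state with the model's final unembedding and see which token it already favors. A minimal NumPy sketch on toy numbers, assuming a parameter-free layer norm (real models apply their own learned final norm) and a 4-token vocabulary where token 3 is the "answer" and token 0 is "filler"; the hidden states are fabricated to mimic the reported pattern, not taken from any model.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Parameter-free layer norm over the last axis (assumption: no learned gain/bias)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def logit_lens(hidden_states: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Project every layer's hidden state (n_layers, d_model) through W_U (d_model, vocab)."""
    return layer_norm(hidden_states) @ W_U


# Toy residual stream for one position: layers 1-3 favor token 3 (the answer),
# later layers push token 0 (filler) to the top while token 3 lingers below it.
hidden = np.array([
    [0.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0, 4.0],
    [5.0, 0.0, 0.0, 1.0],
    [6.0, 0.0, 0.0, 0.5],
])
W_U = np.eye(4)  # toy unembedding: hidden dimension i maps to token i

lens = logit_lens(hidden, W_U)
top_per_layer = lens.argmax(-1)         # [3, 3, 3, 0, 0]
final_top2 = np.argsort(-lens[-1])[:2]  # [0, 3]: the answer is still ranked second
```

The second ranking mirrors the note's observation that the suppressed answer remains recoverable from lower-ranked predictions.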

Sources (7 notes)

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random-premise experiments show models maintain high entailment predictions when hypotheses are attested, indicating they respond to memorized propositions rather than premise-hypothesis relationships.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

When should language models retrieve external knowledge versus use internal knowledge?

DeepRAG models each reasoning step as a Markov Decision Process in which the model learns when to retrieve versus rely on parametric knowledge. The reported 21.99% accuracy improvement comes from better-targeted retrieval and elimination of noise from unnecessary external knowledge.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.