Why do language models ignore information in their context?

Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.

Synthesis note · 2026-02-21 · sourced from Discourses

The REMEDI paper names a specific failure mode: "failure of context integration." The example: an LM is prompted with a context establishing that Anita works in a law office, but when generating a continuation, the LM describes Anita as a nurse — overriding the contextual information with a prior association (names like Anita may statistically co-occur with certain occupations in training data).

This is a named, empirically documented failure mode, not a hypothetical. The failure occurs because the LM's parametric knowledge (compressed into weights from training) and its in-context information (the prompt) are not cleanly integrated. When they conflict, the parametric association can win.
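Here is a toy model in Python that makes the conflict concrete. It does not describe how a transformer actually combines the two sources. Purely for illustration, it assumes that the score for each candidate occupation is the parametric prior plus a weighted contribution from the in-context evidence. The label set, the numbers, and the `integrate` helper are all hypothetical.

```python
import math

# Hypothetical label set for the toy example.
OCCUPATIONS = ["lawyer", "nurse", "teacher"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def integrate(prior_logits, context_logits, context_weight=1.0):
    """Toy assumption: each label's score is its parametric prior plus a
    weighted contribution from the in-context evidence."""
    combined = [p + context_weight * c for p, c in zip(prior_logits, context_logits)]
    probs = softmax(combined)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return OCCUPATIONS[best], probs


# Hypothetical numbers: a strong training-data association of the name with "nurse".
prior = [0.0, 4.0, 1.0]
# Hypothetical numbers: the prompt's evidence that she works in a law office.
context = [3.0, 0.0, 0.0]

label_weak, _ = integrate(prior, context, context_weight=1.0)    # combined [3, 4, 1]
label_strong, _ = integrate(prior, context, context_weight=2.0)  # combined [6, 4, 1]
print(label_weak, label_strong)  # nurse lawyer
```

The context evidence is identical in both runs. Only its weight relative to the prior changes, and with the lower weight the prior association wins.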

The implication is important for how we think about context windows and RAG-style augmentation. Just providing information in context does not guarantee that a model will use it. If the information conflicts with strong prior associations, the prior may dominate — not because the model misread the context, but because context integration is not a lossless operation. The provided information gets processed through the same mechanisms that already have strong priors.

Fixing this requires causal intervention, not just better prompting: you need to modify the representations that carry the prior association, not just add more context on top of them. This is what REMEDI demonstrates — that adding a learned vector directly to entity representations can override the prior in a way that textual prompting cannot.
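The sketch below shows the shape of this kind of intervention in plain Python. As a simplification, it assumes that the learned edit is an affine map of an encoding `a` of the new attribute, added to the entity's hidden state `h` at one layer (h' = h + W·a + b). It also assumes that the model's occupation belief can be read off with a dot product against fixed direction vectors. The vectors, `W`, `b`, `readout`, and `apply_edit` are hypothetical stand-ins. In REMEDI the edit is learned, and `h` would come from an actual LM's forward pass.

```python
# Toy 3-d "representation space"; each occupation has a hypothetical readout direction.
DIRECTIONS = {
    "lawyer": [1.0, 0.0, 0.0],
    "nurse": [0.0, 1.0, 0.0],
    "teacher": [0.0, 0.0, 1.0],
}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(W, x):
    return [dot(row, x) for row in W]


def vec_add(u, v):
    return [x + y for x, y in zip(u, v)]


def readout(h):
    """Hypothetical probe: the occupation whose direction best aligns with h."""
    return max(DIRECTIONS, key=lambda k: dot(h, DIRECTIONS[k]))


def apply_edit(h_entity, attr_encoding, W, b):
    """Assumed edit form: h' = h + W a + b, i.e. an affine map of the
    attribute encoding added directly to the entity's hidden state."""
    return vec_add(h_entity, vec_add(matvec(W, attr_encoding), b))


# Hypothetical hidden state for "Anita": the prior leans toward "nurse".
h_anita = [0.5, 2.0, 0.3]
# Hypothetical encoding of the attribute "works in a law office".
a_law_office = [1.0, 0.0, 0.0]
# Hand-set stand-in for a trained editor (REMEDI learns its edit parameters).
W = [[3.0, 0.0, 0.0],
     [-1.5, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]

before = readout(h_anita)
edited = apply_edit(h_anita, a_law_office, W, b)
after = readout(edited)
print(before, "->", after)  # nurse -> lawyer
```

Nothing is added to the prompt here. The prior is overridden by changing the representation that carries it, and that is the contrast with prompting drawn in the paragraph above.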


Related concepts in this collection

Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
the complementary failure: even information that IS correctly encoded may not causally influence output
Do classical knowledge definitions apply to AI systems? Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision?
context integration failure is part of why "LLM knowledge" is not propositional knowledge
Do language models actually build shared understanding in conversation? When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
the conversational consequence: context integration failure at the representational level surfaces as presumption of common ground at the communicative level — both reflect the same absence of bidirectional grounding
