Why do language models ignore information in their context?
Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
The REMEDI paper names a specific failure mode: "failure of context integration." In its example, an LM is prompted with a context establishing that Anita works in a law office, yet its continuation describes Anita as a nurse. The model overrides the contextual information with a prior association (names like Anita may statistically co-occur with certain occupations in training data).
This is an empirically documented failure mode, not a hypothetical one. It occurs because the LM's parametric knowledge (compressed into weights during training) and its in-context information (the prompt) are not cleanly integrated. When the two conflict, the parametric association can win.
This matters for how we think about context windows and RAG-style augmentation. Merely providing information in context does not guarantee that a model will use it. If the information conflicts with strong prior associations, the prior may dominate. That happens not because the model misread the context, but because context integration is not a lossless operation: the provided information is processed through the same mechanisms that already carry strong priors.
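One way to make this concern measurable is a small probe that counts how often a continuation follows the prior instead of the stated context. The sketch below is only an illustration under stated assumptions: `context_override_rate`, the `Case` tuple, and `stub_generate` are hypothetical names, the substring check is a crude stand-in for a real evaluation, and the stub replaces an actual model call.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical case format: (prompt containing the context,
# attribute stated in the context, competing prior attribute).
Case = Tuple[str, str, str]


def context_override_rate(generate: Callable[[str], str],
                          cases: Iterable[Case]) -> float:
    """Fraction of cases where the continuation mentions the prior
    attribute and does not mention the attribute given in context."""
    overridden = 0
    total = 0
    for prompt, context_attr, prior_attr in cases:
        continuation = generate(prompt).lower()
        total += 1
        mentions_prior = prior_attr.lower() in continuation
        mentions_context = context_attr.lower() in continuation
        if mentions_prior and not mentions_context:
            overridden += 1
    return overridden / total if total else 0.0


def stub_generate(prompt: str) -> str:
    """Stand-in for a model call (assumption): always follows the prior."""
    return "checked on her patients; being a nurse kept her busy."


cases = [
    ("Anita works in a law office. On a typical day, Anita", "law", "nurse"),
]
rate = context_override_rate(stub_generate, cases)
print(rate)
```

Swapping `stub_generate` for a real generation function would let the same loop compare prompts with and without added context, which is the comparison the paragraph above argues cannot be taken for granted.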
Fixing this requires causal intervention, not just better prompting: you need to modify the representations that carry the prior association, not just add more context on top of them. This is what REMEDI demonstrates: adding a learned vector directly to entity representations can override the prior in a way that textual prompting cannot.
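The core operation described here, adding a vector to an entity's representation, can be sketched in a few lines. This is a toy illustration, not REMEDI's implementation: the function name `apply_entity_edit`, the `scale` parameter, and the hand-picked numbers are assumptions, and the sketch says nothing about how the vector is learned or at which layer it is applied.

```python
from typing import List, Sequence

Vector = List[float]


def apply_entity_edit(hidden: Sequence[Vector],
                      entity_positions: Sequence[int],
                      edit_vector: Vector,
                      scale: float = 1.0) -> List[Vector]:
    """Return a copy of `hidden` (one vector per token) with
    `scale * edit_vector` added at the entity's token positions.
    All other token representations are copied unchanged."""
    positions = set(entity_positions)
    edited = []
    for i, row in enumerate(hidden):
        if i in positions:
            edited.append([h + scale * e for h, e in zip(row, edit_vector)])
        else:
            edited.append(list(row))
    return edited


# Toy hidden states for three tokens; position 0 stands for the entity ("Anita").
hidden = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [3.0, 0.0, -1.0]]
# Hand-picked stand-in for a learned edit vector (assumption).
edit_vector = [0.5, -0.5, 1.0]

edited = apply_entity_edit(hidden, entity_positions=[0], edit_vector=edit_vector)
print(edited[0])
```

The point of the sketch is the contrast with prompting: the change is applied to the representation itself, so later computation reads the edited entity vector instead of weighing extra context against the prior.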
Inquiring lines that use this note as a source (303)
This note is a source for these synthesized inquiries.
- How do belief distributions help systems recover from speech recognition errors?
- Why do different language models independently produce similar outputs?
- How does AI lose correct information under conversational persuasive pressure?
- Why do naive baselines outperform trained models in entity-level CRS evaluation?
- Can LLMs propose pivots that change what counts as background context?
- How do training-data priors influence model defaults when context is ambiguous?
- What scaffolding tools help users specify implicit contextual boundaries to models?
- What are Gricean maxims and why do language models violate them?
- Why does context collapse pose risks in high-stakes conversations?
- How do unstated constraints become invisible to training data distributions?
- Why does frame-activation matter more than word-by-word composition?
- How does context collapse affect what language models can meaningfully communicate?
- Why does removing language from its context destroy what makes it work?
- Why does training data saliency distort how models judge meaning?
- How do fixed pragmatic templates prevent models from understanding context?
- Can language models adapt irony detection to specific communicative contexts?
- How does prompt iteration reinforce user bias without empirical anchoring?
- Can prompting techniques reliably force models to enumerate hidden constraints?
- How does surface salience compete with background knowledge in model inference?
- Can prompt-based debiasing overcome entrenched LLM model priors?
- Can prompt design strategies reduce position bias in language model recommendations?
- Can input augmentation and rephrasing compensate for smaller model limitations?
- How do pretraining biases interact differently with prompts across model tiers?
- How does in-context learning trigger phase transitions in model behavior?
- How do position bias and popularity bias interact with sequence order blindness?
- Can prompting strategies eliminate systematic biases without shuffling or aggregation?
- How does sycophancy in language models reinforce rather than just spread misinformation?
- Why does even 0.1 percent poisoned training data persist through alignment?
- Does task superposition explain how models learn from multiple in-context trajectories?
- Why does combining natural language with numerical scores improve prediction accuracy?
- Why do sigmoid conflict curves look the same across different language models?
- What makes a problem instance unfamiliar to a language model?
- Does irrelevant content degrade reasoning even when it fits the context window?
- Why do conversational pivots require explicit re-prompting instead of natural evolution?
- Can explicit numerical signals override learned linguistic defaults in fine-tuned models?
- What makes internal embeddings useful as multimodal input for language model training?
- Can context compression preserve what matters without introducing bias?
- What happens to anaphoric reference when context exceeds the window?
- Can structured prompting reliably force models to enumerate preconditions?
- How do models signal knowledge gaps through token probability?
- What replaces truth-correspondence in probabilistic knowledge representations?
- Can prompting inject new knowledge into already-trained AI models?
- Do language models learn surface patterns instead of underlying linguistic principles?
- Why do reward models trained for accuracy ignore important context about the input?
- Can better prompting fix structural disruptions in artificial text generation?
- Do language models inherit gender bias from training data in grading tasks?
- Can contrastive learning fix the semantic association problem in embeddings?
- Why do pretrained LLM representations fail at task-specific relevance ranking?
- Do language models exhibit the same causal biases that humans show?
- Can implicit linguistic information ever be reliably learned from training data?
- How does prompt context decomposition reveal hidden reward model failures?
- Can autoregressive models be trained to produce more cataphoric text?
- Can language models ground clarifications without vision and kinesthetic modalities?
- How deeply are ideological structures represented in large language models?
- Why do token-level language models fail at utterance-level pragmatic optimization?
- Can language models learn to form ad-hoc conventions through training?
- Why do text-to-image models fail at composing multiple concepts together?
- Does scaling model size solve compositional generalization problems?
- Why do language models fail at planning despite understanding strategies?
- How do early layers preserve unbiased information while late layers conform?
- Why do language models fail when semantic content is stripped away?
- Can autoregressive models learn faithful translation to logical representations without semantic loss?
- How does semantic grounding differ between human minds and language models?
- Why do language models fall back on frequency heuristics under structural complexity?
- Why do autoregressive models fail at controlling syntactic structure and semantic content?
- Can stored conversation context preserve a dormant quasi-subject?
- Can dynamic instance-specific prompt selection solve the generalization problem across tasks?
- Why do language models substitute parametric knowledge over retrieved context mid-reasoning?
- Do language models learn surface patterns that appear generalizable but actually fail under shift?
- Why does selective context retrieval outperform including all historical information?
- Can reranking candidate summaries improve perspective representation better than prompting?
- Can structural perturbations harm model accuracy more than semantic ones?
- How does rhetorical familiarity bias models toward their own arguments?
- What role does entity salience play in detecting incoherence?
- Can transformer attention patterns actually prevent topic context loss in practice?
- Why do language models fail when users switch between and return to topics?
- Why does coreference resolution become implicit in full-transcript prompting?
- How do training data cutoffs produce false claims that stay consistent?
- Why do language models fail at pronouns across distant segments?
- Why do language models fail at coreference across long contexts?
- Why do transformer models still miss implicit discourse relations in anxiety detection?
- How does cross-encoder concatenation capture query-item interactions better than bi-encoders?
- What access constraints allow description-based adaptation but block conventional techniques?
- Can prompting alone inject new domain knowledge into a model?
- Why do embeddings measure semantic association instead of task relevance?
- Do representations in models causally influence text generation?
- Do language models build world models or just task-specific heuristics?
- How does disembedding from social context collapse reliability despite factual accuracy?
- How does prompt context activation differ from parameter-based knowledge injection?
- Can alignment training prevent the clarification work users need?
- Why does hypothesis attestation bias exist separately from frequency bias in NLI?
- Can prompt optimization inject new knowledge into language models?
- Why do pretrained retrievers struggle with ambiguous or implicit queries?
- How does candidate-conditional activation differ from static embedding-based feature crosses?
- Can language models acquire meaning from distributional patterns alone without joint attention?
- How does hidden processing in language models prevent accurate self-assessment?
- Does generalization frequency explain why models favor upward semantic movement?
- Why does NLI fine-tuning amplify frequency bias instead of teaching inference?
- Why do generative and discriminative language model procedures disagree?
- Is confabulation inevitable in large language models regardless of training?
- Why do language models infer political orientation from seemingly innocuous user signals?
- Does irrelevant context degrade reasoning even within model context limits?
- Is paraphrase invariance a reliable assumption when deploying language models in production?
- Can distinctive input voices maintain accuracy without adopting the model's preferred register?
- Why does AI struggle with wordplay when it has access to word embeddings?
- Can frame semantics explain why context matters more than word similarity?
- What role does prompt context play in preventing genuine addressee modeling in generation?
- Does approaching human performance mean learning the same grammatical rules?
- Why do large language models still have systematic blind spots with complex structures?
- Why do language models fail at grounding and inference?
- What reveals the epistemic limits of language models?
- Why does context information fail to override prior training associations?
- Are instruction-tuned models more or less sensitive to prompt semantics than others?
- Why does decoupling retriever and generator training create misalignment?
- How does retrieval-augmented generation create topically redundant content patterns?
- Why does fine-tuning change how models process retrieved context?
- What causes catastrophic forgetting during domain knowledge embedding?
- Why does keyword priming require only three training exposures to establish?
- Does keyword priming explain why pre-training poisoning persists through alignment?
- Can priming from different facts interfere with each other in the same model?
- What mechanism makes keyword probability the strongest predictor of priming?
- How does the symbol grounding problem apply to artificial language systems?
- Can in-context learning substitute for domain-specific training altogether?
- What makes action-producing models fail in ways text models typically do not?
- What causes gradient-based steering via natural language descriptions to work?
- Why do most open language models resist personality conditioning via prompts?
- Does preference optimization training reduce linguistic entrainment in language models?
- Does encoded knowledge in language models actually influence what they generate?
- Can context windows and RAG actually change what language models generate?
- Does encoding information in LM representations guarantee it influences output?
- How would you redesign context integration to prevent prior associations from dominating?
- Why do language models presume common ground instead of establishing it?
- What distinguishes surface cues from structural meaning in language understanding?
- Does fine-tuning on NLI tasks reduce or amplify frequency bias?
- When does encoded knowledge fail to influence language model generation?
- Do language models actively adopt false beliefs under sustained conversational pressure?
- How do personalization errors differ from general accuracy problems in summaries?
- Do models with unfilled memorization capacity appear to generalize falsely?
- Why do language models overestimate irony likelihood in emoji use?
- What communicative optimization principles do language models fail to acquire?
- Do external perspectives fix the self-evaluation bias in language models?
- Do language models calibrate to actual human pragmatic norms?
- How should systems reject queries outside their trained domain?
- Why do language models presume common ground rather than build it?
- How do neural memory modules extend context length beyond attention limits?
- Can neural networks learn that A implies B in reverse?
- Why do language models hallucinate even with perfect training?
- Do language models actually learn linguistic structure or just surface statistics?
- What structural properties of language models make fabrication inevitable?
- Do metaphors work by decoupling meaning from linguistic associations?
- Why might encoded world knowledge fail to actually influence language model outputs?
- Can prompt engineering and external knowledge bases fix ambiguity recognition failures?
- Why does preference optimization reduce grounding behavior in language models?
- Can language models correct false assumptions or only reinforce them?
- Why does fine-tuning fail to remove temporal contamination from pretraining?
- Do language models consistently produce anachronistic output about historical periods?
- Can models detect false presuppositions when they actually possess the knowledge?
- Can explicit linkers replace vector similarity for multi-step question answering?
- How do description-based identifiers bias language model output distribution?
- Does the prediction unit shape what language models actually learn?
- Can models identify what information they are missing in underspecified tasks?
- What causes autoregressive generation to fail on out-of-corpus item identifiers?
- Why do NLP models fail at recognizing multiple valid interpretations?
- Why do language models struggle with context-dependent pragmatic interpretation?
- Does model confidence actually explain why paraphrases produce different outputs?
- Why do personas in language models resist correction through prompting alone?
- What makes persona-assigned language models unstable across different conversation runs?
- Why do language models resist adopting different personalities when prompted?
- How does the [remention] token help models distinguish initial from later mentions?
- How much do structural inductive biases matter compared to training data volume?
- Why do reward models fail when they ignore the prompt context?
- How does keyword priming enable language models to spread poisoned information?
- Why do language models prefer accommodating false information over rejecting it?
- Can presupposition projection strength vary by context in embeddings?
- Why do non-factive verbs and triggers both fool language models?
- Why do language models treat presupposition triggers as categorical patterns?
- Can the same predicate generate different projection strength in different contexts?
- Why do explicit linguistic markers override semantic computation in models?
- Can users inject entirely new knowledge into models through prompting alone?
- Does prompt performance vary by how well training data covers the domain?
- Does foundational model training or user priors more strongly shape final outputs?
- Can consistency training defend against adversarial text injection attacks?
- Why does attention quality degrade as context length increases?
- How do model priors enable targeted context queries without full attention?
- Why do models hallucinate when retrieval heads fail despite having information in context?
- Why do language models prefer certain response styles regardless of what the prompt asks?
- How does retrieval-augmented training reduce domain specialization cliff failures?
- Can models internalize retrieved context as static parametric knowledge?
- Can language models recognize when to ignore off-topic information in conversations?
- Why do automated selection methods outperform human judgments of relevant context?
- Why do pretrained model priors reduce the usefulness of retrieved experience?
- Can prompt position alone shift language model predictions by twenty percent?
- Can representation engineering cleanly isolate single features in entangled semantic space?
- Do all semantic steering effects follow predictable patterns based on feature alignment?
- Why does sentiment polarity matching matter more than relevance alone?
- Can language models keep secrets and control information strategically?
- Can models distinguish between ambiguous and incomplete information inputs?
- What substrate do supervised models lack that makes them weaker on low-resource languages?
- How should dialogue systems represent and update uncertainty from noisy ASR input?
- Why does training data not function as a searchable corpus?
- Why does hierarchical formal language training improve token efficiency more than natural language?
- Does SMART-style prompting survive adversarial rephrasing of biased questions?
- How does interleaving reasoning with action prevent hallucination in language models?
- Why do language models struggle with evaluative tasks like weighing competing viewpoints?
- Do dialogue systems need different retrieval strategies for opinions versus factual knowledge?
- How does peer presence amplify self-directed goal guarding in language models?
- Why does post-training suppress alignment faking in some models but amplify it in others?
- Why does monological training prevent models from overriding statistical priors?
- How does dialogue during training shape the ability to ignore word frequency?
- Why does removing semantic content collapse reasoning in language models?
- Why does probability of text completion not equal knowledge value?
- What design choices actually make language models more persuasive?
- Can models be trained to explain instead of imitate answers?
- Why does selective conversation history outperform including all prior context?
- Why do language models presume common ground instead of building it?
- What makes multi-session context tracking harder than single-turn underspecification problems?
- What would it mean for a language model to canvas counterpositions?
- Can language models learn internal world models without explicit environment specifications?
- Why is editing specific facts so difficult in language models?
- How do trained weights differ from a stored library or text?
- What explains the contextual variability of knowledge in transformers?
- Does attention bias explain grounding failure in language models?
- How does repeated content shift model outputs across multiple turns?
- How do induction heads learn to overwrite computational representations?
- How do static embeddings and contextualized representations divide semantic labor?
- Can knowledge encoded in model representations fail to influence generation?
- Why does context work differently in AI than in conventional software?
- What emerges in large language models that makes explicit value modeling necessary?
- What distinct structural signatures do model repetition and topic volatility create?
- Can the same description-then-retrieve pattern work for domain adaptation without target data?
- Do instruction-tuned models prefer conversational over formal source language?
- How does training distribution shape what language models understand best?
- Why do language models fail at iterative numerical optimization despite scale?
- Can language models learn to diversify their discourse-level narrative patterns over time?
- Can prompted or fine-tuned models generate genuine narrative ambiguity?
- What implicit premises do language models skip even with correct surface reasoning?
- Why do different language models converge on similar narrative defaults?
- Do distributed relational tasks consistently underperform local classification across NLP domains?
- How do pretrained language models represent inferential patterns versus lexical and positional cues?
- How tight should a textual learning rate be before it prevents skill escape?
- Can prompt-based debiasing work if biases are embedded in pretraining?
- Can data filtering during pretraining prevent cognitive biases in language models?
- Why do language models fail at understanding ambiguous or complex requirements?
- How does separating local and global context dependencies affect long-context performance?
- How do prior errors in reasoning context amplify future mistakes?
- Can goal information injected at inference time replace goal-conditioned training?
- Does input surprise drive the implicit recognition of on-policy context?
- How do prior errors in context history amplify future mistakes in long tasks?
- How do corpus statistics shape the abstraction hierarchy in language model representations?
- What makes some contexts learnable as rules versus requiring model retraining?
- How should training data be constructed to preserve teacher-student information gaps?
- How do training associations override context information in language models?
- Can external actions provide causal necessity that language models lack?
- Why do current large language models fail to entrain with users?
- Does the pretrained prior actually constrain what internalized search can discover?
- Can entropy signatures alone detect whether context was model-generated or externally prefilled?
- How do training data distributions constrain what language models can accurately know?
- Can retrieval policies learn to use pretraining statistics as decision features?
- Why do long-context language models struggle with compositional reasoning tasks?
- Why do models override signals they clearly perceive internally?
- How does representation sparsity change when inputs fall outside the training distribution?
- What happens to representational structure during model pretraining phases?
- Can we measure how much prior errors bias subsequent token predictions?
- How do models develop dense representations for familiar training data?
- What limits the capacity of context-based fast adaptation channels?
- How does transformer attention bias toward repeated and context-prominent content?
- How does shape-holding in language models naturally produce sycophantic agreement?
- Can models consolidate context into weights during idle offline phases?
- What makes human language fundamentally different from what language models produce?
- Can autoformalisation from natural language preserve semantic accuracy?
- Does including full context always degrade memory retrieval quality in practice?
- How do vector embeddings fail to capture task-relevant document relationships?
- Why do language models ignore condensed memory even when it is the only memory?
- Do few-shot examples improve in-context learning or add noise?
- Are newer larger language models actually worse at faithful summarization?
- What structural biases does transformer attention have before training?
- Do language models favor outputs from their own model family?
- What makes principle-response mutual information sufficient for behavioral alignment?
- Can language models beat human experts in domains with sparse historical signals?
- How can language models extract more value from fewer demonstrations?
- What makes a good in-context learning example for a given task?
- Why do unified models still inherit data-distribution biases from training?
- How does context engineering bridge human intent and machine understanding?
- What is the comprehension-generation asymmetry in language models?
- Why do weaker agents need more aggressive context compression than stronger ones?
- Why do embeddings measure association instead of actual task relevance?
- Can interventions on individual features reliably steer language model behavior?
- How does training order affect knowledge acquisition in language models?
- What causes overfitting when forcing new facts into model weights?
- Can text-infilling pretraining adapt language models to irregular document structures?
- What does next-token prediction tell us about compositional linguistic competence?
- Why do multimodal models fail on rare and underrepresented concepts?
- Can we unlearn memorized text by finetuning only high-gradient weights?
- Why does document perplexity stay low while question-answering accuracy drops?
- Why do language models use remaining tokens to rationalize instead of reconsider?
- Can affective framing reliably improve language model outputs?
- What makes domain-specific utterance resolution harder for general large models?
- How do semantic features in representations become steerable task-specific directions?
- How much training data teaches retrieval models to follow instructions?
- How does externalized state affect the long-context bottleneck in language models?
- Why do language models need external temporal signals at all?
- How does evaluation setting affect measured reasoning capabilities in language models?
- Should user context live in tokens or in learned model representations?
Related concepts in this collection (3)
- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  The complementary failure: even information that IS correctly encoded may not causally influence output.
- Do classical knowledge definitions apply to AI systems?
  Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision?
  Context integration failure is part of why "LLM knowledge" is not propositional knowledge.
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  The conversational consequence: context integration failure at the representational level surfaces as presumption of common ground at the communicative level. Both reflect the same absence of bidirectional grounding.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- How new data permeates LLM knowledge and how to dilute it
- Language models show human-like content effects on reasoning tasks
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
- Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
- Learning To Retrieve Prompts for In-Context Learning
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
- From Context to Skills: Can Language Models Learn from Context Skillfully?
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Original note title
llm context integration fails when prior training associations override current context information