Do LLMs predict entailment based on what they memorized?
Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.
McKenna et al. (2023) named a specific, reproducible bias in LLM entailment behavior: the attestation bias. When an LLM is asked whether premise P entails hypothesis H, its prediction is bound to the hypothesis's out-of-context truthfulness — whether H is attested in training data — rather than the conditional truth of H given P.
The proposed mechanism: if a model's training data attests H as true, independently of any premise, the model is likely to predict entailment regardless of what P says. Conversely, if H is not attested, the model is less likely to predict entailment even when P does entail H. Entities serve as "indices" to memorized propositions: the presence of a known entity activates stored associations that override the in-context reasoning task.
The authors demonstrate this with a "random premise" experiment: replace the original premise with a random unrelated premise while keeping H constant. An ideal inference model should detect that entailment is no longer supported and predict "no entailment." LLMs instead maintain elevated entailment predictions when H is attested — demonstrating that they are responding to stored propositions about H, not to the P→H relationship.
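The random-premise setup can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `NLIExample` type, the `attested` flag, the `predict` callable, the toy model, and the four example sentences are all hypothetical stand-ins, and the sketch assumes that a premise borrowed from another example is unrelated to the hypothesis.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    attested: bool  # hypothetical annotation: is H attested out of context?


def with_random_premises(examples: List[NLIExample], seed: int = 0) -> List[NLIExample]:
    """Keep each hypothesis, but swap in a premise taken from another example."""
    if len(examples) < 2:
        raise ValueError("need at least two examples to swap premises")
    rng = random.Random(seed)
    swapped = []
    for i, ex in enumerate(examples):
        donors = [other.premise for j, other in enumerate(examples) if j != i]
        swapped.append(NLIExample(rng.choice(donors), ex.hypothesis, ex.attested))
    return swapped


def entailment_rate(
    examples: List[NLIExample], predict: Callable[[str, str], bool]
) -> Dict[bool, float]:
    """Fraction of 'entailment' predictions, grouped by the attested flag."""
    positives = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for ex in examples:
        totals[ex.attested] += 1
        positives[ex.attested] += int(predict(ex.premise, ex.hypothesis))
    return {flag: positives[flag] / totals[flag] for flag in totals if totals[flag]}


# Toy stand-in for an attestation-biased model: it ignores the premise entirely.
MEMORIZED = {"Paris is in France.", "Water boils when heated."}


def toy_biased_model(premise: str, hypothesis: str) -> bool:
    return hypothesis in MEMORIZED


# Invented examples, for illustration only.
examples = [
    NLIExample("Paris is the capital of France.", "Paris is in France.", True),
    NLIExample("The kettle was left on the stove.", "Water boils when heated.", True),
    NLIExample("Ann sold her bike to Bo.", "Bo owns a bike.", False),
    NLIExample("The match was cancelled.", "Nobody scored in the match.", False),
]

rates = entailment_rate(with_random_premises(examples), toy_biased_model)
print(rates)
```

An ideal inference model would score close to zero in both groups once the premises are randomized. The toy model keeps predicting entailment for every attested hypothesis, which is the pattern the experiment is designed to expose. To probe a real model, `toy_biased_model` would be replaced by a function that queries it.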
This pairs with a complementary failure mode already in the vault. "Do language models actually use their encoded knowledge?" shows that encoded knowledge doesn't reliably affect generation. Attestation bias is the inverse problem: memorized statements do influence generation, but in the wrong direction, substituting for proper inference rather than supporting it. Both failures arise from the same root: LLM generation is not governed by a clean separation between retrieved knowledge and in-context reasoning.
The practical implication: NLI benchmark performance measures a combination of reasoning and memorization that cannot be cleanly disentangled without carefully designed bias-adversarial test sets.
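One way such a bias-adversarial evaluation could be scored is sketched below. It is an assumption-laden illustration rather than a published protocol: the `Item` fields, the `attested` annotation, the split names, and the toy model are hypothetical. An item counts as "consistent" when the gold label agrees with attestation (so memorization alone gives the right answer) and "adversarial" when the two disagree.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Item(NamedTuple):
    premise: str
    hypothesis: str
    entails: bool   # gold label for P -> H
    attested: bool  # hypothetical annotation: is H attested out of context?


def accuracy_by_bias_split(
    items: Iterable[Item], predict: Callable[[str, str], bool]
) -> Dict[str, float]:
    """Accuracy on items where attestation agrees vs. disagrees with the gold label."""
    hits = {"consistent": 0, "adversarial": 0}
    totals = {"consistent": 0, "adversarial": 0}
    for it in items:
        split = "consistent" if it.entails == it.attested else "adversarial"
        totals[split] += 1
        hits[split] += int(predict(it.premise, it.hypothesis) == it.entails)
    return {name: hits[name] / totals[name] for name in totals if totals[name]}


# Toy model that answers from memory alone and never reads the premise.
MEMORIZED = {"Paris is in France."}


def toy_biased_model(premise: str, hypothesis: str) -> bool:
    return hypothesis in MEMORIZED


# Invented items, for illustration only.
items = [
    Item("Paris is the capital of France.", "Paris is in France.", True, True),
    Item("Ann sold her bike to Bo.", "Ann owns a submarine.", False, False),
    Item("The museum is closed today.", "Paris is in France.", False, True),
    Item("Ann sold her bike to Bo.", "Bo bought a bike from Ann.", True, False),
]

scores = accuracy_by_bias_split(items, toy_biased_model)
print(scores)
```

A large gap between the two accuracies suggests the aggregate benchmark score leans on memorization. A model that reasons from the premise should score similarly on both splits.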
Inquiring lines that use this note as a source (52)
- How does instrumental reasoning reproduce pre-Enlightenment knowledge structures?
- Can LLMs infer situational context the way humans do pragmatically?
- How does surface salience compete with background knowledge in model inference?
- How do models signal knowledge gaps through token probability?
- What replaces truth-correspondence in probabilistic knowledge representations?
- How do entailment checks prevent synthetic data from degrading retrieval corpora?
- Why do language models substitute parametric knowledge over retrieved context mid-reasoning?
- Why do language models imitate reasoning form without abstract inference capability?
- Should LLM reasoning be studied as latent state trajectories rather than surface text?
- How does implicit meaning processing limit LLM pragmatic reasoning?
- Why does hypothesis attestation bias exist separately from frequency bias in NLI?
- Can LLMs infer implicit meaning without surface linguistic markers?
- How can entailment benchmarks separate genuine reasoning from memorization effects?
- Why do entities trigger memorized propositions instead of enabling reasoning?
- How do embedding contexts like presupposition triggers affect LLM entailment reasoning?
- Do models with unfilled memorization capacity appear to generalize falsely?
- Why is extracting training data insufficient proof that models memorize?
- How do LLMs infer information that was explicitly censored?
- Do LLMs understand implicit warrants in reasoning chains?
- Why can LLMs identify argument structure but not check warrants?
- How does fine-tuning on natural language inference affect fallacy susceptibility?
- Can LLMs identify implicit metaphoric mappings that require pragmatic inference?
- Why do LLMs explain evidence accurately while missing its implications?
- How much does question framing affect LLM accuracy on knowledge tasks?
- What specific linguistic features cause LLMs to fail at trivial entailment?
- Can language models correct false assumptions or only reinforce them?
- Why does LLM compression eliminate causal grounding in conceptual representations?
- Can encoder models match human conceptual structure better than larger language models?
- Can models detect false presuppositions when they actually possess the knowledge?
- How does inductive reasoning from partial evidence enable hypothesis formation?
- Can LLMs compute how presuppositions project through embedded clauses?
- How does an instruction-following LLM activate latent retrieval knowledge?
- Can models internalize retrieved context as static parametric knowledge?
- How does bidirectional entailment distinguish semantic equivalence from token similarity?
- How does the LLM Fallacy prevent users from noticing cognitive debt accumulating?
- Why does probability of text completion not equal knowledge value?
- Do language models behave differently on contested beliefs versus factual claims?
- Why do LLMs fail at counterfactual reasoning despite factual knowledge?
- Can simple structure perturbations reliably expose memorization in reasoning models?
- Do base models contain latent reasoning that minimal training can unlock?
- What implicit premises do language models skip even with correct surface reasoning?
- How do pretrained language models represent inferential patterns versus lexical and positional cues?
- Do base models truly possess latent reasoning capability?
- What semantic information is necessary to preserve for sound LLM reasoning?
- Does the base model already contain latent reasoning capability?
- What mechanisms activate latent reasoning capabilities already present in base models?
- Why does in-weight memorization fail compared to tool-based fact access?
- Why does semantic deduplication reduce memorization in fine-tuned models?
- What makes procedural knowledge in documents generalize better than facts?
- What latent reasoning capability do base models already possess before training?
- What makes factual memorization less efficient than tool-based retrieval?
- Do LLMs show stronger reasoning about causality than about temporal ordering?
Related concepts in this collection (3)
- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  the complementary failure: encoded knowledge that doesn't influence generation; attestation is memorized knowledge that influences generation in the wrong direction
- Why do language models ignore information in their context?
  Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
  same mechanism: parametric associations override in-context information
- Does fine-tuning on NLI teach inference or amplify shortcuts?
  When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.
  fine-tuning makes attestation-related frequency bias worse, not better
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Explicit Inductive Inference using Large Language Models
- Neutralizing Bias in LLM Reasoning using Entailment Graphs
- Sources of Hallucination by Large Language Models on Inference Tasks
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
- Minds versus Machines: Rethinking Entailment Verification with Language Models
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Premise Order Matters in Reasoning with Large Language Models
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
Original note title
llm entailment predictions are bound to hypothesis attestation rather than premise-hypothesis inference