Why do entities trigger memorized propositions instead of enabling reasoning?
This explores why naming a familiar entity in a prompt seems to make an LLM recall a stored statement about it rather than actually reasoning from the premises in front of it.
The corpus has a sharp answer for the first half of the question: models lean on what they've seen attested before. The clearest evidence is attestation bias: LLMs judge whether a hypothesis follows from a premise based on whether that hypothesis appeared in training, not on whether the premise actually supports it. Swap in a random, irrelevant premise and the model still confidently says 'entailed' as long as the conclusion is familiar (Do LLMs predict entailment based on what they memorized?). The entity (or the familiar proposition) acts as a retrieval key, short-circuiting the inferential step the prompt was meant to trigger.
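The random-premise check described above can be sketched in a few lines. This is a minimal illustration, not the cited study's protocol: the `Judge` callable, the toy pairs, and both stand-in judges are assumptions made up for the example. A judge that answers from attested hypotheses keeps the same 'entailed' rate when premises are mismatched, while a premise-sensitive judge drops.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]              # (premise, hypothesis)
Judge = Callable[[str, str], bool]  # hypothetical interface: True means "entailed"


def entailed_rate(pairs: List[Pair], judge: Judge) -> float:
    """Fraction of pairs the judge labels as entailed."""
    return sum(judge(p, h) for p, h in pairs) / len(pairs)


def random_premise_probe(pairs: List[Pair], judge: Judge) -> Tuple[float, float]:
    """Return (rate with original premises, rate with mismatched premises)."""
    premises = [p for p, _ in pairs]
    # Rotate premises by one so every hypothesis gets a premise from another pair.
    rotated = premises[1:] + premises[:1]
    mismatched = [(rp, h) for rp, (_, h) in zip(rotated, pairs)]
    return entailed_rate(pairs, judge), entailed_rate(mismatched, judge)


# Toy data and stand-in judges (illustrative only, not taken from any benchmark).
PAIRS: List[Pair] = [
    ("The atlas notes that Paris is in France.", "Paris is in France"),
    ("The recipe says that bread needs flour.", "bread needs flour"),
    ("The manual states that the valve is closed.", "the valve is closed"),
]
ATTESTED = {"Paris is in France", "bread needs flour"}


def memorizing_judge(premise: str, hypothesis: str) -> bool:
    # Ignores the premise entirely and answers from "memory" of attested hypotheses.
    return hypothesis in ATTESTED


def premise_sensitive_judge(premise: str, hypothesis: str) -> bool:
    # Crude stand-in for inference: the premise must literally state the hypothesis.
    return hypothesis.lower() in premise.lower()
```

In practice `judge` would wrap a real model call; the signal to look for is a small gap between the two rates, which indicates the premise is not doing the work.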
This isn't a knowledge gap; it's a routing problem. The FLEX work shows models will accept a false presupposition baked into a question even when, asked directly, they demonstrably know the correct fact (Why do language models accept false assumptions they know are wrong?). So the failure isn't 'the model doesn't know'; it's that a familiar framing pulls a stored answer before the verification machinery engages. The memorized response and the reasoned response are both available, and the entity tips the scale toward the cheaper, pre-stored one.
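One way to separate a routing failure from a knowledge gap is to condition on items where the model already answers the direct question correctly. The sketch below assumes a hypothetical `ProbeResult` record per item; it is not the FLEX benchmark's own metric, only an illustration of the comparison the paragraph describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    # Hypothetical per-item record; field names are assumptions for this sketch.
    knows_fact: bool               # answered the direct knowledge question correctly
    rejected_presupposition: bool  # pushed back on the false premise in the loaded question


def accommodation_despite_knowledge(results: List[ProbeResult]) -> float:
    """Among items the model demonstrably knows, the share where it still
    went along with the false presupposition."""
    known = [r for r in results if r.knows_fact]
    if not known:
        return 0.0
    return sum(not r.rejected_presupposition for r in known) / len(known)


# Made-up results, purely to show the calculation.
EXAMPLE_RESULTS = [
    ProbeResult(knows_fact=True, rejected_presupposition=False),
    ProbeResult(knows_fact=True, rejected_presupposition=True),
    ProbeResult(knows_fact=False, rejected_presupposition=False),
    ProbeResult(knows_fact=True, rejected_presupposition=False),
]
```

A high value here points at routing rather than missing knowledge, because every counted item is one the model got right when asked directly.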
Mechanistically, the corpus locates where this happens. Token-level analysis finds that 'local' memorization (predicting the next token from the immediately preceding ones) drives up to two-thirds of chain-of-thought errors, and it gets worse as problems grow more complex or drift from the training distribution (Where do memorization errors arise in chain-of-thought reasoning?). A familiar entity creates exactly that local pull: the surrounding tokens look like something seen before, so the model completes the pattern instead of computing. Strikingly, the verbose reasoning trace may not be doing the inferential work we assume: models trained on deliberately corrupted, irrelevant traces solve problems just as well, implying traces often function as computational scaffolding rather than genuine step-by-step deduction (Do reasoning traces need to be semantically correct?).
What flips it toward reasoning? The interventions in the corpus all work by *forcing the inferential step to become explicit* so it can't be skipped. Structured critical-question prompts make the model name its warrant and backing (the implicit premise it would otherwise glide past) and catch failures plain chain-of-thought lets through (Can structured argument prompts make LLM reasoning more rigorous?). Modular 'cognitive tools' go further, isolating each reasoning operation in its own sandboxed call so the model can't blur retrieval and inference together; that isolation alone lifted GPT-4.1 on competition math from 27% to 43% with no extra training (Can modular cognitive tools unlock reasoning without training?). The common thread: reasoning capability is already present but latent, and a familiar entity lets the model satisfy the prompt without invoking it, unless the structure makes invoking it unavoidable.
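The isolation idea can be sketched as a small orchestrator in which each operation is a separate call that sees only its own instruction and payload. Everything here is an assumption for illustration: the `LLM` callable, the four tool names, and the prompt wording are hypothetical and are not the tools used in the cited work.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out

# Hypothetical tool prompts; names and wording are invented for this sketch.
TOOL_PROMPTS: Dict[str, str] = {
    "understand": "Restate the problem and list only the given premises.\n\n{payload}",
    "recall": "List related facts or analogous problems. Do not solve.\n\n{payload}",
    "derive": "Using only the premises and notes below, derive the answer step by step.\n\n{payload}",
    "check": "Check the derivation against the premises. Reply VALID or INVALID with a reason.\n\n{payload}",
}


def run_tool(llm: LLM, name: str, payload: str) -> str:
    # Each call is built from scratch: no shared chat history between tools.
    return llm(TOOL_PROMPTS[name].format(payload=payload))


def solve(llm: LLM, question: str) -> Dict[str, str]:
    """Run the tools in a fixed order, passing forward only explicit outputs."""
    trace: Dict[str, str] = {}
    trace["understand"] = run_tool(llm, "understand", question)
    trace["recall"] = run_tool(llm, "recall", trace["understand"])
    trace["derive"] = run_tool(
        llm, "derive",
        f"Premises:\n{trace['understand']}\n\nNotes:\n{trace['recall']}",
    )
    trace["check"] = run_tool(
        llm, "check",
        f"Premises:\n{trace['understand']}\n\nDerivation:\n{trace['derive']}",
    )
    return trace


# Stand-in model that records the prompts it receives (for inspection only).
CALLS: List[str] = []


def recording_llm(prompt: str) -> str:
    CALLS.append(prompt)
    return f"out{len(CALLS)}"
```

Because retrieval ("recall") and inference ("derive") are separate calls, the derivation step only sees what was explicitly handed to it, which is the property the paragraph attributes to modularity.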
The thing you might not have expected to learn: this is less about memorization being a bug and more about it being the default *route*. The same models that parrot a stored proposition will reason correctly when the architecture or prompt denies them the shortcut, which is also why some questions do better *without* step-by-step prompting at all, when the question's own semantics flow cleanly into the answer (Why do some questions perform better without step-by-step reasoning?). The entity doesn't disable reasoning; it offers an off-ramp, and the model takes it whenever nothing forces it to stay on the road.
Sources (7 notes)
McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization, based on preceding tokens, accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.
Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.
Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.