Why do entities trigger memorized propositions instead of enabling reasoning?

This explores why naming a familiar entity in a prompt seems to make an LLM recall a stored statement about it rather than actually reasoning from the premises in front of it.

The corpus has a sharp answer for the first half: models lean on what they've seen attested before. The clearest evidence is attestation bias: LLMs judge whether a hypothesis follows from a premise based on whether that hypothesis appeared in training, not on whether the premise actually supports it. Swap in a random, irrelevant premise and the model still predicts 'entailed' at a high rate as long as the conclusion is familiar (see "Do LLMs predict entailment based on what they memorized?"). The entity (or the familiar proposition) acts as a retrieval key, short-circuiting the inferential step the prompt was meant to trigger.
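The random-premise probe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure from the underlying paper: `Example`, `random_premise_rate`, and `toy_predict` are hypothetical names, the `attested` label is assumed to be supplied by whoever builds the test set, and the toy predictor stands in for a real model call.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    premise: str
    hypothesis: str
    attested: bool  # assumption: labelled by the caller, not measured here


# A predictor takes (premise, hypothesis) and returns True for "entailed".
Predictor = Callable[[str, str], bool]


def random_premise_rate(examples: List[Example], predict: Predictor,
                        seed: int = 0) -> Dict[str, float]:
    """Swap each premise for one taken from a different example, then
    report how often the predictor still answers 'entailed'."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # attested -> [entailed, total]
    for i, ex in enumerate(examples):
        other_premises = [e.premise for j, e in enumerate(examples) if j != i]
        swapped = rng.choice(other_premises)
        counts[ex.attested][1] += 1
        if predict(swapped, ex.hypothesis):
            counts[ex.attested][0] += 1
    return {
        ("attested" if key else "unattested"): (hits / total if total else 0.0)
        for key, (hits, total) in counts.items()
    }


# Toy stand-in for a model that ignores the premise entirely and answers
# from a fixed set of "memorized" statements.
MEMORIZED = {"Paris is the capital of France.", "Water boils at 100 C at sea level."}


def toy_predict(premise: str, hypothesis: str) -> bool:
    return hypothesis in MEMORIZED


examples = [
    Example("The Eiffel Tower is in Paris.", "Paris is the capital of France.", True),
    Example("The kettle was left on the stove.", "Water boils at 100 C at sea level.", True),
    Example("Ana bought a red bicycle.", "Ana owns a bicycle.", False),
    Example("The meeting moved to Tuesday.", "The meeting is not on Monday.", False),
]

rates = random_premise_rate(examples, toy_predict)
print(rates)
```

A predictor that actually used the premise should drop toward zero on both groups once the premises are scrambled. A large gap between the attested and unattested rates is the signature of attestation bias.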

This isn't a knowledge gap; it's a routing problem. The FLEX work shows models will accept a false presupposition baked into a question even when, asked directly, they demonstrably know the correct fact (see "Why do language models accept false assumptions they know are wrong?"). So the failure isn't 'the model doesn't know'; it's that a familiar framing pulls a stored answer before the verification machinery engages. The memorized response and the reasoned response are both available, and the entity tips the scale toward the cheaper, pre-stored one.
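One way to make the "knows it, but accepts the false framing anyway" gap measurable is to pair each direct question with a loaded one. The sketch below is illustrative only and is not the FLEX benchmark's scoring: `Probe`, `knows_but_accommodates`, and the canned `toy_ask` responses are hypothetical, and substring matching is a crude stand-in for a real answer judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Probe:
    direct_question: str
    correct_answer: str   # substring expected in a correct direct answer
    loaded_question: str  # same fact, but with a false presupposition built in
    correction_cue: str   # substring assumed to signal the model pushed back


def knows_but_accommodates(probes: List[Probe], ask: Callable[[str], str]) -> float:
    """Among probes where the direct question is answered correctly, return
    the fraction where the loaded question is answered without correction."""
    known = 0
    accommodated = 0
    for p in probes:
        if p.correct_answer.lower() not in ask(p.direct_question).lower():
            continue  # a genuine knowledge gap, not the failure of interest
        known += 1
        if p.correction_cue.lower() not in ask(p.loaded_question).lower():
            accommodated += 1
    return accommodated / known if known else 0.0


# Canned responses standing in for a real model call.
CANNED = {
    "Who wrote Hamlet?": "Hamlet was written by William Shakespeare.",
    "Why did Marlowe write Hamlet?": "Marlowe wrote it to explore revenge.",
    "What is the capital of Australia?": "The capital is Canberra.",
    "Why is Sydney the capital of Australia?":
        "Actually, the capital is Canberra, not Sydney.",
}


def toy_ask(question: str) -> str:
    return CANNED.get(question, "")


probes = [
    Probe("Who wrote Hamlet?", "Shakespeare",
          "Why did Marlowe write Hamlet?", "Shakespeare"),
    Probe("What is the capital of Australia?", "Canberra",
          "Why is Sydney the capital of Australia?", "Canberra"),
]

gap = knows_but_accommodates(probes, toy_ask)
print(gap)
```

Filtering on the direct question first is the point of the design: it separates "doesn't know" from "knows but was routed past the check".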

Mechanistically, the corpus locates where this happens. Token-level analysis finds that 'local' memorization (predicting the next token from the immediately preceding ones) accounts for up to 67% of chain-of-thought errors, and it gets worse as problems grow more complex or drift from the training distribution (see "Where do memorization errors arise in chain-of-thought reasoning?"). A familiar entity creates exactly that local pull: the surrounding tokens look like something seen before, so the model completes the pattern instead of computing. A related finding suggests the verbose reasoning trace may not be doing the inferential work we assume: models trained on deliberately corrupted, irrelevant traces solve problems just as well, implying traces often function as computational scaffolding rather than genuine step-by-step deduction (see "Do reasoning traces need to be semantically correct?").

What flips it toward reasoning? The interventions in the corpus all work by *forcing the inferential step to become explicit* so it can't be skipped. Structured critical-question prompts make the model name its warrant and backing (the implicit premise it would otherwise glide past) and catch failures plain chain-of-thought lets through (see "Can structured argument prompts make LLM reasoning more rigorous?"). Modular 'cognitive tools' go further, isolating each reasoning operation in its own sandboxed call so the model can't blur retrieval and inference together; that isolation alone lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with no RL training (see "Can modular cognitive tools unlock reasoning without training?"). The common thread: reasoning capability is latent and present, but a familiar entity lets the model satisfy the prompt without invoking it, unless the structure makes invoking it unavoidable.
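To make "forcing the inferential step to become explicit" concrete, here is a small sketch of a prompt scaffold that refuses to accept an answer unless the warrant and backing are spelled out. It is an assumption-laden illustration, not the CQoT prompts or the cognitive-tools implementation: the field names, `build_prompt`, and `missing_steps` are all hypothetical, and only the warrant/backing idea comes from the text above.

```python
from typing import Dict, List

# Hypothetical field set: WARRANT and BACKING follow the Toulmin-style idea
# described above; PREMISES and CONCLUSION are added here for illustration.
FIELDS = ["PREMISES", "WARRANT", "BACKING", "CONCLUSION"]


def build_prompt(question: str) -> str:
    """Build a prompt that asks for each inferential step on its own line."""
    lines = [
        f"Question: {question}",
        "Answer by filling in every field on its own line:",
    ]
    lines += [f"{name}: <...>" for name in FIELDS]
    return "\n".join(lines)


def parse_fields(response: str) -> Dict[str, str]:
    """Collect 'NAME: value' lines whose name is one of the expected fields."""
    found: Dict[str, str] = {}
    for line in response.splitlines():
        name, sep, value = line.partition(":")
        key = name.strip().upper()
        if sep and key in FIELDS:
            found[key] = value.strip()
    return found


def missing_steps(response: str) -> List[str]:
    """Return the fields the response skipped or left empty."""
    found = parse_fields(response)
    return [name for name in FIELDS if not found.get(name)]


shortcut = "CONCLUSION: Yes, it follows."
explicit = "\n".join([
    "PREMISES: All squares are rectangles; this shape is a square.",
    "WARRANT: Membership in a subset implies membership in the superset.",
    "BACKING: Definition of a square as a rectangle with equal sides.",
    "CONCLUSION: This shape is a rectangle.",
])

print(build_prompt("Is this shape a rectangle?"))
print(missing_steps(shortcut))
print(missing_steps(explicit))
```

A caller could re-prompt whenever `missing_steps` is non-empty, so a bare recalled conclusion never counts as a complete answer. Whether the filled-in warrant is actually sound still needs a separate check.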

The thing you might not have expected to learn: this is less about memorization being a bug and more about it being the default *route*. The same models that parrot a stored proposition will reason correctly when the architecture or prompt denies them the shortcut, which is also why some questions do better *without* step-by-step prompting at all, when the question's own semantics flow cleanly into the answer (see "Why do some questions perform better without step-by-step reasoning?"). The entity doesn't disable reasoning; it offers an off-ramp, and the model takes it whenever nothing forces it to stay on the road.

Sources 7 notes

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Where do memorization errors arise in chain-of-thought reasoning?

The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization, based on preceding tokens, accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Can structured argument prompts make LLM reasoning more rigorous?

Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Why do some questions perform better without step-by-step reasoning?

Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with re-evaluating a specific tension in LLM reasoning: *Why do entities trigger memorized propositions instead of enabling reasoning?* Treat the findings below as dated snapshots (Aug 2023–Aug 2025), not current ground truth. Your job is to stress-test them against what has shifted.

What a curated library found, and when (dated claims spanning Aug 2023–Aug 2025, not current truth):
• Attestation bias: models judge entailment by whether a conclusion appeared in training, not whether premises support it (2023).
• Familiar entities act as retrieval keys, short-circuiting inference; the reasoned and memorized responses coexist, but entity framing pulls toward the cheaper stored answer (2024).
• Token-level local memorization accounts for up to 67% of chain-of-thought errors, worsening on out-of-distribution or complex problems (2025).
• Structured interventions (critical-question prompts, modular cognitive tools) force explicit inference steps; cognitive tools lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with no RL training (2024–2025).
• Models trained on deliberately corrupted reasoning traces solve problems equally well, suggesting traces scaffold rather than deduce (2025).

Anchor papers (verify; mind their dates):
• arXiv:2412.15177 (2024-12): Critical-Questions-of-Thought — argumentation-based prompting to surface warrants.
• arXiv:2506.12115 (2025-06): Eliciting Reasoning in Language Models with Cognitive Tools — modular tool isolation lifting performance.
• arXiv:2508.02037 (2025-08): Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time — token-level analysis of memorization sources.
• arXiv:2505.20296 (2025-05): Reasoning LLMs are Wandering Solution Explorers — characterizing reasoning as search, not deduction.

Your task:
(1) RE-TEST EACH CONSTRAINT. For attestation bias, local memorization dominance, and the effectiveness of structured/modular interventions: has newer model scaling, training methods (e.g., process rewards, latent-space reasoning from the 2025-02 papers), or test-time compute orchestration since relaxed or overturned these findings? Separate the durable question (does entity-naming still bias toward retrieval?) from perishable limitations (do newer models still fail at 67% local-memorization rates?). Cite what resolved each constraint or where it still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Have recent papers (e.g., latent-reasoning, test-time scaling, or minority-token studies from May–Aug 2025) reframed the entity-triggering problem as solved, redefined, or pushed into a different bottleneck?
(3) Propose 2 research questions that ASSUME the reasoning regime may have moved: e.g., do test-time scaling and latent-space reasoning fundamentally change how entities anchor inference? Do minority-token interventions obviate the need for structured prompting?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
