Do models with unfilled memorization capacity appear to generalize falsely?

This explores a sharp claim from memorization-capacity research — that until a model's fixed memorization budget fills up, what looks like generalization is really memorized pattern-matching wearing a reasoning costume.

The corpus has a surprisingly crisp answer to whether a model that still has room to memorize fakes generalization rather than actually doing it. The key finding is that memorization is not unlimited: GPT-family models hold roughly 3.6 bits per parameter, and that capacity is a property of the model, not the training recipe (see "When do language models stop memorizing and start generalizing?"). Only when that budget fills does a phase transition, "grokking", flip the model from storing examples to genuinely generalizing. For your question, the implication is direct: a model with unfilled capacity has not started generalizing yet, so when it looks like it is reasoning, it is often still leaning on stored answers.
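As a rough illustration of the capacity arithmetic above, the sketch below multiplies a parameter count by the roughly 3.6 bits-per-parameter figure and compares the result to a dataset's information content. The function names and the `dataset_bits` input are assumptions introduced here. The source notes do not say how a dataset's information content should be measured, so the comparison is a simplification rather than the researchers' procedure.

```python
# Minimal sketch of the capacity arithmetic; all names are illustrative.
BITS_PER_PARAM = 3.6  # approximate GPT-family figure quoted in the text


def capacity_bits(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total memorization capacity in bits."""
    if n_params < 0:
        raise ValueError("n_params must be non-negative")
    return n_params * bits_per_param


def fill_ratio(n_params: int, dataset_bits: float) -> float:
    """Fraction of the estimated capacity a dataset could occupy (may exceed 1)."""
    cap = capacity_bits(n_params)
    if cap == 0:
        raise ValueError("model has zero capacity")
    return dataset_bits / cap


def capacity_filled(n_params: int, dataset_bits: float) -> bool:
    """True when the dataset is at least as large as the estimated capacity.

    ASSUMPTION: dataset_bits is some estimate of the data's information
    content; how to obtain that estimate is outside this sketch.
    """
    return fill_ratio(n_params, dataset_bits) >= 1.0


if __name__ == "__main__":
    n = 1_000_000_000  # a hypothetical 1B-parameter model
    print(f"capacity: {capacity_bits(n) / 8 / 1e6:.0f} MB")
    print("filled by 1e9 bits of data?", capacity_filled(n, 1e9))
```

Under this figure, a hypothetical 1B-parameter model would hold on the order of 3.6e9 bits, or about 450 MB, which gives a sense of how much data it takes before the budget could plausibly be called full.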

What does that false generalization look like in practice? The clearest illustration is attestation bias: ask a model whether a premise entails a hypothesis, and it answers based on whether the hypothesis appeared in its training data, not on whether the premise actually supports it. Feed it a random, irrelevant premise and it still says "entails" as long as the hypothesis is familiar (see "Do LLMs predict entailment based on what they memorized?"). That is generalization theater: the logical-inference behavior is real on the surface and hollow underneath. The same shape shows up when strong training priors override what's written in the prompt, so the model ignores its own context because memorized associations dominate (see "Why do language models ignore information in their context?").
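The random-premise check described above can be sketched as a small probe. Here `predict_entails` is a hypothetical stand-in for whatever model call returns an entailment judgment, and `toy_attested_model` is a deliberately biased toy that ignores the premise. Neither is the experimental setup from McKenna et al.; both are assumptions made for illustration.

```python
import random
from typing import Callable, Sequence


def random_premise_entailment_rate(
    predict_entails: Callable[[str, str], bool],
    hypotheses: Sequence[str],
    premise_pool: Sequence[str],
    seed: int = 0,
) -> float:
    """Share of hypotheses judged 'entailed' when paired with a random premise."""
    if not hypotheses or not premise_pool:
        raise ValueError("need at least one hypothesis and one premise")
    rng = random.Random(seed)  # seeded so the pairing is reproducible
    hits = sum(
        predict_entails(rng.choice(premise_pool), hypothesis)
        for hypothesis in hypotheses
    )
    return hits / len(hypotheses)


# Toy stand-in for a biased model: it answers from "memory" alone.
ATTESTED = {"Paris is the capital of France."}


def toy_attested_model(premise: str, hypothesis: str) -> bool:
    """Hypothetical predictor that ignores the premise entirely."""
    return hypothesis in ATTESTED


if __name__ == "__main__":
    irrelevant = ["The cat sat on the mat.", "It rained on Tuesday."]
    familiar = ["Paris is the capital of France."]
    print(random_premise_entailment_rate(toy_attested_model, familiar, irrelevant))
```

A predictor that actually reads the premise should score near zero when the premises are irrelevant. A high rate on familiar hypotheses is the signature the text calls attestation bias.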

The corpus also localizes where this leakage happens inside a chain of reasoning. The STIM analysis finds that token-level memorization has three sources (local, mid-range, and long-range), and "local" memorization, which predicts the next token from the immediately preceding ones, accounts for up to 67% of reasoning errors, growing worse as problems get harder and drift from the training distribution (see "Where do memorization errors arise in chain-of-thought reasoning?"). So even mid-reasoning, the model is quietly substituting recall for inference exactly where the task is hardest.

The flip side sharpens the picture: real generalization has a different signature. Tracing five million pretraining documents shows that reasoning draws on broad, transferable *procedural* knowledge spread across many sources, whereas factual recall depends on narrow, document-specific memorization of the exact target fact (see "Does procedural knowledge drive reasoning more than factual retrieval?"). Genuine reasoning is diffuse; faked reasoning is a lookup. There is also an optimistic coda: the capacity that hasn't been "filled" may instead be latent and merely dormant. A single training example in RLVR can lift math accuracy from 36% to 73.6%, and test accuracy keeps improving long after training accuracy saturates (see "Can a single training example unlock mathematical reasoning?"). So unfilled capacity isn't only a liability that breeds false generalization; sometimes it's unexpressed ability waiting for the right activation signal.

The thing worth taking away: "does it generalize?" and "does it look like it generalizes?" are genuinely different questions, and the corpus gives you concrete fingerprints — attestation bias, prior-over-context override, local token memorization — to tell counterfeit reasoning from the real thing.

Sources (6 notes)

When do language models stop memorizing and start generalizing?

GPT-family models have a measurable memorization capacity of approximately 3.6 bits-per-parameter. When this capacity fills, a phase transition triggers grokking—the shift from memorization to genuine generalization. This capacity is a property of individual models, not training algorithms.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Where do memorization errors arise in chain-of-thought reasoning?

The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization, based on preceding tokens, accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Can a single training example unlock mathematical reasoning?

A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher auditing whether models with unfilled memorization capacity exhibit false generalization—a question posed to a curated library spanning 2016–2026. Treat the findings below as dated claims, not current truth.

What a curated library found — and when (findings span 2016–2026; these are perishable):
• Memorization capacity is fixed (~3.6 bits per parameter) and distinct from generalization; only when this budget saturates does the "grokking" phase transition occur, flipping models from storing examples to genuine reasoning (2024–2025).
• False generalization manifests as attestation bias (models answer entailment queries based on hypothesis familiarity, not premise support) and prior-over-context override (memorized associations dominate prompt context) (2023–2024).
• Token-level local memorization accounts for up to 67% of reasoning errors in chain-of-thought, worsening as problems drift from the training distribution (2025).
• Genuine reasoning draws on broad, procedural knowledge across many pretraining sources; faked reasoning is narrow, document-specific memorization (2024).
• Unfilled capacity may be dormant ability: a single RLVR training example lifts math accuracy from 36% to 73.6% (2025).

Anchor papers (verify; mind their dates):
• arXiv:2411.12580 (2024): Procedural Knowledge in Pretraining Drives Reasoning.
• arXiv:2508.02037 (2025): Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time.
• arXiv:2504.20571 (2025): Reinforcement Learning for Reasoning with One Training Example.
• arXiv:2603.03415 (2026): Farther the Shift, Sparser the Representation (OOD mechanisms).

Your task:
(1) RE-TEST EACH CONSTRAINT. For the ~3.6 bits/param capacity limit, attestation bias, and the up-to-67% share of reasoning errors attributed to local memorization: has scaling, new architectures (MoE, retrieval-augmented systems), or refined training (consistency tuning, online adaptation) since relaxed or overturned these? Separate the durable question ("when does faked vs. real reasoning occur?") from perishable limitations (e.g., "current base models saturate at N examples"). Cite what shifted each.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does compositional generalization at scale (2507.07207) undermine the saturation hypothesis? Does positional bias in demos (2507.22887) reframe attestation bias as a prompt-engineering artifact rather than intrinsic false generalization?
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "If unfilled capacity now activates via in-context learning alone (no RLVR), does the false-generalization signature vanish?" or "Do mixture-of-experts route unfilled capacity away from surface pattern-matching?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
