SYNTHESIS NOTE

Where does a model store memorized paragraphs?

Can we pinpoint the specific layers, attention heads, and tokens where language models localize verbatim memorization? Understanding this spatial signature could enable targeted unlearning.

Synthesis note · 2026-06-03 · sourced from Memory

Can we localize where a model stores the verbatim paragraphs it can recite? This study (GPT-Neo 125M, trained on the Pile) finds that memorization is spread across layers and components but still has a distinguishable spatial signature. Gradients for memorized paragraphs are larger in the lower layers than gradients for non-memorized paragraphs, so memorized paragraphs can be unlearned by fine-tuning only the high-gradient weights. One low-layer attention head is especially involved, and it predominantly attends to distinctive, rare tokens (those least frequent in the corpus unigram distribution). Token-perturbation analysis shows that memorization hinges on a few distinctive tokens early in the prefix: corrupting them often corrupts the entire continuation. Memorized continuations are also harder to unlearn and harder to corrupt than non-memorized ones.
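The idea of updating only the high-gradient weights can be sketched in a few lines. This is a minimal illustration, not the study's procedure: the tensor names, gradient values, learning rate, and the helpers `top_k_gradient_mask` and `masked_step` are all hypothetical, and the gradients are assumed to come from some unlearning objective computed elsewhere.

```python
from typing import Dict, List

def top_k_gradient_mask(grads: Dict[str, List[float]], k: int) -> Dict[str, List[bool]]:
    """Mark the k weights with the largest absolute gradient across all tensors."""
    # Flatten to (|grad|, tensor name, index) so weights from different layers compete.
    ranked = sorted(
        ((abs(g), name, i) for name, values in grads.items() for i, g in enumerate(values)),
        reverse=True,
    )
    keep = {(name, i) for _, name, i in ranked[:k]}
    return {name: [(name, i) in keep for i in range(len(values))]
            for name, values in grads.items()}

def masked_step(weights: Dict[str, List[float]], grads: Dict[str, List[float]],
                mask: Dict[str, List[bool]], lr: float) -> Dict[str, List[float]]:
    """Gradient step applied only where the mask is True; other weights stay frozen."""
    return {
        name: [w - lr * g if m else w
               for w, g, m in zip(weights[name], grads[name], mask[name])]
        for name in weights
    }

# Hypothetical values: gradients of an assumed unlearning loss on one memorized
# paragraph, made larger in the low layer to mirror the reported signature.
weights = {"layer0.attn": [0.5, -0.2, 0.1], "layer5.mlp": [0.3, 0.4, -0.6]}
grads = {"layer0.attn": [2.0, -1.5, 0.1], "layer5.mlp": [0.2, -0.05, 0.3]}

mask = top_k_gradient_mask(grads, k=2)
updated = masked_step(weights, grads, mask, lr=0.1)
```

With these toy numbers both selected weights fall in `layer0.attn`, so `layer5.mlp` is left untouched. In a real model the same mask would be built from per-parameter gradients and applied inside the optimizer step.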

The keeper is the localization signature: memorization, though distributed, leaves a low-layer / rare-token / early-prefix fingerprint that makes it targetable for unlearning — and rare tokens are the hook the model hangs verbatim recall on.
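The rare-token and early-prefix parts of that fingerprint suggest a simple probe: find the rarest token in a prefix by corpus unigram count, replace it, and check how much of the continuation survives. The sketch below assumes a `generate` callable that maps a token list to a continuation. `toy_generate` is a hand-written stand-in for a language model, and the corpus, prefix, and helper names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

def rarest_position(prefix: List[str], unigram_counts: Counter) -> int:
    """Index of the prefix token with the lowest corpus count (first one on ties)."""
    return min(range(len(prefix)), key=lambda i: unigram_counts[prefix[i]])

def match_rate(a: List[str], b: List[str]) -> float:
    """Fraction of positions in `a` where the aligned token in `b` agrees."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)

def perturb_rarest(prefix: List[str], unigram_counts: Counter,
                   generate: Callable[[List[str]], List[str]],
                   replacement: str) -> Tuple[int, float]:
    """Swap the rarest prefix token and measure how much of the continuation survives."""
    pos = rarest_position(prefix, unigram_counts)
    corrupted = prefix[:pos] + [replacement] + prefix[pos + 1:]
    return pos, match_rate(generate(prefix), generate(corrupted))

# Toy stand-in for a model: it recites a fixed continuation only when the
# distinctive token "zygote" appears in the prefix (an illustrative assumption).
def toy_generate(prefix: List[str]) -> List[str]:
    if "zygote" in prefix:
        return ["divides", "into", "two", "cells"]
    return ["is", "a", "common", "word"]

corpus = "the cell the zygote the cell of the cell".split()
counts = Counter(corpus)
pos, rate = perturb_rarest(["the", "zygote", "of", "the", "cell"],
                           counts, toy_generate, replacement="the")
```

Here the rarest token sits at index 1, and replacing it leaves none of the original continuation intact. With a real model, `generate` would wrap greedy decoding and the counts would come from the training corpus.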

This deepens the vault's memorization thread mechanistically. It complements the capacity account in "When do language models stop memorizing and start generalizing?" and the fine-tuning-leakage measurement in "Does repeated sensitive data in fine-tuning cause memorization?" by saying where the memorized content lives and how to target it.

Original note title

paragraph memorization localizes to low-layer gradients and a rare-token attention head and a few prefix tokens can corrupt the whole continuation