SYNTHESIS NOTE

What happens inside models when they suddenly generalize?

Grokking appears as an abrupt shift from memorization to generalization. But is the underlying process truly discontinuous, or does mechanistic analysis reveal continuous phases we can measure and predict?

Synthesis note · 2026-02-23 · sourced from MechInterp

Grokking, the phenomenon in which models trained far beyond the point of overfitting suddenly generalize, looks discontinuous when viewed only through training and test loss. Mechanistic analysis reveals three continuous phases underneath:

  1. Memorization phase. The model learns to reproduce the training data through lookup-table-like mechanisms. Training loss drops while test loss remains high; the memorizing circuit dominates.

  2. Circuit formation phase. A generalizing circuit gradually forms in the weights and competes with the memorizing circuit. For modular addition, this circuit uses discrete Fourier transforms and trigonometric identities to turn addition into rotation around a circle. The generalizing circuit is more efficient (it reaches low loss with a smaller weight norm, which regularization such as weight decay favors) but is initially weaker.

  3. Cleanup phase. The generalizing circuit overtakes the memorizing circuit, and the memorization components are pruned away (in the modular-addition case, by weight decay). Only then does test loss drop sharply and generalization emerge.
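
The contrast between the phase-1 and phase-2 mechanisms can be sketched in a few lines. This is a hand-written illustration, not a circuit extracted from a trained network. A Python dict stands in for the lookup-table-like memorizer, and `fourier_logits` implements the rotation-around-a-circle algorithm directly. The modulus P = 113, the key frequencies, and all function names are assumptions chosen for illustration.

```python
import numpy as np

P = 113                   # prime modulus (assumed for illustration)
KEY_FREQS = (14, 35, 41)  # hypothetical "key frequencies"; any k in 1..P-1 works


def build_lookup_table(train_pairs, p=P):
    """Memorizing 'circuit': store the answer for every training pair verbatim."""
    return {(a, b): (a + b) % p for a, b in train_pairs}


def lookup_predict(table, a, b):
    """Returns None for pairs never seen in training: no generalization."""
    return table.get((a, b))


def fourier_logits(a, b, p=P, key_freqs=KEY_FREQS):
    """Generalizing 'circuit': modular addition as rotation around a circle."""
    c = np.arange(p)
    logits = np.zeros(p)
    for k in key_freqs:
        w = 2 * np.pi * k / p
        # Embed each input as a point on the unit circle at frequency w.
        cos_a, sin_a = np.cos(w * a), np.sin(w * a)
        cos_b, sin_b = np.cos(w * b), np.sin(w * b)
        # Angle-addition identities compose the two rotations into w(a + b).
        cos_ab = cos_a * cos_b - sin_a * sin_b
        sin_ab = sin_a * cos_b + cos_a * sin_b
        # Score each candidate c by cos(w(a + b - c)); this equals 1 only when
        # c == (a + b) mod p, because p is prime and k is not a multiple of p.
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return logits


rng = np.random.default_rng(0)
all_pairs = [(a, b) for a in range(P) for b in range(P)]
train_idx = rng.choice(len(all_pairs), size=len(all_pairs) // 3, replace=False)
train_pairs = [all_pairs[i] for i in train_idx]
table = build_lookup_table(train_pairs)

held_out = next(pair for pair in all_pairs if pair not in table)
print(lookup_predict(table, *held_out))           # None: this pair was never memorized
print(int(np.argmax(fourier_logits(*held_out))))  # equals (a + b) % P for the held-out pair
```

The lookup table is exact on the pairs it has stored and has no answer anywhere else. The Fourier version answers every pair with storage that does not grow with the number of training examples, which is the kind of compact structure the note says regularization favors.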

Progress measures defined through mechanistic analysis (tracking the formation of specific algorithmic components) allow monitoring grokking as it happens, replacing the seemingly sudden shift with continuous, predictable development.
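
One way to turn "tracking the formation of specific algorithmic components" into a number is sketched below. It is a simplified, hypothetical measure in the spirit of the restricted and excluded losses used in mechanistic progress-measure work on modular addition, not a reproduction of them. Given a model's full logit table over (a, b, c), it reports what fraction of the logit energy lies along cos(2πk(a + b - c)/p) and sin(2πk(a + b - c)/p) for a chosen set of key frequencies k. Tracked across checkpoints, a value rising toward 1 would indicate the generalizing circuit taking over. The function name, the toy tables, and the choice of key frequencies are assumptions.

```python
import numpy as np


def key_frequency_fraction(logits, key_freqs):
    """Fraction of logit energy in the key-frequency directions (hypothetical measure).

    logits: array of shape (p, p, p), indexed [a, b, c], with p prime.
    key_freqs: distinct integers k with 1 <= k <= (p - 1) // 2, so that the
    cos/sin basis functions below are mutually orthogonal on the grid.
    """
    p = logits.shape[0]
    # Softmax over c ignores a constant shift per (a, b), so remove it first.
    centred = logits - logits.mean(axis=2, keepdims=True)
    a, b, c = np.ogrid[:p, :p, :p]
    explained = 0.0
    for k in key_freqs:
        phase = 2 * np.pi * k * (a + b - c) / p
        for basis in (np.cos(phase), np.sin(phase)):
            # Squared length of the projection of `centred` onto this basis direction.
            explained += np.sum(centred * basis) ** 2 / np.sum(basis * basis)
    return float(explained / np.sum(centred ** 2))


p, key = 23, (3, 7)
a, b, c = np.ogrid[:p, :p, :p]
generalizing = sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in key)
memorizing = np.random.default_rng(0).normal(size=(p, p, p))  # unstructured toy table
print(key_frequency_fraction(generalizing, key))  # 1.0 up to rounding
print(key_frequency_fraction(memorizing, key))    # near 0
```

On a real checkpoint the logit table would come from running the model on all p² input pairs. A mixture of the two toy tables lands strictly between the extremes, which is what makes the quantity usable as a continuous progress signal.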

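Two findings from the grokked transformers paper, which trains transformers on composition and comparison reasoning tasks:

  1. Transformers can learn both tasks implicitly, but only through grokking, that is, training far beyond overfitting.

  2. The level of generalization differs by task: on out-of-distribution examples, transformers fail to generalize systematically for composition but succeed for comparison.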

The difference correlates with the configuration of the generalizing circuit: comparison allows more systematic generalization because the comparison operation is simpler to represent compactly. The paper recommends cross-layer knowledge-sharing mechanisms (memory augmentation, explicit recurrence) to further unlock transformer generalization.

Formal capacity trigger. The memorization-capacity paper (2505.24832) adds a quantitative dimension: GPT-family models have an approximate capacity of 3.6 bits per parameter for unintended memorization. Models memorize until this capacity fills; at that point grokking begins, and unintended memorization decreases as generalization takes over. On this view, the three phases are triggered not by training duration per se but by a measurable capacity-saturation event. The paper also formally separates memorization into unintended memorization (information about a specific dataset) and generalization (information about the true data-generating process). It argues that a model's ability to extract or generate a string is neither necessary nor sufficient evidence that the string was memorized: a model may memorize patterns without reproducing them verbatim.
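
The 3.6 bits-per-parameter figure turns capacity into simple arithmetic. The sketch below is a back-of-the-envelope calculator under stated assumptions. The constant is the paper's approximate GPT-family estimate. The dataset is assumed to consist of uniformly random tokens, chosen because their information content is known exactly. The function names are hypothetical. A saturation ratio above 1 means the dataset carries more bits than the model can store verbatim.

```python
import math

BITS_PER_PARAM = 3.6  # approximate GPT-family estimate reported in 2505.24832


def capacity_bits(n_params: int) -> float:
    """Approximate unintended-memorization capacity of a model, in bits."""
    return BITS_PER_PARAM * n_params


def random_token_dataset_bits(n_sequences: int, seq_len: int, vocab_size: int) -> float:
    """Information content of uniformly random tokens: log2(vocab_size) bits each."""
    return n_sequences * seq_len * math.log2(vocab_size)


def saturation_ratio(n_params: int, dataset_bits: float) -> float:
    """Values above 1 mean the data cannot all be stored verbatim."""
    return dataset_bits / capacity_bits(n_params)


n_params = 125_000_000
print(f"capacity: {capacity_bits(n_params) / 8 / 1e6:.2f} MB")  # capacity: 56.25 MB
data_bits = random_token_dataset_bits(n_sequences=100_000, seq_len=64, vocab_size=2048)
print(f"saturation ratio: {saturation_ratio(n_params, data_bits):.3f}")
```

For natural-language data, the information content is not known exactly and would have to be estimated, so the ratio is only as good as that estimate.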

This connects to the question "How do transformers learn to reason across multiple steps?" Both describe a staged development of reasoning capability, but grokking requires training far beyond a typical schedule. The practical tension: standard training may stop before the cleanup phase, leaving a model that fits its training data, and so appears to have learned, but has not yet generalized.
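
The practical tension can be illustrated with a toy stopping rule. The sketch assumes a common patience-based early-stopping scheme on validation loss and a synthetic, grokking-shaped curve (a long plateau followed by a late drop). Both the curve and the helper name are hypothetical, not measurements from any paper. With a patience shorter than the plateau, the rule halts on the plateau, before the drop that marks generalization.

```python
def early_stop_step(val_losses, patience, min_delta=1e-9):
    """Return the step at which patience-based early stopping halts, or None.

    Stops once `patience` steps pass without the validation loss improving
    on its best value by more than `min_delta`.
    """
    best, best_step = float("inf"), 0
    for step, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_step = loss, step
        elif step - best_step >= patience:
            return step
    return None


# Synthetic grokking-shaped curve: flat validation loss for 1,000 steps,
# then an abrupt drop (in this toy story, once cleanup completes).
curve = [4.7] * 1_000 + [0.01] * 1_000

print(early_stop_step(curve, patience=200))    # 200: halts on the plateau
print(early_stop_step(curve, patience=2_000))  # None: trains through to the drop
```

From loss alone, a plateau during memorization or circuit formation looks the same as a genuine dead end. Mechanistic progress measures are one way to tell the two apart.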

Original note title

grokking reveals three continuous phases of learning — memorization then circuit formation then cleanup