SYNTHESIS NOTE

Does procedural knowledge drive reasoning more than factual retrieval?

Explores whether models learn to reason from general procedures demonstrated across diverse documents rather than from memorized specific facts. This matters for understanding which parts of pretraining data actually teach models to reason.

Synthesis note · 2026-02-22 · sourced from Training Fine Tuning

The "Procedural Knowledge in Pretraining Drives Reasoning" paper asks which pretraining documents most influence LLM reasoning, and answers by ranking 5 million documents by their influence on model completions. The finding: models do not reason the way they retrieve facts. For reasoning tasks, the positively influential documents contain procedural knowledge (descriptions of how to get to a solution) rather than the specific facts needed for the answer.
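
To make "ranking documents by their influence" concrete, here is a minimal sketch on a toy linear model. This note does not describe the paper's estimator, so the sketch swaps in a first-order gradient dot product as a stand-in: a document scores positively when a training step on it would also reduce the query's loss. The function names, the squared-error model, and the three toy "documents" are all assumptions made for illustration.

```python
import numpy as np

def loss_gradient(w, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 with respect to w."""
    return (w @ x - y) * x

def rank_by_influence(w, docs, query):
    """Return (order, scores): document indices from most to least positively
    influential, plus the raw scores.

    The score is the dot product of the query gradient with each document
    gradient. This is an assumed first-order proxy, not the paper's method.
    """
    qx, qy = query
    g_query = loss_gradient(w, qx, qy)
    scores = np.array([g_query @ loss_gradient(w, x, y) for x, y in docs])
    order = np.argsort(-scores)  # descending by score
    return order, scores

# Hypothetical toy data: each "document" is one (features, target) pair.
w = np.array([0.0, 0.0])
docs = [
    (np.array([1.0, 0.0]), 1.0),   # gradient aligned with the query's
    (np.array([0.0, 1.0]), 1.0),   # gradient orthogonal to the query's
    (np.array([-1.0, 0.0]), 1.0),  # gradient opposed to the query's
]
query = (np.array([1.0, 0.0]), 1.0)

order, scores = rank_by_influence(w, docs, query)
print(order, scores)
```

In this toy setup the aligned document ranks first and the opposed document ranks last with a negative score, which mirrors the distinction between positively and negatively influential documents.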

Three contrasts with factual recall:

  1. Generality: models rely on a broader, more general set of documents when reasoning than when answering factual questions. Factual recall draws on a narrow set of documents containing the target fact. Reasoning draws on a diffuse set of documents performing similar procedures.

  2. Transferability: documents have similar influence on reasoning queries that require applying the same procedure to different numbers. The procedural knowledge transfers across specific instances — it's the method, not the content, that the model has learned.

  3. Reliance distribution: the model needs to see factual information more often (across more documents) to memorize it, while procedural patterns can be learned from fewer but more diverse demonstrations.
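
The first two contrasts can be phrased as simple statistics over per-document influence scores. The sketch below assumes each query already has a vector of such scores. It measures transferability as the Pearson correlation between two queries' score vectors, and generality as the share of positive influence held by the top-k documents. The helper names and all numbers are made up for illustration and are not results from the paper.

```python
import numpy as np

def influence_correlation(a, b):
    """Pearson correlation between two queries' per-document influence scores."""
    return float(np.corrcoef(a, b)[0, 1])

def top_k_share(scores, k):
    """Fraction of total positive influence carried by the k highest-scoring documents."""
    positive = np.clip(scores, 0.0, None)
    total = positive.sum()
    if total == 0.0:
        return 0.0
    return float(np.sort(positive)[::-1][:k].sum() / total)

# Made-up influence scores over six documents (illustrative only).
reasoning_q1 = np.array([0.30, 0.25, 0.20, 0.15, 0.05, 0.05])  # a procedure applied to one set of numbers
reasoning_q2 = np.array([0.28, 0.27, 0.18, 0.17, 0.06, 0.04])  # same procedure, different numbers
factual_q = np.array([0.00, 0.00, 0.00, 0.00, 0.95, 0.05])     # one document states the fact

same_procedure = influence_correlation(reasoning_q1, reasoning_q2)
cross_type = influence_correlation(reasoning_q1, factual_q)

print("correlation, same procedure:", round(same_procedure, 3))
print("correlation, reasoning vs factual:", round(cross_type, 3))
print("top-1 share, reasoning:", top_k_share(reasoning_q1, 1))
print("top-1 share, factual:", top_k_share(factual_q, 1))
```

Under these assumed numbers, the two reasoning queries share nearly the same influence profile, while the factual query concentrates almost all of its influence in a single document.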

This connects to the knowledge/reasoning layer separation described in the note "Why does reasoning training help math but hurt medical tasks?". The procedural knowledge finding provides the data-level explanation for that architectural finding: lower layers store memorized facts (requiring document-specific exposure), while higher layers encode procedural strategies (learnable from general demonstrations).

The implication for training data curation: reasoning capability benefits more from diverse demonstrations of procedures than from exhaustive factual coverage. Quality and diversity of reasoning demonstrations may matter more than volume for building reasoning capability, which is consistent with the note "Can models improve themselves on tasks without verifiable answers?".


Original note title

procedural knowledge in pretraining documents drives reasoning generalization unlike factual retrieval which requires document-specific memorization