SYNTHESIS NOTE

Does teaching question patterns before document training improve knowledge access?

Standard LLM training encodes documents first, then teaches QA patterns. Does this order matter? This note explores whether reversing the sequence, teaching how knowledge gets queried before encoding it, could improve factual recall.

Synthesis note · 2026-06-03 · sourced from Training Fine Tuning

To keep an assistant current, the standard recipe is continued pretraining on new documents followed by instruction-tuning on QA pairs. The paper finds that this recipe falls short: LLMs trained this way struggle to answer questions even when the perplexity of the documents is minimized. The knowledge is encoded but not accessible. The diagnosis is a granularity mismatch: QA pairs are simple and direct, while documents weave many facts together intricately, so encoding document knowledge without knowing how it will be queried produces representations that don't surface under questioning.

The fix inverts the order. Pre-instruction-tuning (PIT) instruction-tunes on questions before continued pretraining on documents, so the model learns how knowledge is accessed before it encodes the knowledge, and the encoding then takes the access pattern into account. PIT outperforms standard instruction-tuning for later factual recall.
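The difference between the two recipes is only the order of the stages, which a minimal sketch can make concrete. Everything named here (`Stage`, `standard_schedule`, `pit_schedule`, `run_schedule`, and the `train_stage` callback) is hypothetical and chosen for illustration; the note does not describe the paper's actual training code, data splits, or hyperparameters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    """One training phase: an objective plus the kind of data it consumes."""
    objective: str  # e.g. "continued_pretraining" or "instruction_tuning"
    data: str       # e.g. "documents" or "qa_pairs"


def standard_schedule() -> List[Stage]:
    # Standard recipe: encode the documents first, then teach QA patterns.
    return [
        Stage("continued_pretraining", "documents"),
        Stage("instruction_tuning", "qa_pairs"),
    ]


def pit_schedule() -> List[Stage]:
    # PIT as summarized in this note: teach QA patterns first,
    # then encode the documents.
    return [
        Stage("instruction_tuning", "qa_pairs"),
        Stage("continued_pretraining", "documents"),
    ]


def run_schedule(schedule: List[Stage],
                 train_stage: Callable[[Stage], None]) -> List[str]:
    """Run each stage in order and return a log of what was executed.

    `train_stage` is a placeholder for whatever training loop is in use.
    """
    log = []
    for stage in schedule:
        train_stage(stage)
        log.append(f"{stage.objective}:{stage.data}")
    return log


if __name__ == "__main__":
    def noop(stage: Stage) -> None:
        pass  # stand-in for a real training loop

    print(run_schedule(standard_schedule(), noop))
    print(run_schedule(pit_schedule(), noop))
```

Both schedules contain the same two stages, so any difference in downstream recall would come from ordering alone, which is the comparison the note describes.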

The keeper is a principle about knowledge encoding: what the model learns from a document depends on whether it already knows how that knowledge will be retrieved. Encoding and access are coupled, not sequential. This connects to "Can we predict keyword priming before learning happens?" (how new facts get recruited) and to "Does procedural knowledge drive reasoning more than factual retrieval?". Together with PIT, both suggest that how knowledge is represented for use matters more than raw exposure.


Original note title

pre-instruction-tuning on QA pairs before training on documents improves knowledge acquisition by encoding how knowledge will be accessed first