SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation Model Architecture and Internals

What makes deep research fundamentally different from RAG?

Explores whether current systems labelled 'deep research' actually meet a rigorous three-component definition (multi-step gathering, cross-source synthesis, and iterative refinement) or perform something narrower.

Synthesis note · 2026-02-21 · sourced from Deep Research

"Deep research" is used loosely to describe anything from a single web search to a multi-hour autonomous investigation. The Characterizing Deep Research paper proposes a formal three-component definition that makes the boundary precise:

  1. Multi-step information gathering — not one retrieval round but a sequence of them, where each round can expand or contract the search space
  2. Cross-source synthesis — combining findings from multiple independent sources, not just summarizing one document
  3. Iterative query refinement — using partial findings to improve subsequent queries, not issuing all queries upfront

The definition excludes single-step RAG (fails component 1), document summarization (fails component 3), and simple web browsing (may fail component 2). It includes only systems that loop across all three simultaneously.
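The three-component test can be sketched as a check over a system's retrieval trace. This is a minimal illustration, not the paper's method: the `RetrievalRound` and `Trace` types, the `uses_prior_findings` flag, and the "more than one" thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalRound:
    """One retrieval round in a hypothetical system trace."""
    query: str
    sources: set[str]                  # identifiers of sources consulted this round
    uses_prior_findings: bool = False  # assumption: the trace records whether the
                                       # query was shaped by earlier results


@dataclass
class Trace:
    """Hypothetical record of what a system did to answer one question."""
    rounds: list[RetrievalRound] = field(default_factory=list)
    sources_cited_in_answer: set[str] = field(default_factory=set)


def component_check(trace: Trace) -> dict[str, bool]:
    """Report which of the three components the trace satisfies."""
    return {
        # Component 1: a sequence of retrieval rounds, not a single one.
        "multi_step_gathering": len(trace.rounds) > 1,
        # Component 2: the answer draws on more than one source.
        "cross_source_synthesis": len(trace.sources_cited_in_answer) > 1,
        # Component 3: some later query was refined using partial findings.
        "iterative_refinement": any(
            r.uses_prior_findings for r in trace.rounds[1:]
        ),
    }


def is_deep_research(trace: Trace) -> bool:
    """True only when all three components hold at once."""
    return all(component_check(trace).values())


if __name__ == "__main__":
    # Single-step RAG: one round, several sources, no refinement.
    rag = Trace(
        rounds=[RetrievalRound("what is X?", {"doc_a", "doc_b"})],
        sources_cited_in_answer={"doc_a", "doc_b"},
    )
    print(component_check(rag))
    print(is_deep_research(rag))
```

Returning the per-component result rather than a single boolean mirrors the point of the definition: it shows where a system falls short, not just that it does.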

The practical value of the definition is benchmarking clarity. Without it, a system that performs single-step retrieval with sophisticated synthesis can claim "deep research" capability even though it lacks the iterative refinement component that actually distinguishes deep research (DR) from RAG++. PRELUDE (the benchmark that accompanies the paper) evaluates all three components, making it possible to locate exactly where a system falls short.

This also clarifies what the test-time scaling (TTS) law applies to. The law discussed in the note "Does search budget scale like reasoning tokens for answer quality?" is a scaling law specifically for systems that meet the full three-component definition. Partial systems that skip iterative query refinement likely show different scaling behavior.

Researchy Questions (2024) operationalizes the "unknown unknowns" concept for deep research. Standard QA benchmarks study "known unknowns", where it is clear what information is missing. Researchy Questions instead identifies non-factoid, multi-perspective, decompositional questions from real search engine logs: questions where the questioner doesn't know what they don't know. Users spend significantly more effort (clicks, session length) on these queries, and "slow thinking" techniques such as decomposition into sub-questions show a benefit over direct answering. An 8-dimension quality rubric (ambiguity, incompleteness, assumptions, multi-facetedness, knowledge-intensity, subjectivity, reasoning-intensity, harmfulness) provides granular characterization. This distinguishes "deep" questions from merely "hard" ones: a deep question admits multiple perspectives and therefore a dense manifold of answers, has no single correct answer, and requires genuine synthesis rather than just retrieval. Source: Arxiv/Agentic Research.
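The eight rubric dimensions can be held in a small data structure so that a question's profile is checked for completeness before it is used. This is a sketch only: the dimension names come from the note, but the 0 to 1 score scale, the 0.5 threshold, and the function names are assumptions for illustration and are not taken from Researchy Questions.

```python
# The eight rubric dimensions named in the note.
RUBRIC_DIMENSIONS = (
    "ambiguity",
    "incompleteness",
    "assumptions",
    "multi-facetedness",
    "knowledge-intensity",
    "subjectivity",
    "reasoning-intensity",
    "harmfulness",
)


def validate_scores(scores: dict[str, float]) -> dict[str, float]:
    """Return scores in rubric order, rejecting missing or unknown dimensions."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in scores]
    unknown = [d for d in scores if d not in RUBRIC_DIMENSIONS]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return {d: scores[d] for d in RUBRIC_DIMENSIONS}


def salient_dimensions(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """List dimensions scoring at or above the (assumed) threshold."""
    ordered = validate_scores(scores)
    return [d for d, s in ordered.items() if s >= threshold]


if __name__ == "__main__":
    # Hypothetical scores for one question; the values are made up.
    example = {d: 0.2 for d in RUBRIC_DIMENSIONS}
    example["multi-facetedness"] = 0.9
    example["subjectivity"] = 0.7
    print(salient_dimensions(example))
```

Keeping the dimensions as separate scores, rather than collapsing them into one number, preserves the granular characterization the rubric is meant to provide.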

Original note title

deep research requires a formal three-component definition: multi-step information gathering, cross-source synthesis, and iterative query refinement