SYNTHESIS NOTE

Do hierarchical retrieval architectures outperform flat ones on complex queries?

Explores whether separating query planning from answer synthesis into distinct architectural components improves performance on multi-hop retrieval tasks compared to unified single-pass approaches.

Synthesis note · 2026-02-21 · sourced from Deep Research

HierSearch separates two functions that flat retrieval architectures conflate: deciding what to search for (query planning) and deciding what the answer is (answer synthesis). The finding is that these functions interfere with each other when combined, and separating them improves multi-hop query performance.

The interference mechanism: in a flat architecture, the model must simultaneously track what it is looking for, what it has found, and how the findings combine into an answer. Multi-hop queries require multiple retrieval rounds with intermediate synthesis steps: each round's findings must inform the next round's query while also contributing to the final answer. When a single model component handles all of this, it loses coherence across the chain of hops. The hierarchical architecture assigns query planning to one component and answer synthesis to another, letting each specialize.
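The separation described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not HierSearch's implementation: the `Planner`, `Synthesizer`, `retrieve`, and `run` names, the query templates, and the two-entry corpus are all hypothetical. The point it tries to show is that the planner only ever decides the next query, the synthesizer only ever reads the collected evidence, and a thin controller passes findings between them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy corpus standing in for a real retriever.
CORPUS = {
    "author of Dune": "Frank Herbert",
    "birthplace of Frank Herbert": "Tacoma",
}


def retrieve(query: str) -> Optional[str]:
    """Exact-match lookup; a real system would call a search backend."""
    return CORPUS.get(query)


@dataclass
class Planner:
    """Decides what to search for next. Never produces the final answer."""
    templates: list[str]  # one query template per hop; {prev} = previous finding

    def next_query(self, findings: list[str]) -> Optional[str]:
        hop = len(findings)
        if hop >= len(self.templates):
            return None  # plan exhausted, nothing left to search for
        prev = findings[-1] if findings else ""
        return self.templates[hop].format(prev=prev)


@dataclass
class Synthesizer:
    """Decides what the answer is. Never issues queries."""

    def answer(self, question: str, evidence: list[tuple[str, str]]) -> str:
        if not evidence:
            return "unknown"
        # Toy rule: the finding from the last hop is the answer.
        return evidence[-1][1]


def run(question: str, planner: Planner, synthesizer: Synthesizer) -> str:
    """Controller: alternates planning and retrieval, then hands off once."""
    evidence: list[tuple[str, str]] = []
    while True:
        query = planner.next_query([finding for _, finding in evidence])
        if query is None:
            break
        finding = retrieve(query)
        if finding is None:
            break  # retrieval failed; synthesize from what we have
        evidence.append((query, finding))
    return synthesizer.answer(question, evidence)


result = run(
    "Where was the author of Dune born?",
    Planner(templates=["author of Dune", "birthplace of {prev}"]),
    Synthesizer(),
)
print(result)
```

In a flat design the same component would hold the templates, the evidence list, and the answer rule at once. Here each piece of state has a single owner, which is the property the note attributes to the hierarchical architecture.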

This has implications beyond deep research. The same interference between planning and execution is well-documented in agent design: models that plan and execute simultaneously produce worse plans and worse execution than models where these are separated. HierSearch is the retrieval-specific confirmation of a general architectural principle.

The structural finding also connects to the note "How do readers track segments, purposes, and salience together?", which describes the cognitive architecture problem that HierSearch solves at the system level. The discourse-level problem (tracking segments, purposes, and salient objects in parallel) has the same shape as the retrieval-level problem (tracking query intent, retrieved evidence, and synthesis state in parallel). Architecturally separating these concerns reduces the tracking burden on any single component.

LogicRAG extends the hierarchical principle by making the query planning step structurally explicit: it decomposes the query into a directed acyclic graph (DAG) of subproblems at inference time, then resolves them in topological order. Where HierSearch separates planning from synthesis at the system level, LogicRAG implements the planning step as a structured dependency graph at the query level. The result is a query-adaptive logic structure that incurs no corpus pre-processing cost. See "Can query-time graph construction replace pre-built knowledge graphs?"
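The "DAG of subproblems resolved in topological order" idea can be sketched with Python's standard-library `graphlib`. This is a minimal illustration, not LogicRAG's code: the decomposition is hard-coded here, whereas the note says LogicRAG builds it at inference time, and the `SUBPROBLEMS` table, the `FACTS` lookup, and the `resolve` function are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical fact store standing in for retrieval.
FACTS = {
    "author of Dune": "Frank Herbert",
    "birthplace of Frank Herbert": "Tacoma",
    "state containing Tacoma": "Washington",
}

# Hypothetical decomposition of one question into subproblems.
# Each entry: name -> (query template, names of subproblems it depends on).
SUBPROBLEMS = {
    "author": ("author of Dune", []),
    "birthplace": ("birthplace of {author}", ["author"]),
    "state": ("state containing {birthplace}", ["birthplace"]),
}


def resolve(subproblems: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Resolve every subproblem after all of its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_, deps) in subproblems.items()}
    answers: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        template, deps = subproblems[name]
        # Fill the template with answers already resolved upstream.
        query = template.format(**{d: answers[d] for d in deps})
        answers[name] = FACTS[query]  # raises KeyError if retrieval misses
    return answers


answers = resolve(SUBPROBLEMS)
print(answers["state"])
```

Because the graph is built per query, nothing about the corpus has to be indexed into a graph ahead of time, which is the cost saving the note points to. `TopologicalSorter` also raises `graphlib.CycleError` if the decomposition is not acyclic, so an invalid plan fails before any retrieval happens.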

Original note title

hierarchical research architectures that separate query planning from answer synthesis outperform flat architectures on multi-hop queries