SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals

Can learned traversal policies beat exhaustive graph reading?

As knowledge graphs grow, can agents learn which nodes to explore rather than ingesting entire subgraphs? This note explores whether MCTS and reinforcement learning can handle the context-window constraint better than dumping whole graphs into the LLM.

Synthesis note · 2026-05-03
How should retrieval and reasoning integrate in RAG systems?

Naive GraphRAG dumps the relevant subgraph into the LLM's context, which works for small knowledge graphs but breaks at scale: even moderate-sized graphs blow past context limits, and most of what gets passed in is irrelevant to the query. Graph-O1 reframes graph reasoning as an agentic search problem. Instead of reading the whole graph, an agent uses Monte Carlo Tree Search to select promising nodes and edges to explore step by step, and reinforcement learning trains the policy that decides which expansions are worthwhile.
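To make the contrast with whole-subgraph reading concrete, here is a minimal sketch of budgeted selective traversal. It is not Graph-O1's method: it uses plain greedy best-first expansion, and the toy graph, the `relevance` scores (a stand-in for a learned policy), and the `selective_traversal` name are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List

# Hypothetical toy knowledge graph: node -> neighbouring nodes.
Graph = Dict[str, List[str]]

def selective_traversal(graph: Graph, start: str,
                        score: Callable[[str], float],
                        budget: int) -> List[str]:
    """Visit at most `budget` nodes, always expanding the best-scoring frontier node."""
    visited: List[str] = []
    seen = {start}
    frontier = [(-score(start), start)]  # max-heap via negated scores
    while frontier and len(visited) < budget:
        _, node = heapq.heappop(frontier)
        visited.append(node)
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                heapq.heappush(frontier, (-score(nbr), nbr))
    return visited

toy_graph: Graph = {
    "query_entity": ["a", "b", "c"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "c": [],
    "a1": [], "a2": [],
    "b1": ["answer"],
    "answer": [],
}

# Assumption: hand-set scores standing in for what a trained policy would output.
relevance = {"query_entity": 1.0, "a": 0.2, "b": 0.9, "c": 0.1,
             "a1": 0.1, "a2": 0.1, "b1": 0.8, "answer": 1.0}

# Only the visited nodes would be serialized into the LLM's context.
context_nodes = selective_traversal(toy_graph, "query_entity",
                                    lambda n: relevance[n], budget=4)
print(context_nodes)
```

With a budget of 4, only half of the eight nodes in this toy graph reach the context. The scoring function is where a learned policy would plug in.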

This trades one constraint for another: the LLM no longer has to ingest the whole graph, but it does have to make navigation decisions under uncertainty about what lies beyond each unexplored edge. MCTS suits this setting because it handles the explore-exploit trade-off directly: it can spend cheap rollouts on estimating whether a branch is worth deeper traversal. RL then adapts the policy to the specific graph topology and query distribution rather than relying on a generic heuristic.

The general lesson extends beyond graphs. As context windows become the binding constraint for retrieval-heavy reasoning, the architectural pressure shifts from "fit more in" to "decide what not to read." Agentic traversal with learned policies is one way to make that decision well, and the principle should generalize to any retrieval space where exhaustive exposure is infeasible. The related note "Does reasoning ability actually degrade with longer inputs?" gives an even stronger reason to read selectively: even when the content fits, reasoning over it degrades when irrelevant material is present.


Original note title

MCTS plus RL replaces whole-graph reading with selective traversal in GraphRAG — context-window limits make exhaustive graph exposure infeasible at scale