Can knowledge graphs generate training data for search agents?
Exploring whether synthesizing questions from knowledge graph random walks with entity blurring can create the hard-to-find training data needed to teach deep search agents to reason and search effectively.
Deep search agents need training data featuring hard-to-find questions that require long-horizon reasoning and iterative search — but such data is naturally scarce on the internet. DeepDive addresses this by automatically synthesizing challenging questions from open knowledge graphs (KGs), exploiting three properties:
- Verifiability: KG entity-relation triples are inherently traceable and objective, ensuring answer correctness — unlike fully model-generated QA pairs
- Multi-hop structure: Random walks of varying lengths on the KG explicitly control reasoning depth, generating questions requiring multiple inference steps
- Reasoning controllability: Each entity node has multiple attributes (dates, names, locations) that can be selectively obscured, creating "blurry entities" that prevent shortcut solutions
The pipeline first performs random walks on the KG to extract long multi-hop paths, then uses LLMs to further obfuscate key cues along each path. The resulting QA pairs require models to iteratively reason, search, validate, and reflect before arriving at an accurate answer. These are questions that even domain experts would need hours to research.
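The walk-then-blur idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy graph `TOY_KG`, the `ATTRIBUTES` table, and the helpers `blur`, `random_walk`, and `synthesize` are hypothetical names, and a simple rule-based blur stands in for the LLM obfuscation step that the note describes. It is not DeepDive's actual implementation.

```python
import random

# Hypothetical toy KG: entity -> list of (relation, neighbor) edges.
TOY_KG = {
    "Person_A": [("born_in", "City_B"), ("studied_at", "University_C")],
    "City_B": [("located_in", "Country_D")],
    "University_C": [("founded_by", "Person_E"), ("located_in", "City_B")],
    "Country_D": [("capital", "City_F")],
    "Person_E": [("born_in", "City_F")],
    "City_F": [],
}

# Hypothetical attributes that can be selectively obscured.
ATTRIBUTES = {"Person_A": {"birth_year": 1912}}


def blur(value):
    """Obscure an exact attribute value; an integer year becomes a decade range."""
    if isinstance(value, int):
        decade = value - value % 10
        return f"between {decade} and {decade + 9}"
    return "an unspecified value"


def random_walk(kg, start, length, rng):
    """Walk up to `length` hops without revisiting nodes; return (head, relation, tail) triples."""
    path, current, visited = [], start, {start}
    for _ in range(length):
        options = [(rel, tail) for rel, tail in kg.get(current, []) if tail not in visited]
        if not options:
            break  # dead end: stop early with a shorter path
        relation, nxt = rng.choice(options)
        path.append((current, relation, nxt))
        visited.add(nxt)
        current = nxt
    return path


def synthesize(kg, attrs, start, length, seed=0):
    """Build one QA pair whose answer is the final node of the walk."""
    rng = random.Random(seed)
    path = random_walk(kg, start, length, rng)
    if not path:
        return None
    # Describe the start entity only through blurred attributes, never by name.
    clues = [f"{name} {blur(value)}" for name, value in attrs.get(start, {}).items()]
    clue_text = ", ".join(clues) if clues else "no known attributes"
    hops = " -> ".join(relation for _, relation, _ in path)
    question = f"Start from the entity with {clue_text}; follow {hops}. What do you reach?"
    return {"question": question, "answer": path[-1][2], "path": path}


if __name__ == "__main__":
    print(synthesize(TOY_KG, ATTRIBUTES, "Person_A", length=3))
```

The walk length sets the number of hops, and the answer stays verifiable because it is read directly off the final triple of the path. The blurred clue withholds the start entity's name, so a solver cannot look it up in one step.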
Combined with end-to-end multi-turn RL, DeepDive-32B reaches 14.8% accuracy on BrowseComp, a benchmark of hard-to-find information. This is a new competitive result among open-source models, and it outperforms larger agents and several strong proprietary baselines. Key findings: complex supervision and multi-turn RL jointly ground tool use; performance scales with tool-call budgets and parallel sampling; skills learned on hard problems transfer to simpler settings.
The broader principle: KGs are ideal substrates for training data synthesis because they encode the relational complexity that makes questions genuinely hard, while providing the ground truth that makes answers verifiable. This is a concrete realization of the curriculum data thesis.
This connects to:
- Does search budget scale like reasoning tokens for answer quality? — DeepDive provides the training methodology to develop agents that can exploit this scaling law
- What makes deep research fundamentally different from RAG? — KG-synthesized questions naturally require all three components
- Can models improve themselves on tasks without verifiable answers? — KG-synthesized data acts as a domain-specific reasoning catalyst; both notes demonstrate that the quality and structure of training data matter more than its quantity
DeepDive (2025) adds end-to-end multi-turn RL on top of KG-synthesized data. Training uses multi-turn GRPO: the LLM interacts with a web environment and receives a reward based on its final answer. The resulting DeepDive-32B posts a new competitive result among open-source models on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. The key finding is that multi-turn RL training improves deep search ability and enables test-time scaling of tool calls: the model learns to invoke search more effectively and more frequently as it reasons. This validates the KG-based data synthesis approach by showing that it provides sufficient training signal for RL-based deep search agents. Source: Arxiv/Agentic Research.
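The final-answer reward signal can be sketched as follows. The note does not give DeepDive's reward function or its advantage formula, so this is an assumption: the sketch uses a binary exact-match reward and the standard group-relative normalization associated with GRPO, where each rollout's reward is centered and scaled by the statistics of its own sampling group. The function names and the choice of population standard deviation are hypothetical.

```python
from statistics import mean, pstdev


def final_answer_reward(predicted, gold):
    """Hypothetical binary reward: 1.0 if the final answer matches after light normalization."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale rewards within one group of rollouts for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical multi-turn rollouts for one question whose answer is "City_F".
    finals = ["City_F", "city_b", "Country_D", " city_f "]
    rewards = [final_answer_reward(f, "City_F") for f in finals]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the reward depends only on the final answer, every search and reasoning turn in a rollout shares that rollout's advantage. When all rollouts in a group receive the same reward, the advantages are zero and the group contributes no gradient signal.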
Inquiring lines that use this note as a source 17
This note is a source for these synthesized inquiries.
- Can fixed heuristics like PageRank match learned traversal policies on graphs?
- Can knowledge graphs generate scalable training data for deep search agents?
- How do real search queries reveal what counts as a deep research question?
- Can step-level rewards improve training of agentic retrieval systems?
- Can query-time logic graphs match the efficiency of pre-built knowledge graph indexing?
- How does random walk length control reasoning complexity in question generation?
- When does simulated search outperform real search for agent training?
- How much does inference budget improve self-generated search performance?
- Can graph-based retrieval with knowledge graphs scale to multi-hop reasoning?
- How can knowledge graphs improve over pure embedding retrieval?
- Can knowledge graph structure alone generate sufficient training signals for domain reasoning?
- How do random walk reasoning chains from knowledge graphs compare to traditional fine-tuning?
- Can knowledge graph structure be exploited for efficient multi-hop retrieval?
- Can tree search improve question generation the way it improves reasoning?
- How do knowledge graphs scale as training data for open-ended search tasks?
- Why do deep research agents outperform retrieval augmented generation systems?
- Can knowledge graphs built at inference time outperform pre-built retrieval augmented generation?
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
- QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Original note title
Knowledge graph random walks with entity blurring generate scalable hard-to-find training data for deep search agents