Can externalizing bookkeeping improve search agent performance?
Does moving routine state management out of the policy and into a stateful environment harness free reinforcement learning to focus on genuine semantic decisions? This note explores whether a division of labor between environment and model improves search efficiency.
The usual framing of a search agent is a policy over a growing transcript: the model must simultaneously decide what to search and remember what it has seen, which evidence is useful, which constraints remain open, and which claims it actually checked. Harness-1 argues this overloads reinforcement learning — it forces the policy to optimize both genuine semantic search decisions and routine bookkeeping that the environment can maintain far more reliably.
The fix is a division of labor. The harness maintains environment-side working memory: a candidate pool, an importance-tagged curated set, compact evidence links, verification records, deduplicated observations, and budget-aware context rendering. The policy keeps only the semantic decisions — what to query, what to keep or discard, what to verify, and when to stop. A 20B model trained this way reaches 0.730 average curated recall across eight benchmarks, beating the next open searcher by +11.4 points and staying competitive with much larger frontier models.
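The division of labor described above can be sketched as a small state object. This is a minimal illustration, not Harness-1's actual implementation: the class `HarnessState`, its method names, the integer importance tags, and the character-based budget are all assumptions introduced here to make the listed components concrete.

```python
from dataclasses import dataclass, field


@dataclass
class HarnessState:
    """Hypothetical environment-side working memory (all names are illustrative)."""

    candidates: dict[str, str] = field(default_factory=dict)      # candidate pool: doc_id -> text
    curated: dict[str, int] = field(default_factory=dict)         # curated set: doc_id -> importance tag
    evidence: dict[str, list[str]] = field(default_factory=dict)  # evidence links: claim -> doc_ids
    verified: dict[str, bool] = field(default_factory=dict)       # verification records: claim -> outcome
    seen: set[str] = field(default_factory=set)                   # normalized texts, used for dedup

    def observe(self, doc_id: str, text: str) -> bool:
        """Add a search result to the pool; return False if it duplicates an earlier one."""
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive key
        if key in self.seen:
            return False
        self.seen.add(key)
        self.candidates[doc_id] = text
        return True

    def keep(self, doc_id: str, importance: int) -> None:
        """Semantic decision from the policy: promote a candidate into the curated set."""
        if doc_id not in self.candidates:
            raise KeyError(doc_id)
        self.curated[doc_id] = importance

    def discard(self, doc_id: str) -> None:
        """Semantic decision from the policy: drop an item from the curated set."""
        self.curated.pop(doc_id, None)

    def link(self, claim: str, doc_id: str) -> None:
        """Record that a document supports a claim."""
        self.evidence.setdefault(claim, []).append(doc_id)

    def record_verification(self, claim: str, ok: bool) -> None:
        """Store the outcome of a verification step so the policy need not remember it."""
        self.verified[claim] = ok

    def render(self, budget_chars: int) -> str:
        """Budget-aware rendering: most important curated items first, stopping
        at the first item that would exceed the budget (newlines are not counted)."""
        lines, used = [], 0
        ranked = sorted(self.curated.items(), key=lambda kv: (-kv[1], kv[0]))
        for doc_id, importance in ranked:
            line = f"[{importance}] {doc_id}: {self.candidates[doc_id]}"
            if used + len(line) > budget_chars:
                break
            lines.append(line)
            used += len(line)
        return "\n".join(lines)
```

In this sketch the policy would only emit the semantic actions (query, keep or discard, verify, stop), while deduplication, record keeping, and context rendering happen inside the harness.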
The deeper claim is that the harness is not an implementation detail but part of what the policy learns to use: the gains transfer to held-out benchmarks and survive component ablation. This is the search-agent instantiation of a broader principle, that capability moves out of parameters and into the editable scaffolding. In line with the related note "Is long-context bottleneck really about memory or compute?", externalizing bookkeeping is what frees the policy's scarce reasoning compute for the decisions only it can make.
Inquiring lines that use this note as a source (9)
This note is a source for these synthesized inquiries.
- What role does retrieval mechanism design play in forecast accuracy?
- Why does externalizing bookkeeping raise effective feedback compute?
- Can external managers optimize context better than the model itself?
- How does external context control compare to agents managing their own state internally?
- How do search and reasoning workflows improve forecasting performance over base models?
- Can externalizing bookkeeping to a stateful harness replace internalized memory control?
- What specific bookkeeping tasks can environments maintain more reliably than policies?
- Do gains from harness-based agents transfer across different search benchmarks?
- Do information gathering and task execution require different incentive structures?
Related concepts in this collection (3)
- How do model capabilities differ from harness infrastructure in agents?
What distinct layers make up an agentic system, and how do failures in each layer differ? Understanding this decomposition helps pinpoint whether problems stem from the model, the infrastructure, or the agent's own code.
provides the vocabulary: this is harness infrastructure absorbing state the model would otherwise carry
- Where does agent reliability actually come from?
Exploring whether LLM agent performance depends on larger models or on thoughtful system design choices like memory, skills, and protocols that shift cognitive work outside the model.
same thesis, generalized; Harness-1 is the retrieval-RL proof
- Can agents fail from weak memory control rather than missing knowledge?
As multi-turn agent workflows grow longer, performance degrades—but is this due to insufficient context or poor memory management? This explores whether memory *control* is the real bottleneck.
convergent move: replace transcript accumulation with structured environment-side state
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- SSRL: Self-Search Reinforcement Learning
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
- VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild
Original note title
search agents should externalize recoverable bookkeeping to a stateful harness so RL only optimizes semantic decisions