SYNTHESIS NOTE
Agentic Systems and Tool Use

Can better tools fix LLM document editing errors?

Does giving LLMs agentic tool access—like diffing, re-reading, or structured editors—improve their reliability on long-horizon document workflows? Understanding whether the problem is tool limitations or decision-making quality matters for reliability engineering.

Synthesis note · 2026-05-18 · sourced from Flaws

A natural intuition for fixing LLM document corruption: give the model better tools. Let it diff its own output, re-read the file, call a structured editor instead of regenerating prose. The DELEGATE-52 evaluation tests this directly and finds that agentic tool access does not improve performance on the benchmark.

On this benchmark, the finding rules out a class of proposed fixes. Tool wrappers, ReAct loops, and structured editing affordances do not address the failure mechanism; they sit downstream of it. The degradation comes from the model's own decisions about what to change and how, not from limitations of the editing interface. A model that decides to flip a numeric value will flip it through any tool you give it.
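The point that a bad decision survives any interface can be illustrated with a toy sketch. Everything here is hypothetical and is not taken from DELEGATE-52: the two functions stand in for "regenerate the document" and "call a structured editor", and `bad_decision` stands in for a model's choice to flip a numeric value.

```python
def apply_by_regeneration(document: str, decision: dict) -> str:
    """Hypothetical interface 1: rewrite the whole document line by line."""
    rewritten_lines = [
        line.replace(decision["find"], decision["replace"])
        for line in document.split("\n")
    ]
    return "\n".join(rewritten_lines)


def apply_by_structured_edit(document: str, decision: dict) -> str:
    """Hypothetical interface 2: patch only the first matching span."""
    start = document.find(decision["find"])
    if start == -1:
        return document  # nothing to patch
    end = start + len(decision["find"])
    return document[:start] + decision["replace"] + document[end:]


document = "Dose: 4.2 mg\nFrequency: twice daily"

# The (assumed) faulty decision: change a value that should not change.
bad_decision = {"find": "4.2", "replace": "2.4"}

regenerated = apply_by_regeneration(document, bad_decision)
patched = apply_by_structured_edit(document, bad_decision)

# Both interfaces faithfully execute the same wrong decision.
same_corruption = regenerated == patched
```

The structured editor is more precise about where it writes, but precision does not help when the target itself was chosen wrongly.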

This also disambiguates two senses of "agent." The first sense is LLM-plus-tools, where capability is gated on tool affordances; it predicts that tool access should improve document workflows. The second sense is LLM-as-decider, where the model's judgment about what to edit is the bottleneck; it predicts that tool access should be roughly orthogonal to reliability. The DELEGATE-52 result favors the second.

The implication for workflow design: reliability gains on long delegated work probably come from changing what the model decides (better prompting, verification loops, decomposition into smaller reversible steps) rather than from upgrading what it can act through. Tool engineering helps when capability is interface-limited; it does not help when capability is judgment-limited.
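One way to picture "verification loops" combined with "smaller reversible steps" is the minimal sketch below. It is an illustration under stated assumptions, not a method described in the source: `run_steps`, the step functions, and the choice of invariant (numeric values must not change) are all hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Matches integers and simple decimals; an assumed, deliberately crude invariant.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def numbers_in(text: str) -> List[str]:
    """Return every numeric literal in the text, in order of appearance."""
    return NUMBER.findall(text)


def run_steps(
    document: str, steps: List[Callable[[str], str]]
) -> Tuple[str, List[Tuple[int, str]]]:
    """Apply each small edit step, keeping it only if the numbers are unchanged."""
    accepted = document
    log: List[Tuple[int, str]] = []
    for index, step in enumerate(steps):
        candidate = step(accepted)
        if numbers_in(candidate) == numbers_in(accepted):
            accepted = candidate          # verification passed: keep the step
            log.append((index, "accepted"))
        else:
            log.append((index, "reverted"))  # discard the step, keep prior state
    return accepted, log


document = "Revenue grew 12.5 percent in teh third quarter."

# Hypothetical model-proposed steps: one benign typo fix, one silent value flip.
steps = [
    lambda d: d.replace("teh", "the"),
    lambda d: d.replace("12.5", "15.2"),
]

final_document, log = run_steps(document, steps)
```

The check constrains what the model's decisions are allowed to change rather than how the edits are applied, which is the distinction the paragraph above draws. A real workflow would need an invariant suited to the document type.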

Original note title

agentic tool use does not improve llm document-editing reliability — tools are not a fix for long-horizon delegation drift