SYNTHESIS NOTE
Agentic Systems and Tool Use

Why does random tool sampling produce unrealistic synthetic training data?

Tool-calling datasets generated through random sampling and single-turn framing lack the complexity and coherence of real deployment. This explores what structural choices in data synthesis determine whether models can learn realistic tool composition.

Synthesis note · 2026-05-03 · sourced from Action Models

The standard pipeline for generating tool-calling training data — sample tools, formulate a requirement, generate the call statement — has two defects that together cap the realism of the resulting data. First, randomly sampled tools frequently fail to interconnect, which means the synthesized requirements default to simplistic single-tool tasks because there is no plausible composition path across the random set. This collapses both diversity and complexity in the resulting dataset.

Second, the dominant framing treats tool calls as single-turn Q&A rather than dialogue. Real users interact through multi-turn conversation, so models trained on Q&A-shaped data carry a gap to deployment that surfaces as unnaturalness across turns.

ToolFlow's response is two-part. Graph-Based Sampling selects tools that are actually relevant to each other — so a synthesized requirement can credibly combine them, restoring the complexity ceiling that random sampling caps. Planned-Generation creates a plan that guides the dialogue across turns, so coherence between turns becomes a property of the generation rather than an accident.

The implication for anyone synthesizing agent training data: the choice of how tools are sampled is not a hyperparameter but a structural determinant of how complex the synthesized tasks can be. And single-turn framing is not just simpler — it is a different distribution from real deployment, which is multi-turn and coherent across turns.

This is the data-side counterpart to Where do traditional function calling systems actually break down?'s deployment-side critique: random sampling at synthesis produces simplistic tasks, which (combined with single-turn framing) yields models that fail to compose calls across turns. ToolFlow's graph-sampling move parallels Can synthetic dialogues become realistic through layered diversity? — multiplicative structured sampling beats single-axis random sampling for dialogue synthesis generally.

Inquiring lines that use this note as a source 17

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 5

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map
15 direct connections · 126 in 2-hop network ·medium cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

tool-calling data synthesis fails through random tool sampling and single-turn framing — graph-based sampling and planned dialogue restore realism