SYNTHESIS NOTE

Does agent efficiency really break down into three distinct components?

Can we understand agent efficiency as three independent optimization problems—memory, tool use, and planning—each with separate cost drivers? This matters because it could explain why point optimizations keep missing the bigger picture.

Synthesis note · 2026-05-18 · sourced from Agents

The agent-efficiency literature has historically been a collection of point optimizations — a paper about better tool selection here, a paper about prompt compression there. The Toward Efficient Agents survey argues the field is better understood through a three-component decomposition that maps to where the costs actually accumulate.

Efficient Memory: techniques for compressing historical context, managing memory storage, and optimizing context retrieval. The cost driver here is the context window: long agent trajectories accumulate context that grows roughly linearly with the number of steps, which runs into token budgets and increases per-turn latency. Compression, summarization, structured episodic memory, and retrieval-on-demand all reduce this cost.
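To make the linear-growth claim concrete, here is a minimal sketch. It assumes every step appends a fixed number of tokens and that the agent resends its whole history each turn. The function names, the 500-token step size, the 2,000-token budget, and the summarization policy (collapse the history to a fixed-size summary once it exceeds the budget) are illustrative assumptions, not details from the survey.

```python
def prompt_tokens_full_history(steps: int, tokens_per_step: int) -> list[int]:
    """Prompt size at each turn when the full history is resent every turn."""
    return [tokens_per_step * (i + 1) for i in range(steps)]


def prompt_tokens_with_summary(
    steps: int, tokens_per_step: int, budget: int, summary_tokens: int
) -> list[int]:
    """Prompt size at each turn when history is collapsed once it exceeds budget."""
    sizes = []
    context = 0
    for _ in range(steps):
        context += tokens_per_step
        if context > budget:
            # Hypothetical summarizer: keep a fixed-size summary plus the latest step.
            context = summary_tokens + tokens_per_step
        sizes.append(context)
    return sizes


full = prompt_tokens_full_history(steps=10, tokens_per_step=500)
capped = prompt_tokens_with_summary(
    steps=10, tokens_per_step=500, budget=2000, summary_tokens=300
)
print(max(full), sum(full))      # 5000 27500
print(max(capped), sum(capped))  # 2000 12800
```

Under these assumptions the per-turn prompt grows linearly, so the total tokens processed over the run grow quadratically. The capped variant bounds the per-turn size. The sketch ignores the cost of producing the summary itself.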

Efficient Tool Learning: strategies to minimize the number of tool calls and reduce the latency of external interactions. The cost driver is external API latency: each tool call is a round-trip to a system the agent does not control, often with seconds of latency and subject to rate limiting. Because that latency sits outside the agent, reducing tool calls (caching, batching, smarter selection) can cut wall-clock time in ways that internal optimizations cannot.
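Caching is the simplest of the call-reduction strategies named above. The following is a minimal sketch of a memoizing tool wrapper. `CachedTool` and `fake_weather_lookup` are hypothetical names introduced for illustration, and the wrapper assumes tool arguments are JSON-serializable keyword arguments.

```python
import json
from typing import Any, Callable


class CachedTool:
    """Wraps a tool function and memoizes results by keyword-argument values."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        self.cache: dict[str, Any] = {}
        self.external_calls = 0  # counts calls that actually reach the tool

    def __call__(self, **kwargs: Any) -> Any:
        # Sorted keys make the cache key independent of argument order.
        key = json.dumps(kwargs, sort_keys=True)
        if key not in self.cache:
            self.external_calls += 1
            self.cache[key] = self.fn(**kwargs)
        return self.cache[key]


def fake_weather_lookup(city: str) -> str:
    """Stand-in for a slow external API (hypothetical)."""
    return f"forecast for {city}"


weather = CachedTool(fake_weather_lookup)
for city in ["Oslo", "Lima", "Oslo", "Oslo", "Lima"]:
    weather(city=city)
print(weather.external_calls)  # 2
```

Five requests result in two external round-trips. This kind of cache is only safe for tools whose results are deterministic and do not go stale within the agent's run. A real deployment would likely need expiry and size limits.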

Efficient Planning: strategies to reduce the number of execution steps and API calls required to solve a problem. The cost driver is multi-step amplification: every step incurs its own model and tool costs and adds to the context that later steps must process, so per-step costs compound over the trajectory. Better planning means fewer steps to the same outcome, and the savings multiply across the trajectory.
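One way to read "the savings multiply" is through a simple cost model. The sketch below assumes each turn resends all prior steps, so turn k costs k times a fixed per-step token count. The function name and the numbers are illustrative assumptions, not figures from the survey.

```python
def trajectory_tokens(steps: int, tokens_per_step: int) -> int:
    """Total prompt tokens over a run in which turn k resends k steps of history."""
    return tokens_per_step * steps * (steps + 1) // 2


baseline = trajectory_tokens(steps=10, tokens_per_step=500)  # 27_500
shorter = trajectory_tokens(steps=7, tokens_per_step=500)    # 14_000
saving = 1 - shorter / baseline
print(f"{saving:.0%} fewer tokens for 30% fewer steps")
# prints "49% fewer tokens for 30% fewer steps"
```

Under this model, a plan that removes three of ten steps saves roughly half the tokens, because the removed steps are the ones that would have carried the longest contexts. With a fixed per-step cost instead, the saving would be proportional to the steps removed.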

The three axes are orthogonal in the sense that a technique improving one does not automatically improve the others. A memory-compression technique does not reduce tool calls; a tool-selection improvement does not by itself shorten the context; a planning improvement does not address either. Efficient agent design requires optimization on all three axes, and the shared cost measures (latency, tokens, steps) are the right metrics for comparing techniques regardless of which component they target.

The methodological consequence is that benchmarks should report effectiveness under fixed cost budgets and cost at comparable effectiveness — the Pareto frontier between effectiveness and cost. Single-number rankings of agent quality miss the structure of the actual deployment trade-off.
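The Pareto frontier described above can be computed directly from per-agent (cost, effectiveness) pairs. The following is a minimal sketch. The `Result` type, the agent names, and all numbers are made up for illustration and are not benchmark results.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    cost: float     # e.g. tokens or seconds per task; lower is better
    success: float  # e.g. task success rate; higher is better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Return the results not dominated by any other, sorted by ascending cost."""
    # Sort by cost; at equal cost, put the more successful result first.
    ordered = sorted(results, key=lambda r: (r.cost, -r.success))
    frontier: list[Result] = []
    best_success = float("-inf")
    for r in ordered:
        # Keep a result only if it beats everything cheaper or equally cheap.
        if r.success > best_success:
            frontier.append(r)
            best_success = r.success
    return frontier


runs = [
    Result("A", cost=10, success=0.60),
    Result("B", cost=20, success=0.55),
    Result("C", cost=30, success=0.80),
    Result("D", cost=50, success=0.78),
    Result("E", cost=60, success=0.85),
]
print([r.name for r in pareto_frontier(runs)])  # ['A', 'C', 'E']
```

In this toy data, a single-number ranking by success would put E first and hide that A and C are the best choices under tighter budgets. B and D are dominated, since a cheaper agent does at least as well.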
