SYNTHESIS NOTE
Agentic Systems and Tool Use

Can agents learn reusable sub-task routines from past experience?

Do web agents fail at long-horizon tasks because they cannot extract and reuse workflows shared across similar problems? This note explores whether sub-task abstraction enables skill accumulation rather than task-by-task problem solving.

Synthesis note · 2026-05-03 · sourced from Action Models

Agent Workflow Memory (AWM) takes the human heuristic of abstracting routines from past experience and operationalizes it for web agents. The diagnostic claim is that current agents fail at long-horizon tasks not because they lack reasoning but because they cannot extract and reuse sub-task workflows shared across similar tasks — they solve each task in isolation and never accumulate transferable skill structure.

AWM's intervention has two design choices that matter. First, granularity is below the task level: rather than memorizing "Buy dry cat food on Amazon and deliver to my address," the system induces "search for a product on Amazon" — a sub-task that re-appears across many top-level tasks. Second, example-specific contexts are abstracted out — "dry cat food" becomes "{product-name}" — so the workflow is reusable rather than overfit to its source trace.
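The two design choices can be illustrated with a minimal sketch. It assumes a workflow is an ordered list of action strings and that the values to abstract are already known. The note does not say how AWM performs the induction itself, so plain string substitution stands in for it here. The action syntax, the `Workflow` class, `abstract_trace`, and the `product_name` placeholder (an underscore variant of the note's "{product-name}") are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """A sub-task routine whose example-specific values are placeholders."""
    name: str
    steps: tuple

    def instantiate(self, **variables):
        # Fill the placeholders to get concrete actions for a new task.
        return [step.format(**variables) for step in self.steps]


def abstract_trace(name, trace, bindings):
    """Replace each concrete value in a trace with a {placeholder}.

    Stand-in for workflow induction: the values to abstract are
    supplied by the caller rather than discovered automatically.
    """
    steps = []
    for action in trace:
        for variable, value in bindings.items():
            action = action.replace(value, "{" + variable + "}")
        steps.append(action)
    return Workflow(name, tuple(steps))


# Hypothetical sub-task slice of the "buy dry cat food" trajectory.
trace = (
    'click("search box")',
    'type("search box", "dry cat food")',
    'press("Enter")',
)
search = abstract_trace(
    "search for a product on Amazon",
    trace,
    {"product_name": "dry cat food"},
)
reused = search.instantiate(product_name="usb-c cable")
```

Here `search.steps` no longer mentions cat food, and `reused` is the same routine applied to a different product, which is the sense in which the workflow is not overfit to its source trace.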

The compounding effect is the key behavior. Once "find a place by its name" exists, it serves as a building block for "get the zip code of a place." Skill memory therefore grows hierarchically: complex workflows are constructed on top of previously acquired ones. Empirically this produces relative success-rate gains of 24.6% on Mind2Web and 51.1% on WebArena, with a 22.5-point gap on WebArena after only tens of examples. Critically, online AWM's advantage widens as the train-test gap grows (from 8.9 to 14.0 absolute points) because workflow abstractions transfer where memorized trajectories do not.
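Hierarchical growth of this kind can be sketched as a library in which a workflow step is either a primitive action or a call to a previously stored workflow. This is an illustrative data structure under that assumption, not AWM's actual representation. `Call`, `WorkflowLibrary`, the action strings, and the `place_name` variable are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Call:
    """A step that invokes another stored workflow by name."""
    workflow: str


@dataclass
class WorkflowLibrary:
    workflows: dict = field(default_factory=dict)

    def add(self, name, steps):
        # New workflows may only build on ones already in the library,
        # and existing entries are never overwritten, so no cycles arise.
        if name in self.workflows:
            raise ValueError(f"workflow already exists: {name}")
        for step in steps:
            if isinstance(step, Call) and step.workflow not in self.workflows:
                raise KeyError(f"unknown workflow: {step.workflow}")
        self.workflows[name] = list(steps)

    def expand(self, name, **variables):
        """Flatten a workflow into primitive actions, recursing into calls."""
        actions = []
        for step in self.workflows[name]:
            if isinstance(step, Call):
                actions.extend(self.expand(step.workflow, **variables))
            else:
                actions.append(step.format(**variables))
        return actions


library = WorkflowLibrary()
library.add(
    "find a place by its name",
    ['type("search box", "{place_name}")', 'click("search button")', 'click("first result")'],
)
library.add(
    "get the zip code of a place",
    [Call("find a place by its name"), 'read("address panel")'],
)
plan = library.expand("get the zip code of a place", place_name="City Hall")
```

The second workflow stores only one new action of its own; the rest of `plan` comes from the earlier routine, which is the compounding the paragraph above describes.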

The implication is that the right unit of agent memory is the sub-task routine with abstracted variables, not the full task trajectory and not generic helpful hints. The unit should be small enough to recur, abstracted enough to transfer, and structured enough to compose. This position contrasts directly with the note "Does state-indexed memory outperform high-level workflow memory for web agents?", where PRAXIS argues the opposite: state-indexed local procedures outperform abstracted workflows precisely because abstraction loses the click-by-click specifics that web environments demand.


MUSE-Autoskill operationalizes the same compounding principle but adds the two pieces AWM leaves implicit: per-skill memory and cross-agent transfer. Where AWM induces workflow routines for one agent, MUSE attaches a dedicated memory to each skill that accumulates experience across tasks, so a routine does not merely get reused — it gets better with reuse, adapting from runtime feedback. And MUSE shows the resulting skills transfer to other agents with minimal accuracy loss, extending AWM's single-agent compounding into a shareable repository. This makes AWM and MUSE complementary on the same axis as the existing SkillClaw connection (cross-user propagation): AWM = workflow extraction within an agent; MUSE = experience-bearing skills transferable across agents.
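The contrast between a reusable routine and an experience-bearing skill can be made concrete with a small sketch. It is not MUSE-Autoskill's implementation, which this note does not describe. It only assumes that each skill carries its own log of outcomes and that sharing a skill copies that log along with the steps. `Skill`, `Agent`, `record`, and `share` are hypothetical names.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    steps: list
    # Per-skill memory: experience accumulated across tasks that used the skill.
    memory: list = field(default_factory=list)

    def record(self, task, succeeded, note=""):
        self.memory.append({"task": task, "succeeded": succeeded, "note": note})

    def success_rate(self):
        if not self.memory:
            return None
        return sum(entry["succeeded"] for entry in self.memory) / len(self.memory)


@dataclass
class Agent:
    name: str
    skills: dict = field(default_factory=dict)

    def learn(self, skill):
        self.skills[skill.name] = skill

    def share(self, skill_name, other):
        # Cross-agent transfer: the receiver gets the steps and the experience,
        # as an independent copy that can keep evolving on its own.
        other.learn(copy.deepcopy(self.skills[skill_name]))


teacher = Agent("agent-a")
teacher.learn(Skill("search for a product", ['type("search box", "{product_name}")']))
teacher.skills["search for a product"].record("buy cat food", True)
teacher.skills["search for a product"].record("buy cable", False, "search box was hidden")

student = Agent("agent-b")
teacher.share("search for a product", student)
student.skills["search for a product"].record("buy lamp", True)
```

After the transfer, the student's copy starts with the teacher's two recorded outcomes and then diverges, while the teacher's memory is unchanged.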

Original note title

agent workflow memory induces reusable sub-task routines and compounds them — yielding 24-51 percent relative success gains and snowballing skill complexity