How can agent self-evolution be made safe and auditable?
As agents begin updating their own prompts and tools, how can we track these changes, measure their effects, and safely reverse problematic updates? This matters because untracked evolution leads to unmaintainable systems and makes regressions impossible to diagnose.
Self-evolving agents that adjust strategies, refine instructions, and update tools from feedback are emerging as a path to robust autonomy. But implementations are fragmented and ad hoc: without shared standards, evolution is neither composable nor auditable, and developers fall back on brittle glue code producing monolithic, unmaintainable architectures. Existing agent protocols (A2A, MCP) under-specify cross-entity lifecycle, version tracking, and evolution-safe update interfaces.
The Autogenesis Protocol (AGP) imposes a two-layer separation that decouples what evolves from how evolution occurs. The Resource Substrate Protocol Layer models prompts, agents, tools, environments, and memory as protocol-registered resources with explicit state, lifecycle, and versioned interfaces. The Self-Evolution Protocol Layer specifies a closed-loop operator interface for proposing, assessing, and committing improvements — with auditable lineage and rollback.
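A minimal sketch of this two-layer split, under stated assumptions: the names `Version`, `ResourceRegistry`, and `evolve` are hypothetical and are not taken from the AGP specification, and the word-count scorer is a toy stand-in for a real evaluation. The registry plays the role of the resource substrate (what evolves); `evolve` plays the role of the closed-loop operator (how evolution occurs).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of a resource (hypothetical schema)."""
    number: int
    content: str
    parent: Optional[int]  # lineage pointer to the version this one was derived from
    note: str


@dataclass
class ResourceRegistry:
    """Resource substrate: registered resources with version history."""
    history: Dict[str, List[Version]] = field(default_factory=dict)
    active: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, content: str) -> Version:
        version = Version(0, content, None, "initial")
        self.history[name] = [version]
        self.active[name] = 0
        return version

    def current(self, name: str) -> Version:
        return self.history[name][self.active[name]]

    def commit(self, name: str, content: str, note: str) -> Version:
        # New versions are appended, never overwritten, so lineage stays auditable.
        version = Version(len(self.history[name]), content, self.active[name], note)
        self.history[name].append(version)
        self.active[name] = version.number
        return version

    def rollback(self, name: str) -> Version:
        # Move the active pointer back to the parent; history is left intact.
        parent = self.current(name).parent
        if parent is None:
            raise ValueError(f"{name} has no earlier version to roll back to")
        self.active[name] = parent
        return self.current(name)


def evolve(
    registry: ResourceRegistry,
    name: str,
    propose: Callable[[str], str],
    assess: Callable[[str], float],
    note: str,
) -> bool:
    """Evolution operator: propose, assess, and commit only on improvement."""
    baseline = registry.current(name)
    candidate = propose(baseline.content)
    if assess(candidate) > assess(baseline.content):
        registry.commit(name, candidate, note)
        return True
    return False


registry = ResourceRegistry()
registry.register("system_prompt", "Answer the question.")
toy_score = lambda text: float(len(text.split()))  # stand-in for a real evaluation
committed = evolve(
    registry, "system_prompt", lambda p: p + " Cite sources.", toy_score, "add citation rule"
)
```

Keeping `rollback` as a pointer move rather than a deletion is a design choice of this sketch: the reverted version remains in the history, so the record of what was tried survives the revert.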
The conceptual contribution is treating evolution as a governed process rather than an emergent side effect of agents editing themselves. Versioning, lineage, and rollback are the safety primitives: you can attribute a regression to a specific committed change and revert it. This is the infrastructure layer beneath capability findings like "Do stronger models always evolve their own harnesses better?": that result assumes updates can be committed and measured at all, which is exactly what AGP standardizes. It also extends "Should coordination protocols wrap existing systems or replace them?": AGP layers over A2A/MCP rather than replacing them.
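To illustrate how lineage lets a regression be attributed to one committed change, here is a small sketch. The `Commit` record, the `lineage` and `find_regression` helpers, and the scores in the example log are all hypothetical and invented for illustration; AGP's actual record format may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """One committed change to a resource (hypothetical record format)."""
    id: int
    parent: Optional[int]
    resource: str
    change: str
    score: float  # evaluation score measured after this commit (made-up values below)


def lineage(log: List[Commit], head: int) -> List[Commit]:
    """Follow parent pointers from head back to the root; return oldest first."""
    by_id = {c.id: c for c in log}
    chain: List[Commit] = []
    cursor: Optional[int] = head
    while cursor is not None:
        commit = by_id[cursor]
        chain.append(commit)
        cursor = commit.parent
    return chain[::-1]


def find_regression(log: List[Commit], head: int, tolerance: float = 0.0) -> Optional[Commit]:
    """Return the earliest commit whose score fell below its parent's, if any."""
    chain = lineage(log, head)
    for previous, current in zip(chain, chain[1:]):
        if current.score < previous.score - tolerance:
            return current
    return None


log = [
    Commit(0, None, "prompt", "initial", 0.70),
    Commit(1, 0, "tool:search", "add retry", 0.74),
    Commit(2, 1, "prompt", "shorten instructions", 0.61),
    Commit(3, 2, "memory", "raise cache size", 0.62),
]
culprit = find_regression(log, head=3)
# The revert target is the culprit's parent, i.e. the last version before the drop.
revert_to = culprit.parent if culprit is not None else None
```

Without per-commit scores and parent pointers, the drop between commits 1 and 2 would be indistinguishable from noise in the final state; with them, the search is a single pass over the chain.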
Related concepts in this collection 3
- Should coordination protocols wrap existing systems or replace them?
  Explores whether new agent coordination standards should integrate with existing protocols through bridging, or establish themselves as replacements. This shapes which standards survive and how quickly ecosystems can adopt them.
  AGP is the self-evolution layer that wraps the existing A2A/MCP substrate
- What makes agent-created code artifacts so hard to manage?
  Agent-authored code that persists and is shared across systems raises difficult questions about what should be kept versus discarded, and how to maintain consistent state when multiple agents collaborate on the same artifacts.
  versioned resources are the disciplined form of the persistent artifacts that note flags as understudied
- Do stronger models always evolve their own harnesses better?
  When AI agents self-improve their prompts and tools, does raw model power help equally at writing updates versus using them? Understanding this split could reshape how we design self-evolving systems.
  AGP provides the commit/rollback substrate that makes "harness updates" a measurable, reversible operation
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Autogenesis: A Self-Evolving Agent Protocol
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
- Large Language Model Agents Are Not Always Faithful Self-Evolvers
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
- Hyperagents
- SkillOS: Learning Skill Curation for Self-Evolving Agents
Original note title
safe agent self-evolution requires treating prompts tools and memory as versioned first-class resources with auditable lineage and rollback