SYNTHESIS NOTE
Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use · Model Architecture and Internals

Can AI systems improve themselves through trial and error?

Explores whether replacing formal proof requirements with empirical benchmark testing enables AI systems to successfully modify and improve their own code iteratively, and what mechanisms prevent compounding failures.

Synthesis note · 2026-02-23 · sourced from Novel Architectures

The original Gödel Machine proposed self-improving AI via provably beneficial self-modifications. In practice, formally proving the impact of most self-modifications is impossible. The Darwin Gödel Machine (DGM) replaces formal proofs with empirical validation: try modifications, test them on benchmarks, keep what works. This mirrors biological evolution — mutations are not verified in advance but produced, trialed, and selected.

DGM alternates between self-modification and evaluation phases. During self-modification, agents from the archive generate modified versions of themselves — rewriting their own code. During evaluation, each modified agent is tested on coding benchmarks. The key assumption: improvement on coding benchmarks indicates better coding capabilities, which in turn indicates better ability to self-modify. This creates a meta-competence loop: better coding → better self-modification → better coding.
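
The alternation described above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the DGM implementation: an "agent" is reduced to a single skill number, and `self_modify`, `evaluate`, and `run_loop` are hypothetical names invented here. The uniform parent choice and the noise model are also assumptions.

```python
import random


def self_modify(parent_skill: float, rng: random.Random) -> float:
    """Hypothetical stand-in for an agent rewriting its own code.

    Assumption: a more skilled parent proposes slightly better changes on
    average, a toy version of "better coding -> better self-modification".
    """
    step = rng.gauss(0.0, 0.05) + 0.02 * parent_skill
    return min(1.0, max(0.0, parent_skill + step))


def evaluate(skill: float, rng: random.Random, n_tasks: int = 50) -> float:
    """Hypothetical benchmark: fraction of n_tasks solved by the agent."""
    solved = sum(rng.random() < skill for _ in range(n_tasks))
    return solved / n_tasks


def run_loop(iterations: int = 20, seed: int = 0) -> list[tuple[float, float]]:
    """Alternate self-modification and evaluation; return (agent, score) pairs."""
    rng = random.Random(seed)
    initial = 0.2
    archive = [(initial, evaluate(initial, rng))]
    for _ in range(iterations):
        parent, _parent_score = rng.choice(archive)  # self-modification phase
        child = self_modify(parent, rng)
        score = evaluate(child, rng)                 # evaluation phase
        archive.append((child, score))               # measured, not proven
    return archive


if __name__ == "__main__":
    for agent, score in run_loop(iterations=5):
        print(f"skill={agent:.3f} benchmark={score:.2f}")
```

The only signal fed back into the loop is the benchmark score. No step asks whether a modification is provably beneficial, which is the substitution the note describes.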

Reported results: performance on SWE-bench rose from 20.0% to 50.0%, and on Polyglot from 14.2% to 30.7%.

The evolutionary archive is critical. Inspired by open-endedness research, DGM maintains a growing library of all generated agent variants — including suboptimal but interesting ones. These serve as stepping stones for future generations, enabling diverse exploration paths. The system doesn't just optimize for immediate performance; it accumulates diverse capabilities that may enable future breakthroughs. This is fundamentally different from single-trajectory self-improvement.
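
To make the stepping-stone idea concrete, the sketch below contrasts archive-based parent sampling with greedy single-trajectory selection. It is an illustration only: the `Variant` class, the 0.05 score floor, and the division by `1 + children` are assumptions chosen for this example, not the weighting the note attributes to DGM.

```python
import random
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    score: float       # benchmark score in [0, 1]
    children: int = 0  # how often this variant has already been a parent


def sampling_weights(archive: list[Variant]) -> list[float]:
    """Favor high scorers, but keep every variant selectable.

    Assumed heuristic: a small floor keeps low scorers in play, and dividing
    by (1 + children) spreads exploration across the archive.
    """
    return [(v.score + 0.05) / (1 + v.children) for v in archive]


def pick_parent(archive: list[Variant], rng: random.Random) -> Variant:
    """Archive-based selection: any stored variant may become a parent."""
    parent = rng.choices(archive, weights=sampling_weights(archive), k=1)[0]
    parent.children += 1
    return parent


def pick_parent_greedy(archive: list[Variant]) -> Variant:
    """Single-trajectory baseline: always continue from the current best."""
    return max(archive, key=lambda v: v.score)


if __name__ == "__main__":
    rng = random.Random(0)
    archive = [Variant("a", 0.2), Variant("b", 0.5), Variant("c", 0.0)]
    print("greedy:", pick_parent_greedy(archive).name)
    print("archive:", [pick_parent(archive, rng).name for _ in range(10)])
```

The greedy baseline never revisits a weaker variant, so a low-scoring design that happens to contain a useful idea is lost. Sampling from the whole archive keeps such variants available as parents.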

Concrete improvements discovered include better code editing tools, long-context window management, and peer-review mechanisms — capabilities the original agent lacked that emerged through the self-improvement process.

The Python-based implementation makes the self-modification space Turing-complete in principle. The current version modifies agent design (tools, workflows) with frozen foundation models. Full self-improvement — rewriting training scripts, training new foundation models — is left as future work.

This directly addresses the related question "What limits how much models can improve themselves?" DGM circumvents the formal proof requirement through empirical validation, but inherits a different limitation: improvement is bounded by what the benchmark can measure. The archive approach partially addresses "How quickly do errors compound during model self-training?" by maintaining diverse populations rather than following a single improvement trajectory.

Original note title

darwin godel machine achieves open-ended self-improvement by replacing formal proofs with empirical validation and evolutionary archives