INQUIRING LINE

Why do evolutionary algorithms collapse to single solutions under selection pressure?

This explores why selection pressure pushes a population of candidate solutions to converge on one answer — losing the variety that made the search powerful — and what mechanisms in the corpus prevent that collapse.


The corpus frames this collapse not as a quirk of evolutionary algorithms but as the same diversity-collapse pressure that shows up almost everywhere optimization gets sharp. The short version: any process that repeatedly rewards 'the best so far' concentrates probability mass on a single peak. Once early winners dominate the pool, their offspring crowd out exploratory variants, and the population loses the spread it needs to find better peaks elsewhere. This is premature convergence. The fix is almost always some structural force that resists pure selection. "Can evolutionary search beat sampling and revision at inference time?" makes this concrete: Mind Evolution uses an island model precisely to keep subpopulations from homogenizing, and that sustained diversity is what lets it beat single-trajectory methods that refine one answer to death.
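The contrast between pooled selection and an island model can be sketched in a few lines. This is a toy illustration, not Mind Evolution's implementation: the two-peak `fitness` landscape, the truncation-selection rule, the mutation scale, and the choice to run islands with no migration at all are assumptions made for the example.

```python
import random

def fitness(x):
    # Hypothetical landscape: a lower peak at x = -2 and a higher peak at x = 3.
    return max(1.0 - abs(x + 2.0), 1.5 - abs(x - 3.0), 0.0)

def evolve(population, generations, rng, sigma=0.1):
    """Truncation selection: every member spawns one mutated child,
    then only the fitter half of parents plus children survives."""
    size = len(population)
    for _ in range(generations):
        children = [x + rng.gauss(0.0, sigma) for x in population]
        ranked = sorted(population + children, key=fitness, reverse=True)
        population = ranked[:size]
    return population

def peaks_occupied(population):
    # Label each survivor by the basin it ends up in.
    return {"left" if x < 0 else "right" for x in population}

rng = random.Random(0)
# Evenly spaced starting points covering both basins.
start = [-4.0 + 9.0 * i / 39 for i in range(40)]

# One pooled population: everyone competes with everyone.
pooled = evolve(list(start), 200, rng)

# Two islands, selection applied only within each island
# (no migration here, which is a deliberate simplification).
left_island = evolve([x for x in start if x < 0], 200, rng)
right_island = evolve([x for x in start if x >= 0], 200, rng)
islands = left_island + right_island

print("pooled:", peaks_occupied(pooled))
print("islands:", peaks_occupied(islands))
```

In the pooled run, candidates near the higher peak outrank everything on the lower one, so the lower peak is abandoned within a few generations. With islands, each subpopulation only competes locally, so both peaks stay occupied. Real island models usually add occasional migration, which trades some of this protection for shared progress.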

The most striking reframe is that this is a property of selection itself, not of genetic algorithms specifically. "Can diffusion models perform evolutionary search in parameter space?" argues that denoising in diffusion models is mathematically the same operation as evolution (selection, mutation, reproductive isolation) and that mainstream evolutionary methods collapse to single solutions exactly where diffusion preserves multimodality. So the question 'why do they collapse?' has a flip side: collapse isn't inevitable; it's what happens when nothing in the algorithm actively protects the multiple modes.

The corpus shows the same collapse under a different name in reinforcement learning, which is illuminating because RL isn't usually thought of as evolution. "Does outcome-based RL diversity loss spread across unsolved problems?" describes outcome-only rewards 'sharpening the policy globally': probability mass concentrates on correct trajectories for solved problems while diversity also shrinks on unsolved ones, which is collapse to a single solution by another route. "Does reinforcement learning squeeze exploration diversity in search agents?" calls the mechanism entropy collapse and notes that policies converge on narrow reward-maximizing strategies, with supervised training on diverse demonstrations acting as the counterweight. The common thread across both selection paradigms: a scalar 'who won' signal is a homogenizing force.
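Entropy collapse under a scalar reward can be seen even in the smallest possible setting. The sketch below is a toy, not taken from any of the cited notes: it assumes a four-action softmax bandit, a hypothetical outcome reward that marks only action 0 as correct, and the exact expected policy-gradient update rather than the sampled updates real RL training uses.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

rewards = [1.0, 0.0, 0.0, 0.0]   # hypothetical: only action 0 is rewarded
logits = [0.0, 0.0, 0.0, 0.0]    # start from a uniform policy
learning_rate = 1.0

entropies = []
for _ in range(500):
    probs = softmax(logits)
    entropies.append(entropy(probs))
    expected = sum(p * r for p, r in zip(probs, rewards))
    # Exact gradient of expected reward for a softmax policy:
    # dJ/dlogit_i = p_i * (r_i - E[r]).
    logits = [
        z + learning_rate * p * (r - expected)
        for z, p, r in zip(logits, probs, rewards)
    ]

final_probs = softmax(logits)
print("start entropy:", round(entropies[0], 3))
print("end entropy:", round(entropies[-1], 3))
```

The policy's entropy falls steadily as mass moves onto the rewarded action. Nothing in the update asks for diversity, so none is kept; an entropy bonus or training on diverse demonstrations would have to be added explicitly to push back.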

That points to the deeper answer the corpus offers: collapse comes from compressing everything into one ranking. "Can reward vectors be the hidden source of solution diversity?" shows that when you keep rewards as a vector (per criterion, per test-case, per persona) instead of scalarizing them, solutions naturally specialize across a Pareto frontier, and diversity survives because there's no single axis to collapse onto. "Does preference tuning always reduce diversity the same way?" sharpens the intuition further: selection only collapses diversity when the domain rewards convergence (code toward a correct answer). In domains that reward distinctiveness (creative writing), the same tuning increases diversity. Collapse, in other words, is selection pressure plus a single right answer.
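The difference between scalarizing rewards and keeping them as a vector can be made concrete with a small example. This is a minimal sketch and not Vector Policy Optimization itself: the candidate names, the three-criterion reward vectors, and the use of a plain sum as the scalarization are all made up for illustration.

```python
def dominates(a, b):
    """True if vector a is at least as good as b on every criterion
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return {
        name
        for name, vec in candidates.items()
        if not any(dominates(other, vec) for other in candidates.values())
    }

# Hypothetical solutions scored on three criteria (e.g. three test cases).
candidates = {
    "A": (0.9, 0.2, 0.1),   # specialist on criterion 1
    "B": (0.2, 0.9, 0.3),   # specialist on criterion 2
    "C": (0.3, 0.3, 0.9),   # specialist on criterion 3
    "D": (0.2, 0.2, 0.1),   # worse than A everywhere it differs
    "E": (0.6, 0.5, 0.5),   # generalist
}

# Scalarizing: collapse each vector to one number and keep the single winner.
scalar_winner = max(candidates, key=lambda name: sum(candidates[name]))

# Keeping the vector: retain every non-dominated solution.
front = pareto_front(candidates)

print("scalar winner:", scalar_winner)
print("pareto front:", sorted(front))
```

Scalarizing leaves one survivor (the generalist), while the Pareto view keeps the generalist and all three specialists and discards only the candidate that is beaten outright. The diversity comes from the structure of the reward, not from an added regularizer.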

If you want to follow this thread somewhere unexpected, "Can models reliably improve themselves without external feedback?" ties diversity collapse to a fundamental limit: systems that select on their own outputs stall, and the ones that escape do so by smuggling in an external anchor (a past version, a judge, a tool signal). "Can AI systems improve themselves through trial and error?" is the constructive version: it keeps an evolutionary archive of past variants rather than always breeding from the current best, which is exactly the anti-collapse move of refusing to throw away the population's history.
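The archive idea can be sketched as a data structure. The text above does not say how parents are chosen from the archive, so the score-proportional sampling rule here is an assumption, and the `Archive` class, variant names, and scores are hypothetical.

```python
import random

class Archive:
    """Keeps every variant ever evaluated, not just the current best."""

    def __init__(self):
        self.variants = []  # list of (name, score) pairs

    def add(self, name, score):
        self.variants.append((name, score))

    def best(self):
        # Greedy choice: always the top-scoring variant.
        return max(self.variants, key=lambda v: v[1])[0]

    def sample_parent(self, rng):
        # Assumed rule: sample in proportion to score, so weaker
        # variants remain eligible as parents.
        names = [name for name, _ in self.variants]
        weights = [score for _, score in self.variants]
        return rng.choices(names, weights=weights, k=1)[0]

archive = Archive()
for name, score in [("v0", 0.20), ("v1", 0.35), ("v2", 0.50)]:
    archive.add(name, score)

rng = random.Random(0)
greedy_parents = {archive.best() for _ in range(200)}
archive_parents = {archive.sample_parent(rng) for _ in range(200)}

print("greedy breeds from:", sorted(greedy_parents))
print("archive breeds from:", sorted(archive_parents))
```

Breeding only from the best means every new variant descends from one lineage. Sampling from the archive keeps older and weaker variants in play as stepping stones, at the cost of spending some evaluations on parents that currently look worse.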


Sources (8 notes)

Can evolutionary search beat sampling and revision at inference time?

Mind Evolution uses genetic algorithms with LLM-generated mutations and crossovers to significantly outperform Best-of-N and Sequential Revision on planning benchmarks. An island model sustains population diversity, preventing the premature convergence that single-trajectory refinement exhibits.

Can diffusion models perform evolutionary search in parameter space?

Denoising in diffusion models performs selection, mutation, and reproductive isolation—the core mechanisms of evolution. Diffusion Evolution empirically outperforms mainstream evolutionary algorithms by preserving multimodality where traditional methods collapse to single solutions.

Does outcome-based RL diversity loss spread across unsolved problems?

RL that rewards only final answer correctness sharpens the policy globally, concentrating probability mass on correct trajectories for solved problems while simultaneously reducing diversity on unsolved ones. Historical exploration (training diversity via UCB-style bonuses) and batch exploration (test-time diversity via repetition penalties) require structurally different mechanisms.

Does reinforcement learning squeeze exploration diversity in search agents?

RL training compresses behavioral diversity in search agents through the same entropy collapse mechanism documented in reasoning—policies converge on narrow reward-maximizing strategies. SFT on diverse demonstrations preserves exploration breadth, suggesting diversity-preservation techniques are essential for RL search scaling.

Can reward vectors be the hidden source of solution diversity?

Vector Policy Optimization shows that rewards decomposed per test-case, criterion, or persona provide an inherent diversity structure. Training solutions to span the Pareto frontier across these dimensions produces competent diversity grounded in real task trade-offs rather than external regularizers.

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Can AI systems improve themselves through trial and error?

The Darwin Gödel Machine (DGM) replaces formal proofs with empirical benchmarking and maintains an evolutionary archive of agent variants, achieving 2.5× improvement on SWE-bench and 2.2× on Polyglot by discovering capabilities like better code editing and context management.

Next inquiring lines