Can evolutionary search beat sampling and revision at inference time?

Inquiring lines that use this note as a source 32

This note is a source for the synthesized inquiries listed below.

Why do foundation models develop heuristics instead of world models?
Can closed-form solutions compete with gradient descent optimization?
Do dynamic environments enable different kinds of agent-environment coevolution?
How do evolutionary archives enable diverse exploration in self-improving systems?
What makes external diversity more effective than sequential revision steps?
Why do evolutionary algorithms collapse to single solutions under selection pressure?
What makes diffusion sampling preserve multiple optimal solutions better than alternatives?
How does latent space diffusion enable evolutionary search in high dimensions?
Can accelerated sampling techniques from image generation speed up evolutionary search?
Can evolutionary approaches avoid the overthinking failure mode of iterative refinement?
How does fitness-proportional selection guide LLM recombination in unstructured solution spaces?
Why does island model genetic evolution maintain diversity better than single populations?
Does population-based evolution transcend the parallel versus sequential compute tradeoff?
Why do parallel and sequential test-time search methods produce equivalent results under fixed budgets?
Can structural diversity through role assignment replace emergent diversity in small models?
Why does genetic programming outperform direct LLM generation by 86 percent?
Can evolutionary search solve persona diversity better than prompt engineering?
Can optimization algorithms exploit the shift between procedural and planning bottlenecks?
Can token probability distributions extend swarm composition across different model architectures?
How does graph-based tool sampling differ from random sampling in diversity?
What distinguishes intrinsic search from extrinsic search method approaches?
How should organizations redesign workflows if LLMs cannot solve optimization directly?
Why does the hot-path cold-path split map onto formation and evolution?
How does directional diversity compare to other forms of parallel planning?
Can models adapt and combine search strategies beyond their training algorithm?
Should test-time search maximize diversity of competent solutions instead of converging on one strategy?
Is agentic efficiency analogous to convergent evolution in biology?
Can backward planning reduce search difficulty when multiple goal state paths exist?
Does policy entropy collapse prevent inference-time search from finding solutions?
Can evolutionary search unlock problems that best-of-n selection cannot solve?
Why do automated evaluators enable longer evolutionary loops than human feedback?
Can the same problem be solved by multiple evolutionary search strategies?

Related concepts in this collection 4

Why does majority voting outperform more complex inference methods? Simple majority voting across independent samples often matches or beats sophisticated alternatives like Best-of-N and sequential revision. What makes this basic approach so hard to beat for reasoning models?
Mind Evolution goes beyond voting: it recombines candidates within a population rather than only aggregating independent samples.
Do iterative refinement methods suffer from overthinking? Iterative refinement approaches like Self-Refine structurally resemble token-level overthinking in o1-like models. Does revision across multiple inference calls reproduce the same accuracy degradation seen within single inferences?
The evolutionary approach avoids this through population diversity and the island model.
How should we balance parallel versus sequential compute at test time? Test-time compute can prioritize breadth (trying many approaches) or depth (refining one approach). Which strategy works better, and does the answer depend on the problem?
Mind Evolution transcends this dichotomy by running iterative evolution over parallel sub-populations.
Can tree search replace human feedback in LLM training? Explores whether Monte Carlo Tree Search can generate quality signals for self-improvement without expensive human annotations. Matters because annotation bottlenecks currently limit LLM scaling.
MCTS searches a tree, while Mind Evolution searches a population; both use structured exploration.
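
The ideas named in the notes above (a population, fitness-proportional selection, recombination, and an island model with parallel sub-populations) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the Mind Evolution implementation: candidates are bit lists, `fitness` is a OneMax counter standing in for an automated evaluator, and `recombine` uses one-point crossover with bit-flip mutation where an LLM-driven system would generate a new candidate from its parents. All function names and parameters here are hypothetical.

```python
import random

def fitness(candidate):
    # Toy stand-in for an automated evaluator: number of 1 bits (OneMax).
    return sum(candidate)

def select_parent(population, rng):
    # Fitness-proportional (roulette-wheel) selection.
    # The +1 keeps every weight positive even for an all-zero candidate.
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]

def recombine(a, b, rng, mutation_rate=0.02):
    # Assumption: one-point crossover plus bit-flip mutation replaces
    # the generator that would produce a child from two parents.
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]

def evolve_islands(n_islands=4, island_size=20, length=32,
                   generations=40, migrate_every=5, seed=0):
    rng = random.Random(seed)
    # Each island is an independent sub-population of random candidates.
    islands = [
        [[rng.randint(0, 1) for _ in range(length)] for _ in range(island_size)]
        for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(islands):
            elite = max(pop, key=fitness)  # keep the island's best unchanged
            children = [
                recombine(select_parent(pop, rng), select_parent(pop, rng), rng)
                for _ in range(island_size - 1)
            ]
            islands[i] = [elite] + children
        if gen % migrate_every == 0:
            # Ring migration: each island receives the best candidate of its
            # neighbour, which overwrites the island's current worst member.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = list(bests[i - 1])
    return max((c for pop in islands for c in pop), key=fitness)

if __name__ == "__main__":
    best = evolve_islands()
    print(fitness(best), "of", len(best), "bits set")
```

Islands evolve separately between migrations, so each can settle on a different region of the search space; the occasional migration shares good candidates without merging everything into a single population. The migration interval and island count are illustrative defaults, not tuned values.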

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Evolving Deeper LLM Thinking (0.92 match)
Learning to Discover at Test Time (0.87 match)
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence (0.84 match)
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (0.84 match)
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems (0.84 match)
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents (0.83 match)
Reasoning LLMs are Wandering Solution Explorers (0.83 match)
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? (0.83 match)
