How many particles and iterations does optimal expert discovery require?
This explores the swarm-search method where LLMs act as 'particles' drifting through weight space to discover composed experts — and what the actual compute budget (swarm size, iterations) for that discovery looks like.
This reads the question as being about a specific, almost playful idea in the corpus: treating language models as a swarm of particles that move through weight space, borrowing the update dynamics of Particle Swarm Optimization, until they settle on a blended 'expert' that none of the starting models could be on its own. The headline result in Can language models discover new expertise through collaborative weight search? is that this works with no gradient-based training at all and only 200 validation examples to steer the swarm; the discovered experts even answer questions that every initial model got wrong. The note summarized here doesn't report an exact swarm size or iteration count, so the honest answer to 'how many particles and iterations' is: probably fewer than you'd expect, because the method substitutes cheap evaluation signal for expensive backprop. The interesting part isn't the exact number; it's that discovery here is a search problem, not a training problem.
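To make "particles moving through weight space" concrete, here is a minimal sketch of textbook particle swarm optimization, in which each particle is a flat weight vector and the only signal is a fitness score. It is the generic PSO update (inertia plus pulls toward each particle's personal best and the swarm's global best), not the exact update rule of the cited work, which may include extra terms. The function `toy_validation_score`, the target vector, and every hyperparameter value are hypothetical stand-ins for "accuracy on a small validation set."

```python
import random

def pso_search(fitness, dim, num_particles=8, num_iters=30,
               inertia=0.7, c_cog=1.5, c_soc=1.5, seed=0):
    """Generic PSO that maximizes `fitness` over R^dim using evaluations only.

    Each particle is a flat vector standing in for a model's weights.
    No gradients are computed anywhere.
    """
    rng = random.Random(seed)
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(num_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(num_iters):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            score = fitness(pos[i])  # one evaluation per particle per iteration
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                # Asynchronous variant: the global best may change mid-iteration.
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score

# Hypothetical fitness: closeness to a hidden "ideal expert" weight vector.
TARGET = [0.3, -0.2, 0.5]

def toy_validation_score(w):
    return -sum((a - b) ** 2 for a, b in zip(w, TARGET))

best, score = pso_search(toy_validation_score, dim=3, num_iters=50)
```

Because the global best only ever moves to a position with a higher score, the swarm's best fitness never decreases as iterations are added; what more iterations cannot guarantee is a meaningful improvement, which is where budget questions come in.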
That reframing connects to a quieter finding in the corpus about *how* search budgets behave. Do search steps follow the same scaling rules as reasoning tokens? shows that deep research agents improve with more search steps in a pattern that mirrors the reasoning-token relationship, with both showing diminishing returns. That result concerns research agents rather than weight-space swarms, but by analogy it suggests there's no magic iteration count, just a knee in the curve past which extra particles or rounds buy you little. If you want the swarm to find an optimum efficiently, the question becomes where that knee sits, not how high you can crank the budget.
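One simple way to operationalize "where the knee sits" is to stop increasing the budget once the marginal gain per extra step drops below a threshold you care about. The sketch below assumes a log-shaped quality curve purely for illustration; it is not fitted to data from the cited note, and `find_knee` and `min_gain` are hypothetical names.

```python
import math

def find_knee(scores, min_gain):
    """Return the last budget index whose extra step still paid off.

    scores[k] is the quality after budget k (search steps, swarm
    iterations, etc.). If the gain from k-1 to k is below min_gain,
    the knee is k-1. If gains never drop that low, return the last index.
    """
    for k in range(1, len(scores)):
        if scores[k] - scores[k - 1] < min_gain:
            return k - 1
    return len(scores) - 1

# Assumed diminishing-returns curve: quality grows like log(1 + budget).
scores = [math.log1p(k) for k in range(101)]
knee = find_knee(scores, min_gain=0.05)  # → 19
```

The threshold encodes what one more step costs you; with a real swarm, `scores` would come from measured validation accuracy at each iteration count rather than from an assumed formula.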
There's also a sharper way to think about *what kind* of search converges fast. Can neural networks explore efficiently at recommendation scale? makes the case that exploration is cheap when you spend compute only on the uncertainty that actually matters (epistemic uncertainty about the model's parameters, not aleatoric noise in the outcomes): by separating the two and focusing only on the parameter uncertainty needed for Thompson sampling, it improved click-through rates by 9% and ratings by 6% while needing 29% fewer interactions than baselines. That is plausibly the same intuition behind why 200 examples can guide a weight-space swarm: a well-targeted signal collapses the search space faster than brute iteration. And Can an AI system improve its own search methods automatically? goes one level further. Instead of hand-fixing the search procedure (for a swarm, the number of particles and iterations), it lets an outer loop read the inner loop's code, identify bottlenecks, and *rewrite the search mechanism itself* at runtime, discovering combinatorial-optimization and bandit methods that improved performance on GPT pretraining by 5x. The optimal budget, in other words, may be something you discover rather than specify.
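The epistemic/aleatoric distinction above is easiest to see in plain Beta-Bernoulli Thompson sampling. This is the standard textbook bandit, not the ENR system from the cited note (which works with neural models at recommendation scale), and the click rates below are hypothetical. Exploration comes only from sampling plausible rates from each arm's posterior; the coin-flip noise of individual clicks is simulated but is never itself a reason to explore.

```python
import random

def thompson_bernoulli(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was played.

    Each arm keeps a Beta(wins + 1, losses + 1) posterior over its unknown
    click rate. That posterior is the epistemic uncertainty: it shrinks
    as evidence accumulates, so exploration of bad arms dies out.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    wins, losses, pulls = [0] * n, [0] * n, [0] * n
    for _ in range(rounds):
        # Draw one plausible rate per arm from its posterior; play the best draw.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n)]
        arm = max(range(n), key=lambda a: samples[a])
        pulls[arm] += 1
        # Simulated environment: a click occurs with the arm's true (hidden) rate.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], rounds=2000)
```

After enough rounds, nearly all plays go to the best arm: spending is concentrated where the posterior is still uncertain about which choice wins, which is the selectivity the paragraph above describes.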
One more finding worth knowing, for contrast: the corpus has a cautionary note in Do large language models actually perform iterative optimization?, which shows that LLMs *can't* run iterative numerical procedures in latent space. They recognize optimization problems as resembling familiar templates and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches. That helps explain why swarm methods like the one above run the iteration *externally*, over a population of real model evaluations, rather than asking a single model to 'optimize in its head.' The particles do the iterating; the models just get evaluated.
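Because the iteration happens outside the models, the compute bill for this kind of search is just a count of evaluations. The sketch below tallies forward passes under stated assumptions: every particle is scored on the whole validation set once at initialization and once per iteration, and each example costs one forward pass. The 200-example default comes from the cited note; the particle and iteration counts in the usage line are hypothetical, since the note doesn't report them.

```python
def evaluation_budget(num_particles, num_iterations, num_val_examples=200):
    """Forward passes for an external, evaluation-only swarm search.

    Assumes each particle is scored on the full validation set once at
    initialization and once after every iteration. No backward passes.
    """
    evaluations = num_particles * (num_iterations + 1)
    return evaluations * num_val_examples

# Hypothetical swarm: 10 particles, 50 iterations, 200 validation examples.
total = evaluation_budget(10, 50)  # → 102000
```

Framed this way, doubling the particles or the iterations doubles the forward-pass count, which is why locating the point of diminishing returns matters more than raising either number.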
If you want the most surprising takeaway: optimal expert discovery doesn't really have a fixed particle count or iteration budget. The frontier work treats both as things to *search for* (via meta-optimization) or to *taper off* (at the knee of a diminishing-returns curve), and the leverage comes from spending your evaluation budget where uncertainty is highest, not from running the swarm longer.
Sources (5 notes)
PSO-inspired swarms of LLM particles moving through weight space discover composed experts with new capabilities—including answering questions all initial experts failed on—using only 200 validation examples and no gradient-based training.
Deep research agents improve with more search steps in a pattern mirroring the reasoning-token relationship, with both exhibiting diminishing returns. This reveals a new inference-compute axis beyond model capability alone.
ENR separates aleatoric from epistemic uncertainty, focusing computation only on parameter uncertainty needed for Thompson sampling. It improved click-through rates 9% and ratings 6% while requiring 29% fewer interactions than baselines.
An outer loop successfully read inner loop code, identified bottlenecks, and generated new Python mechanisms at runtime, discovering combinatorial optimization and bandit methods that broke the inner loop's deterministic patterns and improved performance on GPT pretraining by 5x.
Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.