SYNTHESIS NOTE

When can weak models match strong model performance?

Can sampling many weak-model calls replicate strong-model results? This note explores whether more attempts plus selection mechanisms can bridge the performance gap without fundamentally stronger reasoning.

Synthesis note · 2026-06-03 · sourced from Test Time Compute

Can a committee of weak reasoning-model calls reach the performance of a much stronger model? The honest answer is "yes, but not because more agents help." The mechanism is boosting: sampling exposes latent correct solutions in the proposal pool, but critics and comparators must then recover them without access to the hidden verifier.

The paper separates four quantities (proposal coverage, local identifiability, progress, and diversity) and proves a sharp limit. Repeated sampling can amplify coverage, but coverage alone cannot create useful critics or comparators. Reliable amplification requires an additional local soundness signal: execution, proof checking, type checking, tests, or constraint solving. With it, rank-based bounds show when local selection errors compose into reliable trajectories. Without it, weak-model failure turns out to be a selection failure rather than an information failure: on SWE-bench Verified, hidden-test-passing patches often appear in a pool of nano-model proposals even when a single call fails.
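The gap between coverage and solve rate can be illustrated with a toy simulation. This is a minimal sketch, not the paper's method: it assumes each proposal is independently correct with probability `p` (an assumption made here for illustration), and it models the local soundness signal as a perfect check. The names `coverage_at_k` and `simulate_solve_rate` are hypothetical.

```python
import random


def coverage_at_k(p: float, k: int) -> float:
    """Chance that at least one of k proposals is correct.

    Assumes proposals are independent, each correct with probability p.
    """
    return 1.0 - (1.0 - p) ** k


def simulate_solve_rate(p: float, k: int, trials: int,
                        use_check: bool, seed: int = 0) -> float:
    """Estimate the solve rate of a k-proposal committee.

    use_check=True  -> a perfectly sound local check picks a passing proposal
                       whenever one exists (idealized assumption).
    use_check=False -> the selector has no signal and picks uniformly at random.
    """
    rng = random.Random(seed)
    solved = 0
    for _ in range(trials):
        # True marks a proposal the hidden verifier would accept.
        pool = [rng.random() < p for _ in range(k)]
        if use_check:
            passing = [ok for ok in pool if ok]
            choice = passing[0] if passing else pool[0]
        else:
            choice = rng.choice(pool)
        solved += choice
    return solved / trials
```

Under these assumptions, the solve rate with the check tracks `coverage_at_k(p, k)`, while the solve rate without it stays near the single-call rate `p` no matter how large `k` grows. Real critics sit somewhere between the two extremes.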

This identifies two distinct ceilings. When a correct patch is in the pool but the harness picks another, the bottleneck is identifiability: better critics, tests, or aggregation help. When no correct patch appears at all, the bottleneck is coverage, and no selector can recover an absent solution. The result disciplines the "scale up agents" intuition by pinning the gain to verifiable domains and to the presence of a sound local check. It extends the note "What limits how much models can improve themselves?" to the committee setting: the verification advantage is what turns latent coverage into solve rate.
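The two ceilings can be told apart in an offline analysis where the hidden verifier's verdict on every proposal is available after the fact. The sketch below is one hypothetical way to label each task; the function names and the input layout (a list of per-proposal verdicts plus the index the harness selected) are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def diagnose(pool_correct: Sequence[bool], selected: int) -> str:
    """Label one task outcome.

    pool_correct[i] is the hidden verifier's verdict on proposal i;
    selected is the index of the proposal the harness submitted.
    """
    if pool_correct[selected]:
        return "solved"
    if any(pool_correct):
        # A correct proposal existed but was not picked.
        return "identifiability"
    # No proposal was correct, so no selector could have succeeded.
    return "coverage"


def tally(tasks: Iterable[Tuple[Sequence[bool], int]]) -> Counter:
    """Count outcome labels over many (pool_correct, selected) pairs."""
    return Counter(diagnose(pool, sel) for pool, sel in tasks)
```

A high "identifiability" count suggests investing in critics, tests, or aggregation; a high "coverage" count suggests the proposal model itself is the limit.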

Original note title

a committee of weak model calls matches strong models only when a local soundness signal converts latent correct solutions into selections