SYNTHESIS NOTE

How should we balance parallel versus sequential compute at test time?

Test-time compute can prioritize breadth (trying many approaches) or depth (refining one approach). Which strategy works better, and does the answer depend on the problem?

Synthesis note · 2026-02-20 · sourced from Test Time Compute
How should we allocate compute budget at inference time?

Every approach to test-time compute lands somewhere on the parallel-sequential axis.

The pattern recurs consistently across papers, architectures, and tasks. The trade-off between coverage and depth is not a special feature of any one method — it's a fundamental tension in how to allocate finite compute.
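As a rough illustration of that tension, the sketch below enumerates the ways a fixed budget can be split between independent chains (width) and steps per chain (depth). It assumes cost is simply width times depth, a simplification that the source does not state, and the function name `allocations` is hypothetical.

```python
def allocations(budget: int) -> list[tuple[int, int]]:
    """Return every (width, depth) pair with width * depth == budget.

    Assumption: total cost is width * depth; real token costs are messier.
    """
    return [(w, budget // w) for w in range(1, budget + 1) if budget % w == 0]


# Width 1 is the purely sequential pole; width == budget is the purely parallel pole.
splits = allocations(12)
```

Every method discussed in this note can be read as picking one point from such a list, or as moving between points adaptively.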

Empirical evidence increasingly favors parallel approaches on general benchmarks (see Why does parallel reasoning outperform single chain thinking?), but the field's intuition still leans sequential because it maps onto human reasoning patterns. The disconnect between what works and what feels right is part of what makes the overthinking findings surprising.

The exponential counter-case: On structured compositional problems where solutions require sequential accumulation of intermediate results (graph connectivity, deep multi-hop chains), sequential CoT is exponentially better than parallel voting (see "When does sequential reasoning beat parallel voting?"). This resolves the apparent contradiction: parallel wins when independent short attempts can each reach an answer; sequential wins when the problem requires depth that short chains cannot achieve at all. Task structure is the moderating variable.
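A toy sketch of the depth argument, using pointer chasing as a stand-in for a multi-hop chain. The key assumption, which is ours and not the source's, is that a chain too shallow to finish the hops can only guess uniformly at random; all function names are hypothetical. The sketch illustrates why voting over shallow chains does not help, but it does not reproduce the exponential separation itself.

```python
import random
from collections import Counter


def follow(pointers: list[int], start: int, hops: int) -> int:
    """Follow `hops` pointers from `start`; each hop depends on the previous one."""
    node = start
    for _ in range(hops):
        node = pointers[node]
    return node


def shallow_attempt(pointers, start, hops_needed, depth, rng):
    """One depth-capped chain: exact if deep enough, otherwise a uniform guess (assumption)."""
    if depth >= hops_needed:
        return follow(pointers, start, hops_needed)
    return rng.randrange(len(pointers))


def parallel_vote(pointers, start, hops_needed, width, depth, rng):
    """Majority vote over `width` independent depth-capped chains."""
    votes = Counter(
        shallow_attempt(pointers, start, hops_needed, depth, rng) for _ in range(width)
    )
    return votes.most_common(1)[0][0]


rng = random.Random(0)
n_nodes, hops_needed = 50, 16
pointers = list(range(n_nodes))
rng.shuffle(pointers)  # a random permutation serves as the pointer table
truth = follow(pointers, 0, hops_needed)

# The same 16-step budget, spent two ways.
sequential_answer = parallel_vote(pointers, 0, hops_needed, width=1, depth=16, rng=rng)
trials = 200
parallel_hits = sum(
    parallel_vote(pointers, 0, hops_needed, width=8, depth=2, rng=rng) == truth
    for _ in range(trials)
)
```

Under these assumptions the single deep chain always recovers `truth`, while the eight shallow chains vote over guesses, so adding more voters does not substitute for missing depth.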

Training format as an upstream determinant: "Does training data format shape reasoning strategy more than domain?" shows that multiple-choice training produces BFS-like (parallel-resembling) reasoning, while free-form training produces DFS-like (sequential) reasoning. The parallel/sequential trade-off therefore plays out at training time too: format determines which pole a model's default reasoning strategy occupies before any inference-time decisions are made.
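To make the BFS/DFS analogy concrete, the sketch below traverses a small, made-up reasoning tree in both orders. The tree and the function names are hypothetical and only illustrate the visiting order, not how any model actually reasons.

```python
from collections import deque

# Hypothetical reasoning tree: each node maps to its candidate next steps.
TREE = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
    "A1": [], "A2": [], "B1": [], "B2": [],
}


def bfs_order(tree: dict, start: str) -> list[str]:
    """Breadth-first: survey every option at one level before going deeper."""
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


def dfs_order(tree: dict, start: str) -> list[str]:
    """Depth-first: commit to one option and follow it to the end before backtracking."""
    order, stack = [], [start]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))  # reversed so the first child is popped first
    return order
```

Breadth-first order touches both top-level options (A and B) before any detail, which resembles weighing answer choices side by side; depth-first order finishes the A branch before B is considered at all.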

Retrieval-level parallel/sequential trade-off: RAG-R1 demonstrates the parallel/sequential dichotomy at the retrieval level. Single-query mode requires sequential multi-turn retrieval rounds; multi-query parallelism issues multiple queries simultaneously, reducing retrieval rounds and improving information diversity. The same structural trade-off — coverage (parallel) vs depth (sequential) — appears in RAG system design, not just reasoning token allocation.
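The difference in retrieval rounds can be sketched as follows. This is not RAG-R1's implementation: the corpus, the `retrieve` function, and the round counting are hypothetical stand-ins, and the sketch ignores the fact that in a real single-query loop each query may depend on the previous result.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus; stands in for a real search backend.
CORPUS = {
    "alpha": "document about alpha",
    "beta": "document about beta",
    "gamma": "document about gamma",
}


def retrieve(query: str) -> str:
    """Look up one query; returns an empty string on a miss."""
    return CORPUS.get(query, "")


def sequential_rounds(queries: list[str]) -> tuple[list[str], int]:
    """Single-query mode: one retrieval call per round, one round per query."""
    docs, rounds = [], 0
    for q in queries:
        docs.append(retrieve(q))
        rounds += 1
    return docs, rounds


def parallel_round(queries: list[str]) -> tuple[list[str], int]:
    """Multi-query mode: all queries issued together in a single round.

    Assumes `queries` is non-empty, since max_workers must be positive.
    """
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        docs = list(pool.map(retrieve, queries))  # map preserves input order
    return docs, 1
```

Both modes return the same documents here; what changes is the number of rounds, which is where the latency saving in the parallel mode would come from.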

Complexity-theoretic foundation (the Serial Scaling Hypothesis): "Can parallel architectures solve inherently sequential problems?" provides the theoretical grounding: inherently serial problems (the source lists mathematical reasoning, physical simulation, and planning) cannot be solved efficiently by parallel architectures. Fixed-depth Transformers, and per the source even diffusion models, fall within TC0, so under standard complexity assumptions they cannot solve inherently serial problems however much parallel compute they are given. This reframes the trade-off: it is not just empirical (which works better) but formal (some problems require serial computation). The parallel-wins finding applies to parallelizable problems; the serial hypothesis identifies problems where parallel computation is insufficient.
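For orientation, the standard chain of inclusions that places TC0 (written here as TC^0) relative to polynomial time is shown below. This is textbook background rather than a result from the source. Whether TC^0 is strictly smaller than P is an open question, so the impossibility claim rests on the widely held conjecture that it is.

```latex
\[
\mathsf{TC}^0 \subseteq \mathsf{NC}^1 \subseteq \mathsf{L} \subseteq \mathsf{NL}
\subseteq \mathsf{NC}^2 \subseteq \mathsf{NC} \subseteq \mathsf{P}
\]
```

Informally, an "inherently serial" problem in this framing is one believed to lie in P but outside the shallow, highly parallel classes on the left of the chain.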

Evolutionary inference as a third mode: Mind Evolution introduces population-based search at inference time, which is neither pure parallel sampling nor sequential refinement but iterative evolution of diverse candidate populations (see "Can evolutionary search beat sampling and revision at inference time?"). The island model sustains diversity that single-trajectory refinement loses, while genetic recombination creates candidates that independent sampling cannot reach. This suggests the parallel/sequential axis may be insufficient: population-based methods occupy a distinct region of the design space.
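A minimal island-model sketch on a toy bitstring objective follows. It is not Mind Evolution's method: that system evolves candidate solutions with a language model, whereas here bit-flip mutation and one-point crossover are swapped in for illustration. Every function name and parameter value is hypothetical.

```python
import random


def fitness(bits):
    """Toy objective (OneMax): count of 1s. Stands in for a solution evaluator."""
    return sum(bits)


def crossover(a, b, rng):
    """Recombine two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def step(island, rng):
    """One generation: keep the top half unchanged, refill with mutated children."""
    ranked = sorted(island, key=fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
        for _ in range(len(ranked) - len(parents))
    ]
    return parents + children


def migrate(islands):
    """Copy each island's best individual over the next island's worst (ring topology)."""
    bests = [max(isl, key=fitness) for isl in islands]
    for i, isl in enumerate(islands):
        worst = min(range(len(isl)), key=lambda j: fitness(isl[j]))
        isl[worst] = list(bests[i - 1])


def evolve(n_islands=4, pop=8, length=20, generations=30, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [
        [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop)]
        for _ in range(n_islands)
    ]
    initial_best = max(fitness(ind) for isl in islands for ind in isl)
    for gen in range(1, generations + 1):
        islands = [step(isl, rng) for isl in islands]  # islands evolve independently
        if gen % migrate_every == 0:
            migrate(islands)  # occasional exchange spreads good material
    final_best = max(fitness(ind) for isl in islands for ind in isl)
    return initial_best, final_best


initial_best, final_best = evolve()
```

The islands evolve independently between migrations, which is the parallel ingredient, while each generation builds on the last, which is the sequential one. Because the top half of each island survives unchanged, the best fitness never decreases.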

Original note title

parallel vs sequential scaling is the recurring trade-off in test-time compute