SYNTHESIS NOTE
Agentic Systems and Tool Use

Can extreme task decomposition enable reliable execution at million-step scale?

Can breaking tasks into maximally atomic subtasks, combined with voting-based error correction, solve the fundamental reliability problem in long-horizon tasks? The underlying question is whether better models or better decomposition is the path to high-reliability AI systems.

Synthesis note · 2026-02-23 · sourced from Novel Architectures

A system with a 1% per-step error rate is expected to make its first error around step 100, long before a million-step task completes. This makes traditional approaches to long-horizon tasks infeasible: even improving per-step accuracy from 99% to 99.99% only pushes the expected first error out to about step 10,000, which is still insufficient when every step depends on the ones before it. MAKER, a Massively Decomposed Agentic Process (MDAP), takes a different approach: instead of improving per-step accuracy, decompose until each step is trivially reliable, then apply error correction.
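The compounding arithmetic above can be checked with a short sketch. It assumes that step errors are independent and that a single error fails the whole task; the function names are illustrative and do not come from the source.

```python
import math


def expected_steps_to_first_error(per_step_accuracy: float) -> float:
    """Mean of a geometric distribution: 1 / per-step error rate."""
    return 1.0 / (1.0 - per_step_accuracy)


def success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` dependent steps succeeds.

    Computed in log space; for very long tasks the result underflows to 0.0.
    """
    return math.exp(steps * math.log(per_step_accuracy))


def log10_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Base-10 logarithm of the success probability, usable when it underflows."""
    return steps * math.log10(per_step_accuracy)


if __name__ == "__main__":
    for accuracy in (0.99, 0.9999):
        print(
            f"accuracy={accuracy}: "
            f"first error expected near step {expected_steps_to_first_error(accuracy):.0f}, "
            f"log10 P(success over 1,000,000 steps) = "
            f"{log10_success_probability(accuracy, 1_000_000):.1f}"
        )
```

Under these assumptions, even 99.99% per-step accuracy leaves the probability of finishing a million-step task vanishingly small, which is why the note turns to error correction rather than accuracy alone.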

Three core components:

  1. Decomposition into minimal subtasks: Each agent handles a single, tiny "micro-role" rather than anthropomorphized human-level roles. By avoiding complex role assignments and instead exploiting the machine-like nature of LLMs, each subtask becomes solvable with high reliability.
  2. Error correction via subtask-level voting: Multiple agents independently solve the same subtask, and a vote among their answers selects the one most likely to be correct. This is error correction at the finest possible granularity.
  3. Red-flagging to reduce correlated errors: The system detects situations where voting might fail because errors are correlated across agents, and applies additional verification in those cases.
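The voting and red-flagging components can be sketched as follows. This is a minimal illustration, not the system's implementation: the note does not specify the voting rule, so the sketch assumes a "first answer to lead by k votes" rule, and it simplifies red-flagging to discarding suspicious samples before they are counted. The names `sample_answer`, `is_red_flagged`, and `vote_ahead_by_k` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Optional


def vote_ahead_by_k(
    sample_answer: Callable[[], Hashable],
    k: int = 2,
    max_samples: int = 50,
    is_red_flagged: Callable[[Hashable], bool] = lambda answer: False,
) -> Optional[Hashable]:
    """Sample answers for one subtask until one leads the runner-up by k votes.

    `sample_answer` stands in for one independent agent call (hypothetical).
    Returns None if no answer reaches the margin within `max_samples` calls.
    """
    counts: Counter = Counter()
    for _ in range(max_samples):
        answer = sample_answer()
        if is_red_flagged(answer):
            # Assumption of this sketch: flagged samples are simply dropped.
            continue
        counts[answer] += 1
        ranked = counts.most_common(2)
        leader, leader_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        if leader_votes - runner_up_votes >= k:
            return leader
    return None


if __name__ == "__main__":
    # Canned responses replace real agent calls; "" plays a malformed reply.
    canned = iter(["B", "A", "", "A", "A"])
    winner = vote_ahead_by_k(
        lambda: next(canned), k=2, is_red_flagged=lambda a: a == ""
    )
    print(winner)
```

A margin-based rule is used here because it spends more samples only on contested subtasks; a fixed-size majority vote would be an equally valid reading of the description above.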

The scaling laws are formalized: the probability of success and the expected cost vary predictably with the total number of steps and the level of decomposition. Under extreme decomposition, scaling to very long tasks is feasible; without it, it is not.
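The note does not reproduce the formal laws, so the following is only a simplified model of how such scaling can behave, not the source's formula. It assumes independent samples, a per-sample accuracy p above 0.5, a single competing wrong answer, and a vote that stops when one answer leads by k. Under those assumptions the vote is a symmetric gambler's-ruin walk, which gives closed forms for per-step reliability and expected samples. All function names are illustrative.

```python
import math


def vote_success_probability(p: float, k: int) -> float:
    """P(correct answer reaches a lead of k first) = p^k / (p^k + q^k)."""
    q = 1.0 - p
    return 1.0 / (1.0 + (q / p) ** k)


def task_success_probability(p: float, k: int, steps: int) -> float:
    """Probability that every one of `steps` voted subtasks is correct."""
    q = 1.0 - p
    # log(1 / (1 + x)) = -log1p(x), which stays accurate for tiny x.
    return math.exp(-steps * math.log1p((q / p) ** k))


def expected_samples_per_step(p: float, k: int) -> float:
    """Expected samples until the vote stops (Wald's identity)."""
    q = 1.0 - p
    win = vote_success_probability(p, k)
    return k * (2.0 * win - 1.0) / (p - q)


def min_vote_margin(p: float, steps: int, target: float) -> int:
    """Smallest margin k for which the whole task succeeds with prob >= target."""
    if not 0.5 < p < 1.0:
        raise ValueError("this model requires 0.5 < p < 1")
    k = 1
    while task_success_probability(p, k, steps) < target:
        k += 1
    return k


if __name__ == "__main__":
    for p in (0.9, 0.99):
        k = min_vote_margin(p, steps=1_000_000, target=0.95)
        print(p, k, round(expected_samples_per_step(p, k), 2))
```

In this model the required margin grows only logarithmically with the number of steps, so total cost grows roughly as steps times log(steps). It also shows why modest per-sample accuracy can be enough once voting is applied at the subtask level.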

The most counterintuitive finding: state-of-the-art reasoning models are not required. Relatively small non-reasoning models suffice when the decomposition is extreme enough. This inverts the standard approach to hard problems: instead of smarter models, use dumber models on smaller problems.

This extends "Does separating planning from execution improve reasoning accuracy?" to an extreme: not just separating two functions, but decomposing the entire task into maximally atomic units. It also extends "Why does majority voting outperform more complex inference methods?" from answer-level voting to subtask-level voting with formalized scaling properties.

The implication for AI deployment: for tasks requiring very high reliability over many steps (organizational processes, scientific experiments, production pipelines), the path may run through decomposition and redundancy rather than through better models.

Original note title

extreme task decomposition into microagents with voting enables error-free execution at million-step scale