Can tiny recursive networks outperform massive language models?

Does a small network that refines its reasoning through recursion on a latent state actually generalize better than billion-parameter LLMs on hard puzzles like ARC-AGI? What makes recursion more powerful than scale?

Synthesis note · 2026-06-03 · sourced from Reasoning Architectures

Autoregressive LLMs are fragile on hard puzzles because a single wrong token can invalidate an answer, and the usual patches — chain-of-thought and test-time compute — are expensive, data-hungry, and brittle. The Tiny Recursive Model (TRM) takes the opposite bet: a single 2-layer network with only 7M parameters that recurses on its own latent reasoning feature and progressively improves its final answer. It reaches 45% on ARC-AGI-1 and 8% on ARC-AGI-2 — higher than most LLMs including DeepSeek R1, o3-mini, and Gemini 2.5 Pro — with less than 0.01% of their parameters.

The key takeaway is what TRM removes relative to its predecessor HRM: no fixed-point theorem, no biological hierarchy, no two interacting networks, no extra halting forward pass. A single tiny network recursing beats the hierarchical version, which isolates recursion on a latent state, rather than scale or hierarchy, as the source of generalization. (The authors are candid that no single choice is universally optimal: replacing self-attention with an MLP helped on Sudoku but hurt on other tasks, so architecture choices still need per-problem tuning, and scaling laws for these networks are still needed.)
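The recursion described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `make_tiny_net`, `apply_net`, `recurse`, `n_latent`, and `n_outer` are hypothetical, the "network" is a random, untrained 2-layer MLP in plain Python, and the exact update order (several latent updates, then one answer update, both through the same weights) is an assumption made for illustration. What it shows is only the structural point: one small set of weights is reused many times, so the number of network applications grows with recursion while the parameter count stays fixed.

```python
import math
import random


def make_tiny_net(dim, hidden, seed=0):
    """Build one shared 2-layer MLP as nested lists (hypothetical toy stand-in)."""
    rng = random.Random(seed)
    scale = 0.1
    # Layer 1 maps the concatenation [x, y, z] (3 * dim values) to `hidden` units.
    w1 = [[rng.uniform(-scale, scale) for _ in range(3 * dim)] for _ in range(hidden)]
    # Layer 2 maps the hidden units back to `dim` values.
    w2 = [[rng.uniform(-scale, scale) for _ in range(hidden)] for _ in range(dim)]
    return w1, w2


def count_params(net):
    """Total number of weights; independent of how often the net is applied."""
    return sum(len(row) for layer in net for row in layer)


def apply_net(net, x, y, z):
    """One forward pass of the shared network on the concatenated inputs."""
    w1, w2 = net
    inp = x + y + z  # list concatenation
    h = [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in w1]
    return [sum(w * v for w, v in zip(row, h)) for row in w2]


def num_applications(n_latent, n_outer):
    """Forward passes used by `recurse`: n_latent latent updates + 1 answer update per outer step."""
    return n_outer * (n_latent + 1)


def recurse(net, x, n_latent=6, n_outer=3):
    """Refine a latent state z, then the answer y, reusing the same weights throughout."""
    dim = len(x)
    y = [0.0] * dim  # current answer estimate
    z = [0.0] * dim  # latent reasoning state
    zeros = [0.0] * dim
    answers = []
    for _ in range(n_outer):
        # Assumed schedule: update the latent several times given question and answer.
        for _ in range(n_latent):
            z = apply_net(net, x, y, z)
        # Assumed answer update: same network, question slot zeroed out.
        y = apply_net(net, zeros, y, z)
        answers.append(y)
    return answers
```

Calling `recurse(make_tiny_net(4, 8), [1.0, -1.0, 0.5, 0.0])` returns one answer estimate per outer step. Since the weights here are random and untrained, the estimates carry no meaning; the sketch only makes the weight reuse concrete, with `count_params` unchanged however large `num_applications` becomes.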

This sharpens the vault's recurrence cluster. TRM directly simplifies HRM (see "Can recurrent hierarchies achieve reasoning that transformers cannot?"), and it agrees mechanistically with "How do looped transformer layers actually behave during inference?": recursion re-applies computation on a latent state, and that reuse, even at tiny scale, is what generalizes.

Related concepts in this collection 3

Can recurrent hierarchies achieve reasoning that transformers cannot?
Can a dual-timescale recurrent architecture escape the computational limitations of standard transformers and solve complex reasoning tasks without explicit chain-of-thought? This explores whether architectural design, not scale, enables true algorithmic reasoning.
TRM strips HRM to one tiny network and generalizes better, isolating recursion from hierarchy

How do looped transformer layers actually behave during inference?
When language models loop their layers to improve reasoning, do they discover new computations or repeat existing ones? Understanding the internal dynamics could explain why recurrent architectures outperform simple depth scaling.
mechanistic agreement: recursion reuses computation on a latent state

Can looped transformers generalize to unseen knowledge combinations?
Do transformers that reuse layers across iterations succeed where standard transformers fail at composing facts in novel ways? This matters because systematic generalization is a hallmark of human reasoning.
both show recurrent depth buys generalization vanilla fixed-depth models lack