SYNTHESIS NOTE

How do looped transformer layers actually behave during inference?

When language models loop their layers to improve reasoning, do they discover new computations or repeat existing ones? Understanding the internal dynamics could explain why recurrent architectures outperform simple depth scaling.

Synthesis note · 2026-06-03 · sourced from Reasoning Architectures

Looping an LLM's layers in the latent dimension improves reasoning, but it has been unclear how the resulting internal dynamics differ from those of a standard feedforward model. This mechanistic analysis addresses the question through the lens of stages of inference: the idea that LLM computation decomposes into distinct computational stages.

The core result is geometric. For many looped models, each layer in the cycle converges to a distinct fixed point, so the recurrent block follows a consistent cyclic trajectory in latent space. As those fixed points are reached, attention-head behavior stabilizes, so the heads behave the same way on every recurrence. Empirically, the recurrent blocks learn stages of inference that closely mirror those of feedforward models, and they repeat those stages in depth on each iteration. This appears to be emergent: it shows up even when training does not explicitly encourage it. According to the analysis, repeated application of a shared block leads to one of two regimes: either the block's contribution vanishes asymptotically, or the block traces a constant cyclic trajectory.
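
The two regimes can be made concrete with a toy, stdlib-only Python sketch. It is an illustration under stated assumptions, not the analysis from the source: the "layers" are hypothetical 2-D affine contractions rather than transformer layers, and `contractive_layer`, `run_loop`, and `diagnose` are names invented here. Looping a three-layer block, the sketch reports how far each layer's state moves between the last two loops (near zero once every layer has reached its fixed point) and the smallest displacement any single layer still applies. In the cyclic regime the per-layer fixed points are distinct, so each layer keeps moving the state around the same cycle. In the vanishing regime every layer shares one fixed point and its contribution shrinks to zero.

```python
import math

Vec = tuple[float, float]  # a hypothetical 2-D hidden state


def contractive_layer(scale: float, offset: Vec):
    """Hypothetical stand-in for one layer: h -> scale * h + offset, with |scale| < 1."""
    def layer(h: Vec) -> Vec:
        return (scale * h[0] + offset[0], scale * h[1] + offset[1])
    return layer


def dist(u: Vec, v: Vec) -> float:
    return math.hypot(u[0] - v[0], u[1] - v[1])


def run_loop(block, h0: Vec, n_loops: int) -> list[list[Vec]]:
    """Apply the shared block n_loops times, recording the state after every layer."""
    history, h = [], h0
    for _ in range(n_loops):
        states = []
        for layer in block:
            h = layer(h)
            states.append(h)
        history.append(states)
    return history


def diagnose(history: list[list[Vec]]) -> tuple[float, float]:
    """Return (largest loop-to-loop drift of any layer's state,
    smallest displacement any single layer applies in the last loop)."""
    last, prev = history[-1], history[-2]
    drift = max(dist(a, b) for a, b in zip(last, prev))
    # Layer i's input is layer i-1's output (layer 0 reads the previous loop's last state).
    inputs = [prev[-1]] + last[:-1]
    step = min(dist(x, y) for x, y in zip(inputs, last))
    return drift, step


# Cyclic regime: same contraction, different offsets, so each layer has its own fixed point.
cyclic_block = [contractive_layer(0.5, (1.0, 0.0)),
                contractive_layer(0.5, (0.0, 1.0)),
                contractive_layer(0.5, (-1.0, 0.0))]
# Vanishing regime: every layer computes h - 0.5 * (h - c) with c = (1, 1),
# so all layers share one fixed point and each layer's update shrinks to zero.
vanishing_block = [contractive_layer(0.5, (0.5, 0.5)) for _ in range(3)]

for name, block in [("cyclic", cyclic_block), ("vanishing", vanishing_block)]:
    drift, step = diagnose(run_loop(block, (5.0, -3.0), n_loops=30))
    print(f"{name:9s} loop-to-loop drift={drift:.2e}  smallest per-layer step={step:.3f}")
```

With the offsets chosen here, the cyclic block's three per-layer fixed points work out analytically to (4/7, 2/7), (2/7, 8/7), and (-6/7, 4/7). Real transformer layers are not guaranteed to be contractions, so convergence in an actual looped model is an empirical finding rather than a consequence of this toy.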

The implication that matters: recurrent depth is learned re-application of computation, not the discovery of genuinely new computation on each loop. The loop re-runs the same inferential stages rather than adding qualitatively different ones. This is the mechanistic complement to the note "Does looping layers beat adding depth in diffusion models?": it explains why reused computation can match or beat added depth. The network was re-enacting stages anyway, and looping makes that reuse explicit and parameter-free. Recurrent block size, input injection, and normalization govern whether these cyclic fixed points emerge and stay stable.
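
Why input injection and normalization bear on fixed points can be seen in another toy sketch, again stdlib-only Python under loud assumptions: hypothetical 2x2 linear maps stand in for a recurrent block, and `with_injection` and `normalized` are illustrative names, not APIs from the source. For a contractive block the fixed point is unique, so if the input only sets the initial state, the loop converges to the same point for every input; re-adding the embedded input e on each loop makes the fixed point (I - W)^-1 e depend on the input. An expansive block without normalization diverges, while rescaling to unit norm after each loop (which turns the loop into power iteration) settles on a fixed direction. Recurrent block size is not modeled.

```python
import math

Vec = tuple[float, float]


def matvec(m: list[list[float]], h: Vec) -> Vec:
    """2x2 matrix times 2-vector."""
    return (m[0][0] * h[0] + m[0][1] * h[1],
            m[1][0] * h[0] + m[1][1] * h[1])


def norm(h: Vec) -> float:
    return math.hypot(h[0], h[1])


def loop(step, h0: Vec, n_loops: int) -> Vec:
    """Apply the same block n_loops times, as a looped model reuses its weights."""
    h = h0
    for _ in range(n_loops):
        h = step(h)
    return h


# Input injection. W is a hypothetical contractive block (eigenvalues 0.4 and 0.3).
W = [[0.4, 0.1], [0.0, 0.3]]


def without_injection(h: Vec) -> Vec:
    return matvec(W, h)  # the input only sets h0 and is never seen again


def with_injection(e: Vec):
    def step(h: Vec) -> Vec:
        wh = matvec(W, h)
        return (wh[0] + e[0], wh[1] + e[1])  # re-add the embedded input e on every loop
    return step


e1, e2 = (1.0, 0.0), (0.0, 1.0)
plain1, plain2 = loop(without_injection, e1, 60), loop(without_injection, e2, 60)
inj1, inj2 = loop(with_injection(e1), e1, 60), loop(with_injection(e2), e2, 60)

# Normalization. M is a hypothetical expansive block (eigenvalues 3 and 1).
M = [[2.0, 1.0], [1.0, 2.0]]


def normalized(h: Vec) -> Vec:
    mh = matvec(M, h)
    n = norm(mh)
    return (mh[0] / n, mh[1] / n)  # unit-norm rescaling, a crude stand-in for RMSNorm


raw = loop(lambda h: matvec(M, h), (1.0, 0.0), 40)
fixed = loop(normalized, (1.0, 0.0), 40)

print("no injection:  ", plain1, plain2)  # both collapse to ~(0, 0)
print("with injection:", inj1, inj2)      # distinct fixed points (I - W)^-1 e
print("unnormalized norm:", norm(raw))    # grows like 3**n
print("normalized state: ", fixed)        # settles on the (1, 1)/sqrt(2) direction
```

Unit-norm rescaling is only a rough analogue of LayerNorm or RMSNorm, which also apply learned gains; the point is the qualitative contrast, not a claim about how any particular looped model is normalized.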
