A Mechanistic Analysis of Looped Reasoning Language Models
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM’s layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points.
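To make the notion of a cyclic fixed point concrete, the following is a minimal toy sketch, not the models or the measurement procedure used in this paper. It assumes a hypothetical looped block of three non-residual layers of the form tanh(Wh + b), with each weight matrix rescaled to spectral norm 0.3 so that every layer is a contraction and convergence is guaranteed. All names (`make_layer`, `run_looped_block`, `per_layer_drift`) are illustrative.

```python
import numpy as np


def make_layer(rng, dim, scale=0.3):
    """Build a toy layer (W, b) whose weight matrix has spectral norm `scale`.

    Assumption: scale < 1 makes h -> tanh(W h + b) a contraction, which is
    what guarantees convergence in this toy setting. Real Transformer blocks
    carry no such guarantee.
    """
    W = rng.normal(size=(dim, dim))
    W *= scale / np.linalg.norm(W, 2)  # ord=2 on a matrix is the spectral norm
    b = rng.normal(size=dim)
    return W, b


def run_looped_block(layers, h0, n_loops):
    """Apply the same stack of layers n_loops times.

    Returns an array of shape (n_loops, n_layers, dim) holding the latent
    state after every layer in every recurrence.
    """
    h = h0
    trace = []
    for _ in range(n_loops):
        per_layer = []
        for W, b in layers:
            h = np.tanh(W @ h + b)
            per_layer.append(h)
        trace.append(per_layer)
    return np.array(trace)


def per_layer_drift(trace):
    """Distance between each layer's state in the last two recurrences."""
    return np.linalg.norm(trace[-1] - trace[-2], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [make_layer(rng, dim=8) for _ in range(3)]
    trace = run_looped_block(layers, rng.normal(size=8), n_loops=50)
    # Small drift means each layer position has settled on its own fixed point.
    print("per-layer drift:", per_layer_drift(trace))
    # A nonzero gap means the fixed points of different layers are distinct.
    print("gap between layers 0 and 1:", np.linalg.norm(trace[-1, 0] - trace[-1, 1]))
```

In this toy setting, the state at each layer position stops changing across recurrences while the states at different positions remain apart, so the block retraces the same cycle on every pass. Applying the same drift measurement to the hidden states of an actual looped model would be one way to probe for the behavior described above.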
Introduction. The vast majority of current LLMs are based on the Transformer architecture (Vaswani et al., 2017), which comprises a sequence of blocks traversed in a feedforward manner to predict the next token. As these models grew more capable, attention turned to eliciting reasoning capabilities in LLMs by increasing test-time computation. In this paper, we compare how feedforward and looped LLMs organize computation across (effective) depth through the lens of stages of inference (Lad et al., 2024; Queipo-de Llano et al., 2025), a perspective suggesting that LLM inference can be decomposed into several distinct computational stages. Building on prior observations that repeated application of a shared recurrent block can approach a fixed point or steady state (Yang et al., 2023; Geiping et al., 2025), we show that such behavior necessarily implies one of two possibilities: either the contributions of the component Transformer blocks vanish asymptotically, or their sequential application traces out a constant cyclic trajectory in latent space.
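The dichotomy can be sketched with a short telescoping argument. The notation below is an assumption introduced for illustration, not taken from the paper: a recurrent block of $k$ layers with purely residual updates $f_i$, no input injection, and continuous $f_i$.

```latex
% Assumed notation: h_i^{(t)} is the latent state after layer i in recurrence t.
\begin{align}
  h_i^{(t)}   &= h_{i-1}^{(t)} + f_i\bigl(h_{i-1}^{(t)}\bigr), \qquad i = 1, \dots, k, \\
  h_0^{(t+1)} &= h_k^{(t)} .
\end{align}
% If the block input converges, h_0^{(t)} \to h_0^*, then by continuity each
% intermediate state converges as well, h_i^{(t)} \to h_i^*, with h_k^* = h_0^*.
% Summing the residual updates over one pass through the block gives
\begin{equation}
  \sum_{i=1}^{k} f_i\bigl(h_{i-1}^*\bigr) = h_k^* - h_0^* = 0 .
\end{equation}
```

Under these assumptions, either every term $f_i(h_{i-1}^*)$ is zero, so the layer contributions vanish in the limit and all $h_i^*$ coincide, or some terms are nonzero and cancel, so the states $h_0^*, \dots, h_{k-1}^*$ form a closed loop that is retraced identically on every recurrence.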
Discussion / Conclusion. This paper examines the limiting behavior of Looped Transformers, exploring implications for “mixing” stages of inference observed in feedforward models. We demonstrate across a range of architectures that recurrent blocks tend to “mirror” the stages of a feedforward Transformer, and provide evidence that this may be emergent behavior learned during training, even when not explicitly encouraged by the training process. We further investigate the implications for these mixing stages both when models converge to a stable fixed point and when they do not.
Implications of Findings. The implications of our findings are bidirectional. On the one hand, the structure of looped architectures provides a novel lens for studying stages of inference; on the other, tracking these stages reveals the internal mechanics of recurrent depth.