First Try Matters: Revisiting the Role of Reflection in Reasoning Models

Paper · arXiv 2510.08308 · Published October 9, 2025
Reasoning by Reflection and Self-Critique · Training and Fine-Tuning

Large language models have recently demonstrated significant gains in reasoning ability, often attributed to their capacity to generate longer chains of thought and engage in reflective reasoning. However, the contribution of reflections to performance improvement remains unclear. In this paper, we systematically analyze the rollouts of eight reasoning models on five mathematical datasets. We focus on reflective behaviours where the model has already produced an answer but continues reflecting before finalizing its output. Our analysis reveals that reflections are predominantly confirmatory and rarely alter the model’s initial answer, a pattern consistent across models and datasets. To understand the role of reflections in training, we construct supervised fine-tuning (SFT) datasets with varying amounts of reflection steps. We observe that training models on rollouts with more reflection steps primarily enhances first-answer correctness rather than the ability to correct initially wrong answers through reflections. This motivates us to propose a question-aware early-stopping method that enhances inference-time token efficiency by stopping the reasoning process once a few plausible candidate answers are generated, thereby reducing unnecessary reflection steps.
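As a rough illustration of the analysis described above, the sketch below labels each reflection in a rollout by comparing consecutive candidate answers against a reference answer, then reports first-answer and final-answer accuracy. The function names, the label strings, and the assumption that candidate answers have already been extracted from the chain of thought are all hypothetical; the extraction pipeline and label taxonomy actually used in the paper are not described in this excerpt.

```python
from collections import Counter


def label_reflections(candidates, reference):
    """Label each transition between consecutive candidate answers.

    candidates: non-empty list of answers in the order they appear in one
                rollout (assumed to be pre-extracted; hypothetical input).
    reference:  the ground-truth answer.
    """
    labels = []
    for prev, curr in zip(candidates, candidates[1:]):
        if prev == curr:
            labels.append("confirmatory")    # answer unchanged by the reflection
        elif curr == reference:
            labels.append("corrective")      # wrong -> right
        elif prev == reference:
            labels.append("harmful")         # right -> wrong
        else:
            labels.append("wrong_to_wrong")  # wrong -> a different wrong answer
    return labels


def summarize(rollouts):
    """Aggregate over rollouts, given as (candidates, reference) pairs."""
    counts = Counter()
    first_correct = 0
    final_correct = 0
    for candidates, reference in rollouts:
        counts.update(label_reflections(candidates, reference))
        first_correct += candidates[0] == reference
        final_correct += candidates[-1] == reference
    n = len(rollouts)
    return {
        "first_acc": first_correct / n,
        "final_acc": final_correct / n,
        "labels": dict(counts),
    }
```

Under this framing, the gap between `final_acc` and `first_acc` measures how much reflection actually contributes, while a high share of `confirmatory` labels would correspond to the pattern the abstract reports.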

Introduction. Large language models (LLMs) have made remarkable progress in reasoning abilities, achieving strong performance across domains such as mathematics, logic, and code synthesis (Cobbe et al., 2021; Chen et al., 2021). This leap is largely attributable to the development of the Chain-of-Thought (CoT) reasoning pattern (Nye et al., 2021; Wei et al., 2022), which guides the model to break down complex problems into a series of intermediate steps. Recent breakthroughs such as OpenAI’s o1 (OpenAI et al., 2024) and DeepSeek-R1 (DeepSeek-AI et al., 2025) have moved LLMs into a new paradigm, known as reasoning models (Ke et al., 2025; Zhang et al., 2025).

Discussion / Conclusion. In this work, we systematically analyze the reflection patterns in the long CoTs of reasoning models and investigate their role in both training and inference. Through extensive experiments, we show that the reflections of reasoning models are mostly confirmatory, yet they are still helpful when included in training data. We also show that at inference time, confirmatory reflections consume a substantial number of tokens while yielding only marginal improvements. We therefore develop an efficient inference-time technique that stops excessive reflections early while maintaining performance. Together, these results provide a clearer understanding of the role of reflections and offer practical guidance for data design and inference efficiency.
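The early-stopping idea can be sketched in a simplified form: scan the reasoning steps in order and stop once a fixed number of candidate answers has appeared. This is only an illustration under stated assumptions, not the paper's method. The function name, the `max_candidates` threshold, and the use of `\boxed{...}` as the marker for a candidate answer are all hypothetical, and the question-aware component (deciding how early to stop for a given question) is not reproduced here.

```python
import re

# Assumption: candidate answers are written as \boxed{...}, a common
# convention in math benchmarks; nested braces are not handled.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def truncate_after_candidates(steps, max_candidates=2):
    """Keep reasoning steps until max_candidates candidate answers appear.

    steps: list of reasoning-step strings in generation order.
    Returns (kept_steps, candidates_seen).
    """
    kept = []
    candidates = []
    for step in steps:
        kept.append(step)
        candidates.extend(BOXED.findall(step))
        if len(candidates) >= max_candidates:
            break  # enough candidates seen; skip the remaining reflections
    return kept, candidates
```

In a real decoding loop the same check would run on the partially generated text so that generation halts instead of being truncated afterwards; the token savings come from the reflections that are never produced.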