Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

Paper · arXiv 2503.09567 · Published March 12, 2025
Inference-Time Scaling · Deep Research Agents

Recent advances in reasoning large language models (RLLMs), such as OpenAI O1 and DeepSeek-R1, have demonstrated impressive capabilities in complex domains like mathematics and coding. A central factor in their success is long chain-of-thought (Long CoT) reasoning, which enhances reasoning ability and enables the solution of intricate problems. However, a comprehensive survey of Long CoT is still lacking, which limits our understanding of how it differs from traditional short chain-of-thought (Short CoT) and complicates ongoing debates on issues like "overthinking" and "test-time scaling." This survey seeks to fill this gap by offering a unified perspective on Long CoT. (1) We first distinguish Long CoT from Short CoT and introduce a novel taxonomy to categorize current reasoning paradigms. (2) Next, we explore the key characteristics of Long CoT: deep reasoning, extensive exploration, and feasible reflection, which enable models to handle more complex tasks and produce more efficient, coherent outcomes than the shallower Short CoT.

Introduction. In recent years, the emergence of reasoning large language models (RLLMs) such as OpenAI O1 [208] and DeepSeek-R1 [155] has sparked a growing body of research into Long Chain-of-Thought (Long CoT) reasoning, greatly improving their capabilities in mathematical reasoning, programming, and multidisciplinary knowledge reasoning [488, 686, 508, 50, 58, 673, 133, 776], as shown in Figure 1. This shift marks a significant departure from traditional approaches to task handling in large language models (LLMs) [798, 437, 439, 421]. Unlike the shorter chain-of-thought (Short CoT) used in traditional LLMs, Long CoT reasoning entails a more detailed, iterative process of exploration and reflection within a given problem space, enabled by test-time scaling [299, 520, 364]. This process has led to notable advances in mathematical and logical reasoning, and has prompted work on how supervised fine-tuning (SFT) and reinforcement learning (RL) techniques can enhance the learning and exploration of extended reasoning chains [440, 385].

Discussion / Conclusion. In conclusion, this survey addresses key gaps in Long CoT research, distinguishing it from Short CoT and providing a comprehensive overview of the field. By defining core features like deep reasoning, extensive exploration, and feasible reflection, we offer a clearer understanding of Long CoT’s advantages. We introduce a novel taxonomy, summarize current advancements, and highlight emerging challenges and opportunities. Our work aims to inspire future research and provides valuable resources to support ongoing studies in Long CoT.