Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the “overthinking phenomenon”. Efficient reasoning, which seeks to optimize reasoning length while preserving reasoning capabilities, offers practical benefits such as reduced computational costs and improved responsiveness for real-world applications. Despite its potential, efficient reasoning remains in the early stages of research. In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs.
Introduction. Large Language Models (LLMs) have emerged as exceptionally powerful AI tools, demonstrating advanced capabilities in natural language understanding and complex reasoning. Recently, the rise of reasoning-focused LLMs, also referred to as reasoning-capable models or Large Reasoning Models (LRMs) [98], such as OpenAI o1 [65] and DeepSeek-R1 [33], has significantly improved performance in System-2 reasoning domains [47], particularly in challenging mathematics [18,37] and programming tasks [7,19]. Evolving from foundational pretrained models (e.g., LLaMA [32,86]) trained with next-token prediction [25], these models typically leverage Chain-of-Thought (CoT) [92] reasoning. Such reasoning abilities in LLMs are typically developed through supervised fine-tuning (SFT) and reinforcement learning (RL), which promote iterative and systematic problem-solving abilities. For instance, DeepSeek-R1 [33] undergoes multiple rounds of SFT and RL training, emphasizing structured thinking templates and rule-based reward mechanisms.
Discussion / Conclusion. Improving Reasoning Ability. From another perspective on efficiency, improving reasoning performance is also an important topic [11,80]. To prioritize promising avenues and discard ineffective strategies early, Meta-Reasoner [80] leverages contextual multi-armed bandits to evaluate reasoning progress and select the optimal strategy. In each round, the LLM produces a new reasoning step, and the meta-reasoner evaluates its output and generates a progress report. The meta-reasoner then uses a contextual multi-armed bandit to choose the best guidance strategy for the next reasoning step. ITT [11] treats each transformer layer as a step in an internal thinking process. By dynamically allocating extra processing to difficult tokens through adaptive routing, ITT enables smaller language models to achieve performance comparable to larger models while using fewer training resources. RL vs. SFT, which is better? When comparing RL (Section 3.1) and SFT (Section 3.2) for creating efficient reasoning language models, the answer is unclear, as each method has its own strengths.
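To make the contextual multi-armed bandit loop described for Meta-Reasoner [80] more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The text here does not specify the bandit algorithm, so an epsilon-greedy policy with one linear reward model per arm is used as a stand-in. The strategy names, the two-feature context (a bias term plus a made-up "stuck" flag standing in for the progress report), and the simulated reward are all hypothetical.

```python
import random

# Hypothetical guidance strategies; the names are illustrative, not from the paper.
STRATEGIES = ["continue", "backtrack", "restart"]


class ContextualBandit:
    """Epsilon-greedy contextual bandit with one linear reward model per arm."""

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.1, seed=0):
        self.weights = [[0.0] * n_features for _ in range(n_arms)]
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def predict(self, arm, context):
        # Estimated reward of `arm` in this context (dot product).
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def best_arm(self, context):
        # Greedy choice: the arm with the highest estimated reward.
        scores = [self.predict(a, context) for a in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def select(self, context):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        return self.best_arm(context)

    def update(self, arm, context, reward):
        # One stochastic-gradient step on the squared prediction error.
        error = reward - self.predict(arm, context)
        self.weights[arm] = [
            w + self.lr * error * x for w, x in zip(self.weights[arm], context)
        ]


def simulated_reward(arm, context):
    """Toy stand-in for the progress signal (an assumption for this sketch):
    'backtrack' pays off when the reasoner is stuck, 'continue' otherwise."""
    best = 1 if context[1] > 0.5 else 0
    return 1.0 if arm == best else 0.0


def run(rounds=2000, seed=0):
    rng = random.Random(seed)
    bandit = ContextualBandit(len(STRATEGIES), n_features=2, seed=seed + 1)
    for _ in range(rounds):
        # Context summarizing the latest step: [bias, stuck flag].
        context = [1.0, float(rng.random() < 0.5)]
        arm = bandit.select(context)
        bandit.update(arm, context, simulated_reward(arm, context))
    return bandit


if __name__ == "__main__":
    trained = run()
    for stuck in (0.0, 1.0):
        print(stuck, STRATEGIES[trained.best_arm([1.0, stuck])])
```

In a real system, the context would presumably be derived from the meta-reasoner's progress report and the reward from some measure of reasoning progress; both are replaced by toy functions here so that the selection and update steps can be run in isolation.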