TOPIC

Reasoning Architectures

35 synthesis notes · 60 source papers

Can tiny recursive networks outperform massive language models?

Does a small network that refines its reasoning through recursion on a latent state actually generalize better than billion-parameter LLMs on hard puzzles like ARC-AGI? What makes recursion more powerful than scale?

Does planning direction affect how hard problems become?

Planning research typically goes forward only. But some problems get easier when you work backward from the goal. What makes direction matter, and can language models exploit this?

Do base models already contain hidden reasoning ability?

Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.

Can modular cognitive tools unlock reasoning without training?

Can reasoning capabilities be elicited by structuring LLM calls as isolated cognitive operations—understanding, recalling, examining, and backtracking—rather than through reinforcement learning?

Does chain of thought reasoning actually explain model decisions?

When language models show their reasoning steps in agentic pipelines, does the quality of those steps predict or explain the quality of final outputs? This matters for trusting and debugging AI systems.

Can a single problem unlock reasoning through solution critique?

Does exposing models to diverse critiques of different solutions to one problem activate reasoning as effectively as training on many problems? This tests whether solution diversity matters more than problem diversity.

Can reasoning and tool execution be truly decoupled?

Can LLM reasoning be separated from tool observations to eliminate redundant re-prompting and enable parallel execution? Two recent architectures suggest yes, but what are the tradeoffs?

Can interleaving reasoning with real-world feedback prevent hallucination?

Does grounding language model reasoning in external world observations rather than internal associations help prevent error propagation and false outputs? This explores whether breaking the static chain-of-thought pattern can catch and correct mistakes in real time.

Can models reason without generating visible thinking tokens?

Explores whether intermediate reasoning must be verbalized as text tokens, or if models can think in hidden continuous space. Challenges a foundational assumption about how language models scale their reasoning capabilities.

Can structured debate roles help small models detect ambiguity?

Small language models struggle to recognize when problems are underspecified. Can assigning explicit leader-follower roles in multi-agent debates overcome this limitation and boost ambiguity detection accuracy?

Do large language models actually perform iterative optimization?

Explores whether LLMs execute genuine numerical procedures like Newton-Raphson or instead pattern-match to memorized solution templates when solving constrained optimization problems.

Why do LLMs struggle with exploration in simple decision tasks?

This explores why large language models fail at exploration—a core decision-making capability—even when they excel at other tasks, and what specific conditions might help them succeed.

Do larger language models solve constrained optimization better?

Explores whether scaling LLMs—through more parameters, better training, or reasoning extensions—improves their ability to satisfy constraints in real optimization problems like power grids and portfolios.

How do looped transformer layers actually behave during inference?

When language models loop their layers to improve reasoning, do they discover new computations or repeat existing ones? Understanding the internal dynamics could explain why recurrent architectures outperform simple depth scaling.

Can stochastic latent reasoning help models explore multiple solutions?

This explores whether making recursive reasoning paths probabilistic rather than deterministic lets models maintain uncertainty and consider alternative hypotheses when problems admit multiple valid solutions.

Do fine-tuned language models actually learn optimization procedures?

Can RL fine-tuning teach LLMs to solve constraint-optimization problems through genuine reasoning, or does it merely sharpen pattern-matching? Testing on out-of-distribution variants reveals the mechanism.

Why do outcome-based reward models fail at intermediate step evaluation?

Outcome-based reward models (ORMs) evaluate only final results, creating a mismatch with the need to assess reasoning quality at intermediate steps. Understanding this failure mode matters for building better AI reasoning systems.

Which tokens in reasoning chains actually matter most?

Do language models internally rank tokens by functional importance? Greedy pruning experiments explore whether models preserve symbolic computation while discarding linguistic scaffolding, and what this reveals about reasoning architecture.

Do reasoning cycles in hidden states reveal aha moments?

What if the internal loops in model reasoning—visible in hidden-state topology—correspond to the reconsidering moments that happen during reasoning? This note explores whether graph cyclicity captures a mechanistic signature of insight.

Do reasoning models actually beat standard models on optimization?

Explores whether extended chain-of-thought in reasoning models delivers performance gains on constraint-satisfaction problems like power-grid optimization. Matters because reasoning models are treated as automatic upgrades, but the evidence may not support that claim.

Can reasoning systems scale wider instead of only deeper?

Explores whether sampling multiple parallel latent trajectories offers a faster scaling path than recursive refinement alone. Matters because it could unlock latency-efficient reasoning at test time.

Can models reason without generating visible thinking steps?

Do machine reasoning systems actually require verbalized chains of thought, or can they solve complex problems through hidden computation? This challenges how we measure and understand reasoning.

Can curriculum learning approximate expensive process supervision?

Can a reverse curriculum that slides backward from task completion provide step-level insight comparable to human process annotations, but at outcome supervision cost?

Does RL teach reasoning or just when to use it?

Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach the model when to activate capabilities it already has? This matters for understanding where reasoning truly emerges.

When does RL actually extend reasoning beyond pretraining?

Does reinforcement learning genuinely expand a model's reasoning capabilities, or does it merely improve sampling from existing knowledge? This question hinges on whether pretraining provides sufficient foundation and whether RL targets tasks within reach.

Why do RL agents stop asking informative questions?

RL-trained agents often fail to seek information effectively, despite being trained to do so. Understanding whether this reflects a capability gap or a training dynamics problem could reveal how to unlock better information-seeking behavior.

Does separating planning from execution improve reasoning accuracy?

Can modular LM architectures that split problem decomposition from solution execution outperform monolithic models? This explores whether decoupling these cognitive operations reduces interference and boosts performance.

Does supervised fine-tuning actually improve reasoning on optimization problems?

When SFT boosts benchmark scores on constraint-optimization tasks, does it genuinely improve the model's ability to find feasible solutions, or just its ability to format answers convincingly?

Can symbolic solvers fix how LLMs reason about logic?

LLMs excel at understanding natural language but fail at precise logical inference. Can pairing them with deterministic symbolic solvers—using solver feedback to refine attempts—overcome this fundamental weakness?

Does chain-of-thought reasoning actually explain AI decisions?

Chain-of-thought is pitched as a transparency tool for agentic AI, but empirical evidence raises questions about whether reasoning chains actually predict or explain the system's outputs in practice.

Does adding randomness to recursive models actually help reasoning?

GRAM's ablations test whether stochasticity alone improves recursive architectures, or whether the gains depend on a specific training framework. This matters because it separates surface mechanisms from the methods that make them work.

Should LLMs handle abstraction only in optimization?

What if LLMs worked exclusively on translating problems to formal constraints, while deterministic solvers handled the numeric work? Explores whether this division of labor could overcome LLM failures in iterative computation.

Does RL post-training create reasoning or just deploy it?

Investigates whether reasoning capability emerges during RL fine-tuning or already exists in base models. Matters because it reshapes how we build and optimize reasoning systems.

Can backward reasoning during training improve forward reasoning?

Does training models to reason backward—generating inverse questions and solutions—build internal consistency checking that transfers to forward-only inference? This explores whether backward capacity internalized during training without test-time deployment can enhance reasoning quality.

Why do trajectories matter more than individual examples for in-context learning?

Can language models learn new sequential decision-making tasks from context alone, and if so, what data properties make this possible? This explores why isolated state-action pairs fail where full trajectories succeed.

Source papers 60

The arXiv papers behind this sub-topic.

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impa…
A Mechanistic Analysis of Looped Reasoning Language Models
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM’s layers in the latent dimension, resulting i…
A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1
System 1 thinking is fast, automatic, and intuitive, operating effortlessly and often unconsciously. It relies on neural pathways that enable rapid processing, especially in situations needing quick r…
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Reinforcement learning (RL) has emerged as a new scaling paradigm for enhancing the capabilities of large language models (LLMs) by enabling thinking abilities [52]. Given a prompt, RL…
Agentic Code Reasoning
Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a stru…
Agentic Reasoning for Large Language Models
Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in close…
Base Models Know How to Reason, Thinking Models Learn When
Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasonin…
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to human…
Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating human language, contributing to more natural interactions with complex systems. Howeve…
Can Large Language Models Reason and Optimize Under Constraints?
Large Language Models (LLMs) have achieved notable performance across a wide range of natural language understanding and generation tasks, from open-ended dialogue and code synthesis to mathematical r…
Can Large Language Models Reason and Plan?
Their seeming versatility has however led many researchers to wonder whether they can also do well on planning and reasoning tasks typically associated with System 2 competency. Nothing in the traini…
Can large language models explore in-context?
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance …
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given th…
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computat…
Cumulative Reasoning with Large Language Models
Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs c…
Do LLMs Encode Functional Importance of Reasoning Tokens?
Large language models solve complex tasks by generating long reasoning chains, achieving higher accuracy at the cost of increased computational cost and reduced ability to isolate functionally relevan…
Efficient Reasoning with Hidden Thinking
Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities in Multimodal Large Language Models (MLLMs). However, the verbose nature of textual …
Efficient Tool Use with Chain-of-Abstraction Reasoning
To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning to real-world knowledge (e.g., web facts, math and physical rules). Tools…
Eliciting Reasoning in Language Models with Cognitive Tools
The recent advent of reasoning models like OpenAI’s o1 was met with excited speculation by the AI community about the mechanisms underlying these capabilities in closed models, followed by a rush of r…
Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
Reinforcement Learning (RL) has proven highly effective at enhancing the complex reasoning abilities of Large Language Models (LLMs), yet the underlying mechanisms driving this success remain largely opaq…
Flows: Building Blocks of Reasoning and Collaborating AI
Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among …
From Context to Skills: Can Language Models Learn from Context Skillfully?
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge…
Generalization to New Sequential Decision Making Tasks with In-Context Learning
However, the sequential decision making setting poses additional challenges having a lower tolerance for errors since the environment’s stochasticity or the agent’s actions can lead to unseen, and som…
Generative Recursive Reasoning
How should future neural reasoning systems implement extended computation? Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative …
Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models
We present Quasar-1, a novel architecture that introduces temperature-guided reasoning to large language models through the Token Temperature Mechanism (TTM) and Guided Sequence of Thought (GSoT). Our…
Hierarchical Reasoning Model
Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT…
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Large language models (LLMs) have demonstrated remarkable capabilities …
Large Causal Models From Large Language Models
We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today’s large language models (LLMs). We describe our ongoing experiments with an imp…
Less is More: Recursive Reasoning with Tiny Networks
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language models (LLMs) on hard …
Levels of Analysis for Large Language Models
Modern artificial intelligence systems, such as large language models, are increasingly powerful but also increasingly hard to understand. Recognizing this problem as analogous to the historical diffi…
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems. This paper introduces a novel framework, LOGICLM, which integrates LLMs with sy…
Logical Reasoning in Large Language Models: A Survey
With the emergence of advanced reasoning models like OpenAI o3 and DeepSeek-R1, large language models (LLMs) have demonstrated remarkable reasoning capabilities. However, their ability to perform rigo…
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. In this work, we hypothesize that current limitations in re…
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
Reinforcement Learning with Verifiable Reward (RLVR) is empirically shown to notably enhance the reasoning performance of large language models (LLMs), particularly in mathematics and programming. How…
Navigating the Latent Space Dynamics of Neural Models
Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower dimensional latent space. In this paper, we present an alternative interp…
On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
Reinforcement learning (RL) with outcome-based rewards has achieved significant success in training large language model (LLM) agents for complex reasoning tasks. However, in active reasoning where ag…
On Predictive planning and counterfactual learning in active inference
Given the rapid advancement of artificial intelligence, understanding the foundations of intelligent behaviour is increasingly important. Active inference, regarded as a general theory of behaviour, o…
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model’s reasoning ability …
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
It remains contentious whether RL truly expands a model’s reasoning capabilities or merely amplifies high-reward outputs already latent in the base model’s distribution, and whether continually scalin…
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes — fundamental reasoning patterns that capture the essence of problems across domains. These prototypes …
Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer langua…
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement “algorithmic procedures” that can be used to deduce answers to hard problems. Doing so…
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
There is a trending paradigm [1; 2; 3; 4; 5; 6; 7; 8] to couple large language models (LLMs) with external plugins or tools, enabling LLMs to interact with environment [9; 10] and retrieve up-to-date …
ReAct: Synergizing Reasoning and Acting in Language Models
While large language models (LLMs) have demonstrated impressive performance across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-though…
Reasoning Language Models: A Blueprint
Reasoning language models, such as OpenAI’s o1 and o3, DeepSeek-V3, and Alibaba’s QwQ, have redefined AI’s problem-solving capabilities by extending large language models (LLMs) with advanced reasoning mechanisms. Yet, their hi…
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
With the growing adoption of large language model agents in persistent real-world roles, they naturally encounter continuous streams of tasks. A key limitation, however, is their failure to learn from…
Reinforcement Pre-Training
In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a rea…
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
We propose Reversal of Thought (RoT), a novel framework aimed at enhancing the logical reasoning abilities of LLMs. RoT utilizes a Preference-Guided Reverse Reasoning warm-up strategy, which integrate…
Reverse Thinking Makes LLMs Stronger Reasoners
Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. Thi…
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling…
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Table 2. All 39 reasoning modules consisting of high-level cognitive heuristics for problem-solving, adopted from Fernando et al. (2023). Reasoning module 1: How could I devise an experim…
Strategic Reasoning with Language Models
This paper introduces an approach that uses pretrained LLMs with few-shot chain-of-thought examples to enable strategic reasoning for AI agents. Our approach uses systematically generated demonstratio…
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Large language models (LLMs) often fail to learn effective long chain-of-thought (Long CoT) reasoning from human or non-Long-CoT LLMs imitation. To understand this, we propose that effective and learn…
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
Large language models (LLMs) have exhibited remarkable reasoning and planning capabilities. Most prior work in this area has used LLMs to reason through steps from an initial to a goal state or criter…
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
Agentic pipelines present novel challenges and opportunities for human-centered explainability. The HCXAI community is still grappling with how best to make the inner workings of LLMs transparent in a…
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Recent large-scale reasoning models have achieved state-of-the-art performance on challenging mathematical benchmarks, yet the internal mechanisms underlying their success remain poorly understood. In…
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for optimization. Outcome supervision pr…
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve…
Why Do Multi-agent LLM Systems Fail?
Despite growing enthusiasm for Multi-Agent LLM Systems (MAS), their performance gains across popular benchmarks often remain minimal compared to single-agent frameworks. This gap highlig…
𝙻𝙼𝟸: A Simple Society of Language Models Solves Complex Reasoning
Despite demonstrating emergent reasoning abilities, Large Language Models (LLMs) often lose track of complex, multi-step reasoning. Existing studies show that providing guidance via decomposing the or…