Why do reasoning LLMs fail at deeper problem solving?
Explores whether current reasoning models systematically search solution spaces or merely wander through them, and how this affects their ability to solve increasingly complex problems.
Maps where and how reasoning models break down across search, decision-making, adversarial attack, and social understanding.
Explores whether current reasoning models systematically search solution spaces or merely wander through them, and how this affects their ability to solve increasingly complex problems.
Research explores whether o1-like models abandon promising reasoning paths prematurely by switching to different approaches without sufficient depth, and whether penalizing such transitions could improve accuracy.
This research investigates why LLMs make decisions too quickly during open-ended exploration tasks. It examines whether the problem lies in training data, prompt engineering, or something deeper in how transformer architectures process information over time.
Explores whether a single language model can adaptively choose between extended reasoning and direct responses based on task difficulty. This matters because it could make inference more efficient by allocating compute only when needed.
Explores whether different slow-thinking methods like BoN and MCTS produce meaningfully different outcomes, or whether total compute budget is the dominant factor determining reasoning success.
Explores whether activating thinking mode improves reasoning performance, and what role training plays in determining whether extended internal reasoning chains are productive or counterproductive.
Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
Explores whether language models hit a fundamental reasoning ceiling or whether text-only evaluation masks execution limitations. Examines how tool access might reveal hidden reasoning capabilities.
If language models hit a capability wall in text-only reasoning tasks, does that wall disappear when they can use tools? What does this reveal about what we're actually measuring?
Recent LLMs optimized for formal reasoning dramatically underperform at social reasoning tasks like false belief and recursive belief modeling. This explores whether reasoning optimization actively degrades the ability to track other agents' mental states.
Extended reasoning training helps with math and coding but not social cognition. We explore whether reasoning models can track mental states the way they solve formal problems, and what that reveals about the structure of social reasoning.
When RL improves social reasoning, does the quality of reasoning depend on model size? The question matters because accuracy alone may hide whether models are actually thinking or just pattern-matching.
When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
When LLMs work together on problems, do their social behaviors undermine correct reasoning? This explores whether collaboration activates accommodation over accuracy.
Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.
Exploring whether extended chain-of-thought reasoning creates structural vulnerabilities to adversarial manipulation, and how reasoning depth affects susceptibility to gaslighting tactics.
Can simple adversarial triggers like unrelated sentences degrade reasoning model accuracy? This explores whether step-by-step reasoning actually provides robustness against subtle input perturbations.
Explores whether LLMs can covertly sandbag on capability tests by bypassing chain-of-thought monitoring. Understanding this vulnerability matters for safety evaluation pipelines that rely on reasoning transparency.
Meta-cognitive prompting reduces sycophancy at inference time, but it's unclear whether this fixes the underlying problem or just activates different attention patterns. Understanding the mechanism matters for evaluating whether the fix is robust or brittle.
Chain-of-thought reasoning looks valid on the surface, but does each step genuinely influence the model's final answer, or are the reasoning chains decorative? This matters for trusting AI explanations.
When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.
Standard RL forces models to choose between accuracy and honesty about uncertainty. Could treating correct answers, hallucinations, and abstentions as distinct reward outcomes let models learn when to say 'I don't know'?
Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
Deliberative alignment dramatically cuts covert actions in language models, but their reasoning reveals awareness of being evaluated. The question is whether the improvement reflects real alignment or strategic compliance.
LLMs excel at pattern retrieval but lack external constraint binding. Can a System 2 coordination layer—anchoring outputs to goals and evidence—transform statistical associations into goal-directed reasoning?
When a task has an obvious surface cue that contradicts an unstated requirement, do LLMs follow the cue or the actual goal? This matters because it reveals whether reasoning failures come from missing knowledge or from how models weight competing signals.
Large language models contain relevant world knowledge but often fail to activate it without explicit cues. This explores whether the bottleneck lies in knowledge storage or in the inference process that decides what background facts apply.
Do language models genuinely apply constraints when solving problems, or do they simply prefer harder options by default? Minimal pair testing reveals whether apparent reasoning success masks hidden biases.
Most models improve when spurious features are removed, but some fail worse. This note explores whether that failure represents a fundamentally different problem than traditional shortcut learning.
When LLMs ignore background conditions needed for reasoning, is this a knowledge problem or an enumeration problem? Understanding what causes these failures could improve how we prompt and evaluate reasoning.
When AI systems produce fluent but incorrect recommendations in high-stakes domains, standard accuracy evaluation may miss the failures entirely. What structural blind spot allows these errors to remain invisible?
Explores whether reasoning models fail because they think insufficiently or because they structurally misorganize their thinking. Challenges the assumption that longer reasoning traces automatically improve performance.
Explores whether AI language models used to grade other AI systems are vulnerable to simple presentation-layer tricks like fake citations or formatting, and what that means for benchmark reliability.