Do base models already contain hidden reasoning ability?
Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
Five convergent findings build a strong case that reasoning capability is primarily a pre-training phenomenon:
Finding 1 (Base Models paper): Base models already spontaneously demonstrate strong reasoning capabilities and "aha moment" self-reflection patterns when sampled sufficiently. Reasoning traces generated by RL-fine-tuned models are already present in base model outputs — they just appear with lower frequency. RL biases generation toward high-reward patterns; it doesn't create new patterns.
Finding 2 (Steering): A hybrid model using base model weights + thinking model steering vectors recovers 91% of the performance gap to thinking models while steering only 12% of tokens. The reasoning mechanisms (backtracking, uncertainty estimation, subgoal-setting) already exist as directions in the base model's activation space.
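As a rough illustration of the hybrid setup in Finding 2, the sketch below adds a steering vector to a base model's hidden states at selected token positions only. Everything in it is a hypothetical stand-in: the function names, the `alpha` scale, the boolean mask, and the toy 2-dimensional activations are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Sequence

Vector = List[float]


def steer_hidden_states(
    hidden: Sequence[Sequence[float]],
    steering_vector: Sequence[float],
    steer_mask: Sequence[bool],
    alpha: float = 1.0,
) -> List[Vector]:
    """Return a copy of `hidden` with alpha * steering_vector added at masked positions.

    hidden: one activation vector per token position (toy stand-in for a layer's activations).
    steer_mask: True where a position should be steered; other positions pass through unchanged.
    """
    if len(hidden) != len(steer_mask):
        raise ValueError("hidden and steer_mask must have the same length")
    steered: List[Vector] = []
    for h, use in zip(hidden, steer_mask):
        if use:
            steered.append([x + alpha * v for x, v in zip(h, steering_vector)])
        else:
            steered.append(list(h))
    return steered


def steered_fraction(steer_mask: Sequence[bool]) -> float:
    """Fraction of token positions that receive the steering vector."""
    return sum(steer_mask) / len(steer_mask)


# Toy example (invented numbers): 4 token positions, only the second is steered.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask = [False, True, False, False]
steered = steer_hidden_states(hidden, [0.5, -0.5], mask, alpha=2.0)
```

The mask is what keeps the intervention sparse: most positions keep the base model's activations untouched, which mirrors the finding that only a small share of tokens needs steering.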
Finding 3 (CFT/RLVR): Critique Fine-Tuning on a single problem can unlock reasoning potential at RLVR-level effectiveness. By exposing the model to diverse critiques of varied incorrect solutions to one problem, CFT activates reasoning patterns already latent in the base model without requiring hundreds of GPU hours of RL training.
Finding 4 (CoT-Decoding): Pre-trained LLMs inherently contain CoT reasoning paths that can be elicited simply by altering the decoding procedure. Rather than greedy decoding, inspecting top-k alternative tokens reveals that CoT paths are frequently present in the model's probability distribution. A confidence metric differentiates CoT from non-CoT paths — the model shows increased confidence in its final answer when a CoT reasoning path is present. This is entirely unsupervised, requiring no prompting, tuning, or training modifications — purely a decoding change. CoT-decoding adds a fourth mechanism to the latent capability evidence: RL steering, CFT, RLVR, and now decoding all unlock reasoning already present.
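The selection step in Finding 4 can be sketched as follows. This assumes candidate continuations have already been generated by branching on the top-k first tokens, and that each candidate carries the probability distribution at every answer token. Scoring confidence as the average gap between the top-1 and top-2 probabilities is one plausible form of the metric and is an assumption here, as are the `DecodedPath` type and the toy numbers.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecodedPath:
    text: str
    # One probability distribution per answer token (each needs at least 2 entries).
    answer_token_probs: List[List[float]]


def answer_confidence(answer_token_probs: Sequence[Sequence[float]]) -> float:
    """Average margin between the most likely and second most likely token over the answer span."""
    margins = []
    for probs in answer_token_probs:
        ranked = sorted(probs, reverse=True)
        margins.append(ranked[0] - ranked[1])
    return sum(margins) / len(margins)


def select_path(paths: Sequence[DecodedPath]) -> DecodedPath:
    """Pick the candidate whose final answer the model is most confident about."""
    return max(paths, key=lambda p: answer_confidence(p.answer_token_probs))


# Toy candidates (invented): a direct answer and a path that reasons first.
paths = [
    DecodedPath(text="5", answer_token_probs=[[0.40, 0.35, 0.25]]),
    DecodedPath(
        text="I start with 3 apples and get 5 more, so the answer is 8",
        answer_token_probs=[[0.90, 0.06, 0.04]],
    ),
]
best = select_path(paths)
```

No prompt or weight is changed anywhere in this procedure; only the choice among already-available continuations differs from greedy decoding.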
Finding 5 (SAE Reasoning Steering): Sparse Autoencoders decompose model activations into interpretable features, revealing latent features causally associated with reasoning behavior. Steering a single identified reasoning feature at the first generation step matches or exceeds CoT performance across six model families up to 70B parameters, without any explicit CoT prompting. The reasoning mode triggers early in generation and is robust enough to override prompt-level \no_think instructions. This is the most direct mechanistic evidence yet: the capability is not just present (as CoT-decoding shows) but causally controllable through a single latent dimension. See "Can we trigger reasoning without explicit chain-of-thought prompts?" Together with CoT-decoding (Finding 4), this establishes five independent elicitation mechanisms (RL steering, CFT, RLVR, decoding, and SAE feature steering), all converging on the same latent capability.
The synthesis: post-training methods are selectors, not creators. They select which of the base model's latent capabilities to express reliably in context. The implication is that the main bottleneck for reasoning is not capability acquisition (which happens during pre-training on the world's text) but capability elicitation.
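One way to make the "selectors, not creators" claim concrete is the closed-form solution of KL-regularized reward maximization, where the tuned distribution is the base distribution reweighted by exponentiated reward. Using that form as a model of post-training is an assumption made here for illustration, not something the note's sources state; the pattern names, rewards, and `beta` are invented.

```python
import math
from typing import Dict


def reweight(base_probs: Dict[str, float], rewards: Dict[str, float], beta: float = 1.0) -> Dict[str, float]:
    """Return probabilities proportional to base_prob * exp(reward / beta).

    Patterns with zero base probability stay at zero: reweighting can only
    shift mass among patterns the base distribution already supports.
    """
    weights = {y: p * math.exp(rewards.get(y, 0.0) / beta) for y, p in base_probs.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


# Toy distribution over output patterns (invented numbers).
base = {"direct_guess": 0.90, "backtracking_trace": 0.10, "never_seen_pattern": 0.0}
rewards = {"direct_guess": 0.0, "backtracking_trace": 3.0, "never_seen_pattern": 3.0}
tuned = reweight(base, rewards, beta=1.0)
```

In this toy case the rare backtracking pattern becomes the dominant one after reweighting, while the pattern absent from the base distribution stays absent no matter how much reward it would earn.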
RLVR evidence deepens this: Two additional findings from the RLVR literature reinforce the latent-capability thesis. First, 1-shot RLVR achieves a 37-point jump on MATH500 (36%→73.6%) from a single training example. After the model perfectly memorizes its one example, test accuracy continues improving for 1,400 more steps (post-saturation generalization). The data is exhausted, but activation continues. See "Can a single training example unlock mathematical reasoning?" Second, spurious rewards (random, incorrect, or format-only) improve Qwen2.5-Math nearly as much as correct rewards (~21-25% improvement), yet the same spurious rewards fail completely for Llama3.1 and OLMo2. The differentiating variable is not reward quality but pretraining: Qwen's code-reasoning pretraining creates latent capability that any optimization pressure can activate. See "Why do random rewards improve reasoning for some models but not others?" Together with the pass@k finding that RLVR narrows capability scope rather than expanding it, the evidence converges: RLVR is a catalyst that triggers a phase transition from broad pretraining distribution to reliable sampling of correct answers.
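The pass@k argument rests on a simple calculation, sketched below with the standard unbiased estimator: given n samples of which c are correct, pass@k is the probability that at least one of k samples drawn without replacement is correct. The sample counts are invented for illustration and are not results from any of the cited work.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: total samples drawn for a problem, c: how many were correct, k: sampling budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, which correctly gives pass@k = 1.0.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Hypothetical base model: only 5 of 100 samples solve the problem.
base_pass_at_1 = pass_at_k(n=100, c=5, k=1)
base_pass_at_50 = pass_at_k(n=100, c=5, k=50)
```

A model that looks weak at k=1 can still cover the problem at large k, which is the sense in which a correct reasoning path is already present but sampled at low frequency.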
This partially contradicts "Can simple rewards alone teach complex domain reasoning?": that note documents genuine capability emergence in domain-specialized contexts (medical, mathematical). The reconciliation: emergence may reflect reliable expression of latent capability, not creation from scratch. The distinction matters for research direction: if capability already exists, the investment in RL may be better directed toward elicitation methods.
The implication for "Can prompt optimization teach models knowledge they lack?": the same principle extends to reasoning capability, not just knowledge.
Inquiring lines that use this note as a source (317)
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- What cognitive capabilities do agents need to internalize social feedback?
- Does AI knowledge precede actual expertise in hyperreal production?
- What makes conceptual inquiry the fastest high-scoring AI interaction pattern?
- How does instrumental reasoning reproduce pre-Enlightenment knowledge structures?
- Can better attention mechanisms close the gap between human and AI frame-activation?
- What would an AI trained for emancipatory reasoning look like?
- What other hidden biases might aggregate metrics fail to distinguish from reasoning?
- When does knowledge activation fail across different model architectures?
- Can steering a single latent feature replicate chain-of-thought performance?
- How do different LLM integration paradigms affect inheritance of pretraining biases?
- How do pretraining biases interact differently with prompts across model tiers?
- How does in-context learning trigger phase transitions in model behavior?
- Does the heuristic dominance ratio vary predictably across model architectures?
- What is the relationship between reasoning depth and verbalization requirements?
- Do spurious rewards activate reasoning without teaching new skills?
- Why do spurious reward signals improve reasoning for some pretrained models?
- Can RLVR expand a model's reasoning capabilities beyond its training ceiling?
- What distinguishes genuine reasoning activation from memorization-assisted answer recall?
- Can penalizing reasoning transitions fix underthinking without fine-tuning models?
- Can latent reasoning architectures work as retrofits to existing models?
- Can AI systems execute strategies without conscious intention behind them?
- Can step-level deliberation flags guide other reasoning systems?
- Why do current RLVR methods fail to expand reasoning capability beyond base model boundaries?
- Can models learn when to invoke search during reasoning tasks?
- How does non-reasoning SFT prevent overfitting before RL training begins?
- Can prompting inject new knowledge into already-trained AI models?
- Why does explicit theory injection work better than example-based learning for reasoning tasks?
- How does optimizing for accuracy during training degrade downstream reasoning quality?
- Can a single SAE feature control reasoning behavior across model families?
- What makes reasoning capability a pre-training rather than post-training phenomenon?
- What makes bilevel metacognition architectural rather than emergent in current systems?
- How does the generation-verification gap limit AI self-improvement capabilities?
- What makes active reasoning through dialogue harder than passive reasoning?
- How much do mechanistic interpretability findings reflect true reasoning architecture?
- Can activation steering directly steer models toward concise reasoning without prompting?
- Where do humans and language models actually diverge in reasoning ability?
- Can marginal hints integrate better into reasoning than comprehensive explanations?
- Do explicit reasoning chains improve or harm performance on complex judgment tasks?
- Why does reasoning fine-tuning reduce model abstention capacity by 24 percent?
- Can models learn to select exemplars based on reasoning skills rather than complexity?
- Can activation patching reveal which reasoning steps actually matter?
- How much does pre-training frequency predict reasoning task performance?
- How much does training data format shape what reasoning strategy emerges?
- Why does training format shape reasoning strategy more than domain?
- How do critique models prevent policy entropy collapse during reasoning training?
- Why do models show performative reasoning on easy tasks but genuine reasoning on hard ones?
- Why does training data format shape reasoning strategy more than domain content?
- When does knowledge distillation produce student models superior to teachers?
- Does explicit reasoning help or hurt tasks requiring continuous nuanced judgment?
- Does domain training degrade reasoning ability even when benchmark scores rise?
- Why does fine-tuning degrade reasoning quality even as accuracy improves?
- Does constraining AI access during early task phases preserve skill formation?
- Can energy minimization replace reasoning-specific reinforcement learning for system 2 thinking?
- Why do open-source models trained on proprietary outputs still fail at reasoning?
- Why do models automatically adjust reasoning length to problem difficulty?
- Why does domain accuracy improve while reasoning quality degrades after supervised fine-tuning?
- Does more inference compute help reasoning models match specialized domain performance?
- Can reinforcement learning add missing domain knowledge to fine-tuned reasoning models?
- Does supervised fine-tuning improve accuracy while damaging the quality of reasoning?
- Does fine-tuning models for specific tasks destroy their ability to reason?
- Can we detect and measure circuit formation before generalization emerges?
- Does reasoning fine-tuning actually reduce a model's ability to abstain?
- Can latent reasoning in continuous space scale beyond supervised reasoning tasks?
- Can extended reasoning training capture individual strategic thinking styles?
- Why does context information fail to override prior training associations?
- Do reasoning models trade instruction following for deliberative capability?
- Does training data format shape model reasoning more than domain content?
- Do emergent abilities result from genuine new capabilities or implicit in-context learning?
- Why do human-curated thought examples fail to improve model thinking?
- Why does a relativistic critic outperform absolute scoring in adversarial reasoning training?
- Does knowledge structure matter more than knowledge volume for model training?
- What makes training data quality more important than quantity for reasoning?
- Does model scaling improve knowledge storage faster than reasoning ability?
- How does subliminal learning differ from statistical model collapse?
- Why might latent reasoning capture types of thinking that verbalized CoT cannot?
- How does LatentQA differ from predefined concept steering like representation engineering?
- Do personality traits occupy specific mechanistic locations in pretrained models?
- How does behavioral fine-tuning differ from factual knowledge encoding in models?
- What skills can large models identify and organize about their own abilities?
- How does training data distribution create asymmetric competence across relation types?
- Does reasoning structure match explicit versus implicit task demands?
- How do foundation models develop task-specific heuristics instead of world models?
- How does fine-tuning on natural language inference affect fallacy susceptibility?
- Can frozen world models from training cutoff remain adequate for real-world reasoning?
- Can a single model trained on two tasks predict untrained decision tasks?
- Why do models learn reasoning form instead of actual abstract inference?
- How does training format shape reasoning strategy more than content?
- Does reinforcement learning learn optimal per-turn reasoning discipline?
- Does reasoning trace style explain why RL post-training improves model reasoning?
- Do chain-of-thought explanations reveal genuine reasoning or trigger latent features?
- Why does reasoning effort fail to improve theory of mind performance?
- How can prompting help models gather information before attempting reasoning?
- What training signals would teach models when not to reason?
- When does self-reflection actually help reasoning models improve?
- How does model capability relate to personality conditioning flexibility?
- Does policy entropy collapse limit how many iterations of reasoning training work?
- How does inductive reasoning from partial evidence enable hypothesis formation?
- Can suppressing incorrect behavior alone solve the diversity bottleneck in reasoning RL?
- Do depth thresholds correspond to transitions between procedural and strategic learning?
- Why does imitation learning create a ceiling for reasoning capability?
- Can targeted activation steering surface latent reasoning in base models?
- What makes reasoning-specific post-training different from standard parameter scaling?
- Can models hide their reasoning in continuous space rather than natural language?
- Are difficult tasks more monitorable because reasoning externalization becomes necessary?
- Do models trained for reasoning lose their ability to decline questions?
- Does specialized training in one domain create capability cliffs elsewhere?
- Does pre-training encode personality patterns that fine-tuning later activates?
- Does thought consolidation address the confirmatory reflection problem in reasoning models?
- How does factoring perception from reasoning improve sparse-label learning?
- How much does training data presentation format shape reasoning ability?
- Why does inference-time thinking hurt proactive critical thinking in vanilla models?
- How does RL refine reasoning paths without simply adding model capability?
- Can users inject entirely new knowledge into models through prompting alone?
- Does foundational model training or user priors more strongly shape final outputs?
- Can models distinguish between activated knowledge and genuine reasoning?
- Why does the gap between theoretical expressiveness and learned capability matter?
- How does the functional separation of knowledge and reasoning affect adaptation methods?
- Can latent reasoning mechanisms and recursive tracking mechanisms be combined effectively?
- What happens when reasoning fine-tuning eliminates model refusal mechanisms entirely?
- Do base models and reasoning models fail in opposite directions on uncertainty?
- How does policy entropy during training affect search discipline during inference?
- Does formal reasoning training actively degrade social reasoning ability?
- Why do recursive belief models require different training than logical derivation?
- Can models trained on longer contexts develop better fundamental reasoning abilities?
- What explains the gap between perplexity performance and actual reasoning capability?
- Can scaffolding frameworks isolate inductive reasoning from deductive confounds?
- How does post-training on traces improve performance without semantic reasoning?
- Can models learn when to think versus answer directly?
- What separates knowledge from reasoning in neural network layers?
- How does scaling reasoning capability actually reduce instruction-following ability?
- Can latent space represent reasoning dimensions that text cannot?
- What role does inductive bias play versus model capacity in practice?
- Can training improve reasoning coherence without improving actual correctness?
- Can training on reasoning traces teach actual self-correction or only confident first answers?
- What role does curriculum design play in reasoning emergence?
- Why does combining reasoning distillation with RLVR outperform either training stage alone?
- How does the pretrained prior set a capability ceiling for reward model exploration?
- Why do difficult problems force models to develop reasoning strategies?
- Why does supervised fine-tuning degrade reasoning quality despite raising accuracy?
- What distinguishes coherent reasoning from inaccurate but plausible predictions?
- What limits RL's ability to scale for reasoning at training time?
- Can RL training teach models when to activate reasoning versus when to skip it?
- Does self-generated training data reduce a model's capability diversity?
- How do reasoning training methods sacrifice some thinking skills while improving others?
- Can activation-space steering vectors replicate thinking model performance without retraining?
- Can random rewards improve reasoning models if pretraining is suitable?
- How does a single training example trigger phase transitions in reasoning output?
- Can extended RL training unlock genuinely new reasoning strategies models cannot discover otherwise?
- Can contrastive learning teach models to switch between logical and emotional reasoning?
- Can external classifiers reliably decide when a model should reason?
- How does pretrained knowledge constrain what adaptation strategies can achieve?
- Why do pretrained model priors reduce the usefulness of retrieved experience?
- Does reasoning fine-tuning actually damage a model's ability to abstain?
- Why do SFT models memorize patterns instead of learning generalizable reasoning?
- Can models maintain auditable reasoning while achieving high accuracy?
- Does reinforcement learning preserve reasoning quality better than supervised fine-tuning?
- Can continuous latent reasoning match discrete chain-of-thought without training modifications?
- Can reasoning evaluation metrics reward actual reasoning instead of theater?
- Why does latent reasoning override no-think instructions in models?
- What other triggers can activate the latent reasoning capability?
- Why do instruction following and reasoning capability trade off in training?
- Why do reasoning-optimized models show no sycophancy resistance advantage?
- How do knowledge and reasoning circuits interfere in the same neural network?
- What role does self-learning play in improving agent reasoning without annotation?
- Does reasoning fine-tuning actually harm a model's ability to abstain?
- Can intrinsic confidence signals improve both calibration and reasoning performance?
- What metric distinguishes deep reasoning from superficial information propagation?
- Can reasoning catalyst data serve as a stable foundation for test-time training?
- Why does monological training prevent models from overriding statistical priors?
- Does RL training actually restore the critical thinking that reasoning models lose?
- Does inference-time compute improve pretraining data efficiency in practice?
- How does generative intelligence differ from the bounded intelligence of individual experts?
- How much reasoning depth do we actually need for most real-world tasks?
- Can reasoning fine-tuning improve both capability and instruction compliance together?
- Why does reasoning fine-tuning reduce a model's ability to abstain?
- How do emotional and social simulations enable better hypothetical reasoning?
- How does training data format shape which reasoning patterns emerge in models?
- How can one training example improve reasoning across thousands of unseen problems?
- How much does extended thinking actually improve model reasoning ability?
- Why do AI benchmarks measure accuracy instead of reasoning quality?
- Does penalizing thought transitions improve reasoning without model retraining?
- Why do foundation models develop task-specific heuristics instead of causal understanding?
- Why does additional reasoning effort not improve theory of mind performance?
- Does RL teach models when to use reasoning or how to reason?
- Do extended thinking blocks access latent empathetic capabilities in models?
- Why are receiver attention heads narrower in reasoning models than base models?
- Can we improve reasoning by amplifying information at mutual information peaks?
- What makes thought identifiability provable without auxiliary training data?
- Does reasoning training actively undermine the abstention capacity safety training created?
- Why does training data format shape reasoning strategy more than content?
- Can adversarial critics force genuine reasoning the same way critique fine-tuning does?
- How do induction heads learn to overwrite computational representations?
- Why does reasoning training improve math but hurt knowledge tasks?
- Why is metacognition neglected as a foundational AI research area?
- Can knowledge encoded in model representations fail to influence generation?
- How does the pretrained prior constrain the ceiling for empathy RL improvements?
- Why do different model training approaches produce different overthinking thresholds?
- How do single training examples activate reasoning capabilities in language models?
- Does representational density emerge from training data exposure during pretraining?
- How can interpretability methods account for shifting representational density across task conditions?
- Does supervised fine-tuning improve reasoning or just response formatting?
- Do base models contain latent reasoning that minimal training can unlock?
- Can one training example activate mathematical reasoning in RL-trained models?
- How does policy initialization with sub-policies enable emergent thinking?
- Can reinforcement learning fix the reasoning gaps that supervised fine-tuning misses?
- What structural differences emerge between early generic skills and later meta-strategy skills?
- Can activation steering vectors compress reasoning without retraining models?
- Can training format itself shape what reasoning strategy a model learns?
- Does the 78-demonstration principle apply to other AI capabilities beyond agency?
- Can training models on backward reasoning improve their forward planning ability?
- Why does a replay mechanism prevent reasoner skills from over-specializing?
- Can pretraining signals unlock latent reasoning that post-training merely activates?
- Why does eliminating proxy-model filtering improve reasoning emergence in pretraining?
- Do base models truly possess latent reasoning capability?
- Why does reasoning volume fail to improve theory of mind performance?
- Why do reasoning tasks improve more than retrieval from lookup memory?
- Are chain-of-thought traces anthropomorphizing how AI models really reason?
- Does latent reasoning capability exist in base models before any training?
- Can one training example activate mathematical reasoning without reinforcement learning?
- What distinguishes reasoning activation mechanisms across different training methods?
- Does training data format shape reasoning strategy more than domain content?
- How does backward reasoning during training improve forward reasoning capability?
- Can goal information injected at inference time replace goal-conditioned training?
- Can models develop situational awareness without explicit training for it?
- Can you steer reasoning by directly manipulating SAE features?
- Can models maintain reasoning-output coupling while improving domain accuracy?
- Can personalized AI learning systems actually widen rather than narrow educational gaps?
- Can thought quality alone be trusted to guide model training?
- Does reinforcement learning teach models how to reason or when to reason?
- Can a single model implement fast thinking, slow thinking, and tool use?
- What data properties enable transformers to learn sequential decision-making in context?
- Does RL training activate latent meta-learning capacity or create it from scratch?
- Can models reason at inference without specialized internal training?
- Why does adversarial training force deeper reasoning than surface imitation?
- How does mechanistic interpretability complement learning mechanics in explaining deep learning?
- Do reasoning models fail to report processes that actually influence their answers?
- Can reinforcement learning close the gap between LLM reasoning and action?
- How does the knowing-doing gap relate to Potemkin understanding?
- Why does naive randomness fail to improve stochastic latent reasoning models?
- Can interventions on model components prove mechanism without explaining encoding?
- Does the pretrained prior actually constrain what internalized search can discover?
- How do timing and search internalization interact during reasoning post-training?
- Can pretrained priors set exploration ceilings for empathetic capability development?
- Can metacognitive categories be learned instead of fixed by human designers?
- What distinguishes genuine capability gains from coherent but invalid reasoning traces?
- How do reasoning-related features behave when trained on near-impossible problems?
- Does importance sampling actually recover capabilities lost to hard sample training?
- Can conditioning generation on difficulty probes reduce overthinking on simple tasks?
- Can smaller amounts of diverse reasoning demonstrations replace exhaustive factual training data?
- What makes token-level reasoning during pretraining different from test-time chain-of-thought?
- How much training data is truly necessary to unlock latent model reasoning?
- Can reasoning happen in latent space without chain of thought?
- What training interventions could close the perception-action gap?
- What happens to representational structure during model pretraining phases?
- Does token-level reasoning during pretraining improve general reasoning without task-specific supervision?
- What inference-time scaling benefits emerge from reasoning before each prediction?
- Can activation steering compress reasoning without retraining models?
- Why does prompting discover capabilities that need reward-driven refinement?
- Does the base model already contain latent reasoning capability?
- Can distillation from stronger models create genuinely new reasoning abilities?
- What does pass@k reveal about base model reasoning capacity?
- What capacity threshold determines whether RL teaches activation versus shortcut learning?
- Can models possess latent reasoning capability that training signals fail to unlock?
- How can benchmark accuracy scores mask the absence of interpretable reasoning structure?
- Why do knowledge and reasoning train in different network layers?
- Why does decomposition ability transfer across domains but solving ability does not?
- What emergent behaviors do models develop when trained on underspecified pedagogical tasks?
- How do frontier models maintain agreement scores above 90 percent across reasoning tasks?
- Can models recover knowledge with completely unrelated retraining tasks?
- What pretraining formats encode latent reasoning strategies that RLVR can surface?
- How does the prefrontal cortex inspire artificial reasoning architectures?
- Why do reasoning-optimized models show no resistance advantage on agreement tasks?
- Can reasoning training fix sycophancy if it is not a reasoning failure?
- What kinds of reasoning tasks reveal the ceiling of text-only training?
- Why does pre-training provide the raw material for emergent thinking?
- Can we predict when a model will develop thinking behaviors?
- How much does training data format influence reasoning strategy versus domain content?
- What mechanisms activate latent reasoning capabilities already present in base models?
- How does training data structure shape reasoning strategy more than domain content?
- Why does extended reasoning training improve exploration without adding new capabilities?
- Is premature decision-making a form of underthinking in transformer models?
- Can base models spontaneously produce reasoning traces without any RL training?
- Does RL training redirect self-doubt into productive gap analysis?
- Why does the pretrained prior determine the exploration ceiling?
- Can approximate or noisy reference answers work for RL-based reasoning training?
- Is reasoning failure caused by task complexity or training distribution gaps?
- Does the generation-verification gap limit how far AI can improve itself?
- Can RL create new reasoning primitives that pretraining never established?
- How do extrapolative and contextual generalization measure RL reasoning gains?
- Do different game types reveal different strategic reasoning capabilities in LLMs?
- Can structured workflows unlock latent reasoning abilities that raw models don't show?
- Do base models already contain latent behavioral principles waiting to be amplified?
- Can models be trained to hide causal influences in their explanations?
- Can models generate their own training curriculum during offline dreaming?
- Why does reasoning fine-tuning reduce models' ability to abstain?
- Can articulating latent reasoning processes improve transfer across domains?
- How does contrapositive augmentation change the tractability of reasoning tasks?
- How do task frequency and complexity interact with model capacity during training?
- Does task diversity in pretraining data transfer reasoning better than larger models?
- Can minimal training signals unlock latent reasoning capability in base models?
- When does reinforcement learning actually produce true reasoning gains in models?
- What makes some training data teach brittle answers versus robust reasoning?
- How can we turn reasoning model failures into useful training signals?
- Why does latent-level prediction beat token-level prediction for reasoning?
- How does o1-style reasoning relate to learned search processes versus memorized solutions?
- How does representational density emerge from training data familiarity?
- Can minimal training signals unlock reasoning already latent in pretrained representations?
- Does latent density emerge during pretraining from training data familiarity?
- How does preference learning differ from supervised finetuning for reasoning?
- Do models genuinely reason harder on difficult tasks or just appear to?
- How does early commitment in reasoning differ from early exploitation in planning?
- What makes content informative and not-yet-mastered for reinforcement during pretraining?
- Does targeting the edge of competence during RL pretraining unlock true reasoning gains?
- Can small demonstration sets unlock general reasoning without large question data?
- How does question difficulty and breadth affect what models learn to reason?
- What latent reasoning capability do base models already possess before training?
- Does finetuning facts into weights overwrite existing model capabilities?
Related concepts in this collection (16)
-
Can simple rewards alone teach complex domain reasoning?
Does reinforcement learning on difficult problems with basic accuracy rewards produce sophisticated reasoning strategies without explicit chain-of-thought training? This challenges assumptions about what domain AI models need to learn effectively.
partially contradicted: "emergence" may be reliable expression of latent capability, not creation
-
Does RL teach reasoning or just when to use it?
Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges.
mechanism: if base models have capability, RL teaches timing of deployment
-
Can prompt optimization teach models knowledge they lack?
Explores whether sophisticated prompting techniques can inject new domain knowledge into language models, or if they're limited to activating existing training knowledge.
extends to reasoning capability not just knowledge
-
Can non-reasoning models catch up with more compute?
Explores whether inference-time compute budget can close the performance gap between standard models and those trained for reasoning, and what training mechanisms might enable this.
qualified: targeted activation methods can close most of the gap
-
Can a single training example unlock mathematical reasoning?
Explores whether one example is enough to dramatically improve math problem-solving in language models, and whether learning continues after perfect memorization.
strongest evidence: a single example activates a 37-point gain, with generalization continuing afterward
-
Why do random rewards improve reasoning for some models but not others?
When RLVR training uses meaningless reward signals, some models gain reasoning improvements while others don't. What determines which models can benefit from optimization pressure without meaningful feedback?
pretraining determines activation potential; reward signal is the catalyst, not the teacher
-
Does RLVR actually expand what models can reason about?
Explores whether reinforcement learning from verifiable rewards teaches models genuinely new reasoning skills or simply makes existing capabilities more reliable. Pass@k analysis suggests the latter.
pass@k analysis indicates that RLVR selects from existing capability rather than creating new capability
-
Does procedural knowledge drive reasoning more than factual retrieval?
Explores whether models learn reasoning through general procedures across diverse documents rather than memorizing specific facts. This matters for understanding what pretraining data actually teaches models to reason.
identifies what the latent capability consists of: procedural knowledge synthesized from diverse pretraining documents that demonstrates how to reason, not what to recall; this is what minimal training signals activate
-
Can models learn when to think versus respond quickly?
Explores whether a single language model can adaptively choose between extended reasoning and direct responses based on task difficulty. This matters because it could make inference more efficient by allocating compute only when needed.
concrete implementation of the latent-capability thesis: Thinkless trains only a routing token via DeGRPO, not reasoning capability; the design premise is that capability is already present and what's needed is adaptive activation
-
Can models learn to internalize search algorithms through training?
Can chain-of-thought reasoning be taught as an explicit search process that models learn to implement internally? This matters because it could unlock algorithmic optimization rather than just output optimization.
extends beyond activation: Meta-CoT claims linearized search traces can teach genuinely new search capability, not just unlock existing patterns — testing the boundary of the latent-capability thesis
-
Does reinforcement learning on theory of mind collapse with model scale?
When RL improves social reasoning, does the quality of reasoning depend on model size? The question matters because accuracy alone may hide whether models are actually thinking or just pattern-matching.
the scale-dependent finding adds a social-reasoning dimension: 7B models have latent ToM capability that RL can activate, but smaller models lack sufficient latent capacity for social reasoning, suggesting a domain-specific threshold below which the latent-capability thesis does not hold
-
Does reinforcement learning update only a small fraction of parameters?
Investigating whether RL algorithms consistently modify only 5–30% of model parameters across different LLMs and RL methods, and what structural properties those sparse updates possess.
parametric signature of latent capability: RL touches only 5–30% of parameters because the rest already encode adequate reasoning; the sparsity is intrinsic and holds across 7 algorithms and 10 models, which is consistent with capability preexisting in the weights
-
Can next-token prediction become a reasoning task with RL?
Does reinforcement learning applied to next-token prediction during pretraining encourage genuine reasoning rather than surface memorization? This matters because it could unlock reasoning capability without requiring labeled data or human feedback.
strengthens the foundation: RPT may create stronger latent capabilities than standard pretraining by embedding RL reasoning patterns during pretraining itself, making the subsequent minimal-signal activation even more effective
-
Can models improve themselves on tasks without verifiable answers?
Most self-improvement methods require verifiable correctness signals like math or code. Can models improve on open-ended instruction tasks where right answers aren't automatically checkable? And what minimal training is needed to unlock this?
extends the minimal-signal thesis to general instruction tasks: 1000 demonstrations of reasoning enrichment are sufficient to enable iterative self-improvement, consistent with the latent capability thesis — the catalyst teaches articulation of reasoning, not reasoning itself
-
Can careful selection of 78 demos outperform massive training datasets?
Does strategic curation of high-quality demonstrations unlock agentic capability more efficiently than scaling training data? LIMI achieved 73.5% on AgencyBench with 78 samples versus 10K+ samples for competing models, suggesting data quality may matter more than quantity.
extends the latent-capability thesis from reasoning to autonomous agency: 78 curated trajectories outperform 10K+ samples, suggesting agentic behavior is also a latent capability that minimal signals can activate
-
Can we trigger reasoning without explicit chain-of-thought prompts?
This research asks whether models possess latent reasoning capabilities that can be activated through direct feature steering, independent of chain-of-thought instructions. Understanding this matters for making reasoning more efficient and controllable.
most direct mechanistic evidence: single latent feature causally controls reasoning activation across 6 model families up to 70B
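The pass@k analysis referenced in the RLVR entry above can be made concrete with a small sketch. It assumes the standard unbiased estimator from the code-generation evaluation literature, pass@k = 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The notes here do not say which estimator the cited work uses, and the function name and sample counts below are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total generations sampled, c: number of correct generations,
    k: budget of attempts. Returns the probability that a random
    size-k subset of the n generations contains at least one correct one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every subset includes a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem (assumed, not measured):
# the base model is rarely correct, the RL-tuned model is often correct.
N_SAMPLES = 256
BASE_CORRECT = 4
RL_CORRECT = 96

if __name__ == "__main__":
    for k in (1, 16, 128):
        base = pass_at_k(N_SAMPLES, BASE_CORRECT, k)
        tuned = pass_at_k(N_SAMPLES, RL_CORRECT, k)
        print(f"k={k:>3}  base={base:.3f}  rl={tuned:.3f}  gap={tuned - base:.3f}")
```

With these illustrative counts the gap between the two models narrows as k grows, which is the pattern the selection reading predicts: a correct trace that the base model produces rarely still counts at large k, so RL mainly raises how often it appears at small k.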
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Eliciting Reasoning in Language Models with Cognitive Tools
- Base Models Know How to Reason, Thinking Models Learn When
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
- Large Language Models Think Too Fast To Explore Effectively
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Original note title
base models already possess latent reasoning capability that minimal training signals can unlock