Can minimal training signals unlock reasoning already latent in pretrained representations?

This explores whether the reasoning ability LLMs show is something they already have from pretraining, waiting to be switched on by a small nudge, rather than a new skill that heavy training has to build from scratch. The corpus comes down surprisingly hard on the "already there" side. The clearest statement is that five independent mechanisms (RL steering, critique fine-tuning, decoding changes, sparse-autoencoder feature steering, and reinforcement learning with verifiable rewards) all elicit reasoning that already lives in base-model activations, which means post-training *selects* reasoning rather than *creating* it (Do base models already contain hidden reasoning ability?). If that framing is right, the bottleneck was never capability acquisition; it was elicitation.
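To make "selects rather than creates" concrete, here is a minimal sketch under one idealized assumption that is ours, not a claim from the notes: treat post-training as the KL-regularized objective max E[r] − β·KL(π‖π_base), whose optimum has a known closed form. The function `tilt`, the toy answers, and β = 0.25 are hypothetical, and practical RLVR with sampled policy-gradient updates only approximates this optimum.

```python
import math
from typing import Dict


def tilt(base: Dict[str, float], reward: Dict[str, float], beta: float) -> Dict[str, float]:
    """Closed-form optimum of  max_pi E_pi[r] - beta * KL(pi || base).

    pi*(y) = base(y) * exp(r(y) / beta) / Z. Answers the base model never
    produces (base(y) == 0) keep probability 0: the update can only
    reweight what is already there.
    """
    weights = {y: p * math.exp(reward[y] / beta) for y, p in base.items()}
    z = sum(weights.values())  # normalizing constant Z
    return {y: w / z for y, w in weights.items()}


# Hypothetical toy: a base model's distribution over candidate answers to one
# problem, and a verifier that rewards only the correct answer "42".
base = {"42": 0.2, "41": 0.5, "24": 0.3, "43": 0.0}
reward = {"42": 1.0, "41": 0.0, "24": 0.0, "43": 0.0}

tuned = tilt(base, reward, beta=0.25)
print(f"P(correct): {base['42']:.2f} -> {tuned['42']:.2f}")
print(f"P('43') after tuning: {tuned['43']}")
```

Probability mass moves toward the verified answer, but an answer with zero base probability stays at zero, which is the formal sense in which such an update reweights existing behavior instead of adding new behavior.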

The most striking evidence for how *minimal* the signal can be: a single feature, identified inside the model with a sparse autoencoder, can be steered to match or beat full chain-of-thought prompting across six model families, and it fires early in generation, even overriding surface-level instructions (Can we trigger reasoning without explicit chain-of-thought prompts?). Reasoning verbosity turns out to be a similarly linear, steerable direction: one vector extracted from just 50 paired examples cuts chain-of-thought length by about two-thirds while maintaining accuracy, with no retraining (Can we steer reasoning toward brevity without retraining?). And weight changes aren't even necessary: four modular "cognitive tools" implemented as sandboxed LLM calls lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with zero RL, by enforcing the operation isolation that pure prompting can't guarantee (Can modular cognitive tools unlock reasoning without training?). The recurring pattern is that the trigger is small and the capability is pre-existing.
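Both steering results reduce to the same inference-time operation: add a direction to a hidden activation. A minimal sketch, assuming a difference-of-means recipe over paired activations (a common way to extract such vectors; we have not checked that it matches the cited method's exact extraction) and plain Python lists standing in for a model's residual stream. `mean_difference`, `steer`, the toy vectors, and the SAE decoder row are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_difference(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Vector:
    """Steering direction: the average of (target - source) activation differences.

    Each pair holds hidden states for the same prompt under two behaviours,
    e.g. (concise trace, verbose trace) taken at one layer.
    """
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for target, source in pairs:
        for i in range(dim):
            total[i] += target[i] - source[i]
    return [t / len(pairs) for t in total]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Add a scaled direction to one activation; the model's weights are untouched."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Hypothetical 3-d activations for two (concise, verbose) pairs.
pairs = [
    ([1.0, 0.0, 2.0], [1.0, 1.0, 2.0]),
    ([0.5, 0.25, 1.0], [0.5, 1.25, 1.0]),
]
brevity = mean_difference(pairs)
steered = steer([0.5, 0.75, -0.25], brevity, alpha=0.5)

# SAE-style steering uses the same addition, with the direction taken from a
# (hypothetical) SAE decoder row for the chosen "reasoning" feature.
sae_decoder_row = [0.0, 0.0, 1.0]
boosted = steer([0.5, 0.75, -0.25], sae_decoder_row, alpha=4.0)
print(brevity, steered, boosted)
```

In a real model the addition would be applied at a chosen layer on every forward pass during generation, with `alpha` tuned; the order of each pair decides whether the vector pushes toward or away from brevity.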

But "latent and waiting" raises an obvious follow-up: latent how, and put there by what? Two notes argue the latency is itself a product of pretraining choices. Reasoning generalization rides on broad, transferable *procedural* knowledge spread across many documents (unlike factual recall, which depends on narrow memorization), so the raw material for reasoning is laid down diffusely during pretraining (Does procedural knowledge drive reasoning more than factual retrieval?). You can also build the reasoning *into* pretraining directly: treating chain-of-thought as an exploratory action rewarded by information gain lifts math and science benchmarks by ~19% (Can chain-of-thought reasoning be learned during pretraining itself?), and looped architectures that iterate in latent space get 2–3× efficiency without extra capacity (Can reasoning happen in latent space during pretraining?). Energy-based transformers push this furthest, reaching System-2-style deliberation from unsupervised learning alone, with no domain-specific scaffolding (Can energy minimization unlock reasoning without domain-specific training?).
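The information-gain reward for chain-of-thought during pretraining can be sketched as a log-likelihood difference. This is a minimal sketch assuming the reward compares the likelihood of the real next tokens with and without the sampled thought; the cited method may score against a separate no-thought baseline model and handle per-token details not shown here. `information_gain_reward` and the probabilities are hypothetical.

```python
import math
from typing import Sequence


def information_gain_reward(
    logp_with_thought: Sequence[float],
    logp_without_thought: Sequence[float],
) -> float:
    """Verifier-free reward for a sampled chain-of-thought.

    Both arguments are per-token log-probabilities of the same observed
    continuation: once conditioned on context + thought, once on context alone.
    The reward is the total log-likelihood gain the thought buys.
    """
    if len(logp_with_thought) != len(logp_without_thought):
        raise ValueError("both sequences must score the same continuation")
    return sum(a - b for a, b in zip(logp_with_thought, logp_without_thought))


# Hypothetical probabilities for a two-token continuation.
helpful = information_gain_reward(
    [math.log(0.5), math.log(0.25)],   # with the thought
    [math.log(0.25), math.log(0.25)],  # without it
)
unhelpful = information_gain_reward([math.log(0.1)], [math.log(0.2)])
print(f"helpful thought: {helpful:+.3f} nats, unhelpful thought: {unhelpful:+.3f} nats")
```

No external verifier is involved: the "answer key" is simply the next stretch of ordinary pretraining text, so a thought is rewarded only when it makes that text more predictable.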

Here's the unsettling part, and the thing you might not have known you wanted to know: if a tiny signal unlocks reasoning, maybe what's being unlocked isn't "reasoning" in the strong sense at all. Models trained on *deliberately corrupted* reasoning traces perform about as well as those trained on correct ones, sometimes generalizing better, which suggests the traces work as computational scaffolding, not as meaningful logic (Do reasoning traces need to be semantically correct?). Chain of Draft matches standard chain-of-thought accuracy while using only 7.6% of the tokens, because most of the removed words served style and documentation, not computation (Can minimal reasoning chains match full explanations?). And when semantic content is stripped from a task, performance collapses even with the correct rules in context: LLMs reason through learned associations, not symbolic manipulation (Do large language models reason symbolically or semantically?), reproducing familiar schemata that degrade predictably under distribution shift (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). So the answer is yes, minimal signals reliably unlock something latent, but the corpus quietly reframes the question: the "reasoning" you elicit so cheaply may be a pattern already compiled into the weights, which is exactly why so little is needed to switch it on.
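The corrupted-trace result hinges on how the irrelevant traces are built. One simple construction, assumed here for illustration and not necessarily the cited study's protocol, keeps every problem and its correct answer but swaps in the reasoning trace from a different problem. `Example`, `corrupt_traces`, and the toy data are hypothetical.

```python
import random
from typing import List, NamedTuple


class Example(NamedTuple):
    problem: str
    trace: str
    answer: str


def corrupt_traces(data: List[Example], seed: int = 0) -> List[Example]:
    """Pair every problem with a trace from a *different* problem, keeping the answer.

    Shuffles the example order, then gives each example the trace of the next
    one in that order (cyclically), so no example keeps its own trace.
    """
    if len(data) < 2:
        raise ValueError("need at least two examples to swap traces")
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    corrupted = list(data)
    for k, i in enumerate(order):
        donor = order[(k + 1) % len(order)]  # always a different index
        corrupted[i] = data[i]._replace(trace=data[donor].trace)
    return corrupted


data = [
    Example("2 + 3", "add 2 and 3", "5"),
    Example("4 * 2", "multiply 4 by 2", "8"),
    Example("9 - 6", "subtract 6 from 9", "3"),
    Example("8 / 4", "divide 8 by 4", "2"),
]
corrupted = corrupt_traces(data, seed=7)
for ex in corrupted:
    print(ex)
```

Training one model on `data` and another on `corrupted`, with everything else held fixed, is the comparison the finding describes; the cyclic shift guarantees that every trace is reused exactly once and never with its own problem.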

Sources 12 notes

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Can chain-of-thought reasoning be learned during pretraining itself?

RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.

Can reasoning happen in latent space during pretraining?

Ouro models achieve 2–3× efficiency gains by performing iterative reasoning in latent space during pretraining, not through extra capacity. Their intermediate predictions align faithfully with final outputs, making latent traces more honest than explicit chain-of-thought reasoning.

Can energy minimization unlock reasoning without domain-specific training?

Energy-Based Transformers assign energy values to input-prediction pairs and use gradient descent minimization for inference, yielding 35% higher training scaling rates and 29% more inference-compute gains than Transformer++, while generalizing better on out-of-distribution data without domain-specific scaffolding.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Can minimal reasoning chains match full explanations?

Chain of Draft achieves equivalent accuracy to standard chain-of-thought on arithmetic, symbolic, and commonsense tasks while using only 7.6% of tokens. The 92.4% of removed tokens served style and documentation, not computation.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
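The Energy-Based Transformers note above describes inference as gradient descent on a learned energy over input-prediction pairs. A minimal sketch of that loop, assuming a hand-written scalar quadratic energy in place of a trained transformer (a real EBT would compute gradients by autodiff over high-dimensional predictions); `energy`, `energy_grad_y`, `predict`, and all numbers are hypothetical.

```python
from typing import Callable


def energy(x: float, y: float) -> float:
    """Toy energy: low when prediction y is compatible with input x (here y = 2x)."""
    return (y - 2.0 * x) ** 2


def energy_grad_y(x: float, y: float) -> float:
    """Analytic dE/dy for the toy energy above."""
    return 2.0 * (y - 2.0 * x)


def predict(x: float, y0: float, step: float, n_steps: int,
            grad: Callable[[float, float], float] = energy_grad_y) -> float:
    """Inference as optimization: refine the prediction by gradient descent on energy."""
    y = y0
    for _ in range(n_steps):
        y -= step * grad(x, y)
    return y


x = 1.5
for n in (1, 3, 10, 50):
    y = predict(x, y0=0.0, step=0.25, n_steps=n)
    print(f"{n:>2} steps: y = {y:.6f}, energy = {energy(x, y):.2e}")
```

The design point the toy illustrates is that extra descent steps act as extra "thinking" compute at inference time: energy falls and the prediction improves as `n_steps` grows, without any change to the energy function itself.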

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains open: Can minimal training signals unlock reasoning already latent in pretrained representations? Separate durable from perishable constraints.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat all as candidates for re-testing:
• A single sparse-autoencoder feature can match full chain-of-thought performance across six model families with zero retraining (~2025).
• One vector from 50 paired examples cuts chain-of-thought length by two-thirds (~2025).
• Four modular cognitive tools lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with zero RL (~2025).
• Models trained on deliberately corrupted reasoning traces perform comparably to correct ones, suggesting traces are computational scaffolding, not meaningful logic (~2025).
• Chain-of-draft matches standard chain-of-thought accuracy at 7.6% of the token cost; LLMs reason via learned associations and semantic patterns, not symbolic manipulation (~2023–2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023-05) – In-Context Semantic Reasoners
• arXiv:2411.12580 (2024-11) – Procedural Knowledge in Pretraining
• arXiv:2507.04742 (2025-07) – Activation Steering for CoT Compression
• arXiv:2601.08058 (2026-01) – Latent Computational Mode

Your task:
(1) RE-TEST EACH CONSTRAINT. For the five mechanisms (RL steering, critique fine-tuning, decoding changes, SAE feature steering, RLVR), sparse-feature elicitation, verbosity steering, and cognitive tools: has model scaling, new inference methods, or post-training breakthroughs since mid-2025 made ANY of these findings obsolete or redefined the capacity they measure? Plainly state which constraints still hold and which appear relaxed.
(2) Surface the strongest work from the last ~6 months that either contradicts the "latent and waiting" framing or redefines what "reasoning" means in this context.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., if minimal signals work, do they scale to multi-step planning or only task-specific shortcuts? If corrupted traces work, what do we now understand about why?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
