Can we trigger reasoning without explicit chain-of-thought prompts?

This research asks whether models possess latent reasoning capabilities that can be activated through direct feature steering, independent of chain-of-thought instructions. Understanding this matters for making reasoning more efficient and controllable.

Synthesis note · 2026-04-20 · sourced from Cognitive Models Latent

The study uses sparse autoencoders (SAEs) to decompose model activations into interpretable features, and a two-stage pipeline to identify latent features causally associated with reasoning behavior. First, SAEs extract sparse features from activations recorded under chain-of-thought (CoT) and non-CoT prompting; features that differ between the two conditions become candidates. Second, targeted steering interventions modulate each candidate feature and measure the effect on downstream reasoning performance.
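The first stage can be sketched as a ranking of SAE features by how much more they fire under CoT prompting than without it. This is a minimal illustration on synthetic data, not the study's implementation: the ReLU encoder, the mean-difference score, the names `sae_encode` and `rank_candidate_features`, and the square orthonormal dictionary (real SAEs are overcomplete) are all assumptions chosen to keep the demo small and deterministic.

```python
import numpy as np


def sae_encode(acts, W_enc, b_enc):
    """Toy SAE encoder (assumed form): ReLU(acts @ W_enc + b_enc)."""
    return np.maximum(acts @ W_enc + b_enc, 0.0)


def rank_candidate_features(acts_cot, acts_plain, W_enc, b_enc, top_k=5):
    """Rank features by mean activation under CoT minus mean activation without it."""
    mean_cot = sae_encode(acts_cot, W_enc, b_enc).mean(axis=0)
    mean_plain = sae_encode(acts_plain, W_enc, b_enc).mean(axis=0)
    gap = mean_cot - mean_plain
    order = np.argsort(-gap)[:top_k]  # largest positive gap first
    return order, gap[order]


# Synthetic stand-ins for recorded activations (hypothetical data).
rng = np.random.default_rng(0)
d_model = 32
# Orthonormal dictionary so the planted direction excites exactly one feature.
W_enc, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))
b_enc = np.zeros(d_model)

planted = 7  # index of the feature we pretend is "reasoning-related"
acts_plain = rng.normal(size=(200, d_model))
# Same prompts "with CoT": shifted along the planted feature's direction.
acts_cot = acts_plain + 3.0 * W_enc[:, planted]

top_idx, top_gap = rank_candidate_features(acts_cot, acts_plain, W_enc, b_enc)
print("candidate features:", top_idx.tolist())
```

A differential score like this only nominates candidates; whether a feature is causally involved is what the second, steering stage is meant to test.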

The central result: steering a single reasoning-related latent feature at the first generation step substantially improves accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT while producing more efficient outputs: fewer tokens at comparable accuracy.
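A steering intervention of this kind amounts to adding a scaled feature direction to a hidden state at one chosen step. The sketch below shows only that arithmetic on toy vectors. The function names, the unit-normalised direction, and the strength `alpha=8.0` are illustrative assumptions; the note does not specify the layer, scale, or exact form of the intervention.

```python
import numpy as np


def steer_residual(hidden, direction, alpha, step, steer_steps=(0,)):
    """Add alpha * unit(direction) to the hidden state, but only at the chosen steps."""
    if step not in steer_steps:
        return hidden  # untouched outside the steering window
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit


def projection(hidden, direction):
    """Scalar projection of a hidden state onto the steering direction."""
    unit = direction / np.linalg.norm(direction)
    return float(hidden @ unit)


# Toy stand-ins: one hidden vector per generation step (hypothetical data).
rng = np.random.default_rng(0)
d_model = 16
direction = rng.normal(size=d_model)  # stand-in for an SAE decoder direction
states = [rng.normal(size=d_model) for _ in range(4)]

steered = [
    steer_residual(h, direction, alpha=8.0, step=t)
    for t, h in enumerate(states)
]

# Change in projection per step: about alpha at step 0, zero afterwards.
gains = [projection(s, direction) - projection(h, direction)
         for s, h in zip(steered, states)]
print([round(g, 3) for g in gains])
```

In a real model the same addition would typically be applied inside a forward hook on a chosen layer, and the first-step change would propagate to later tokens through attention over the cached states, which this toy loop does not model.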

Three properties of this reasoning mode are striking:

Early triggering. The reasoning-oriented internal state is triggered early in generation, not built up through sequential token production. This contrasts with the H2 assumption that reasoning emerges through the step-by-step construction of a chain.

Override robustness. The latent reasoning mode can override prompt-level instructions that discourage explicit reasoning, including the /no_think instruction used in Qwen models. The internal state takes precedence over surface directives, suggesting the latent mechanism operates at a deeper level than prompt compliance.

Cross-model generality. The finding replicates across six model families up to 70B parameters, suggesting this is not an architecture-specific artifact but a general property of how large language models organize reasoning capability.

The implication is that CoT prompting is one effective but not unique way of activating an underlying reasoning mechanism. Other triggers include altered decoding procedures (CoT-decoding, from "Do base models already contain hidden reasoning ability?"), soft continuous representations (from "Can we explore multiple reasoning paths without committing to one token?"), and now direct feature steering. The multiplicity of triggers, all converging on the same capability, is the strongest evidence that the capability is latent and the triggers are interchangeable surface-level activators.

This extends the repertoire of steerable behavioral dimensions from "Can we steer reasoning toward brevity without retraining?" (reasoning verbosity), "Can we track and steer personality shifts during model finetuning?" (personality), and "Can high-level concepts replace circuit-level analysis in AI?" (truthfulness, honesty, morality) to include reasoning activation itself, arguably the most consequential dimension yet.

Related concepts in this collection

Do base models already contain hidden reasoning ability? Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
converges: CoT-decoding, CFT, RLVR, and now SAE steering all unlock latent reasoning
Can we steer reasoning toward brevity without retraining? This explores whether model reasoning style occupies learnable geometric directions in activation space, and whether we can shift toward concise thinking by steering through that space without expensive retraining.
extends steerable dimensions from verbosity to reasoning activation
Can high-level concepts replace circuit-level analysis in AI? Instead of reverse-engineering individual circuits, can we study AI reasoning by treating concepts as directions in activation space? This matters because circuit analysis hits practical limits at scale.
SAE steering for reasoning adds a new dimension to the RepE paradigm
Can we explore multiple reasoning paths without committing to one token? Standard language models pick one token at each step, collapsing uncertainty and forcing single reasoning trajectories. Could preserving the full probability distribution across token embeddings enable implicit parallel exploration instead?
another non-CoT trigger for latent reasoning
Where does LLM reasoning actually happen during generation? Does multi-step reasoning emerge from visible chain-of-thought text, hidden layer dynamics, or simply more computation? Three competing hypotheses make different predictions and can be empirically tested.
provides causal evidence for H1
Can latent thought vectors scale language models beyond parameters? Explores whether explicit latent thought vectors with dual-rate learning create new scaling dimensions independent of model size. This matters because it suggests alternatives to simply building larger models.
LTMs make latent thought vectors explicit architectural components; SAE steering shows reasoning features are already implicit in standard architectures
Does RL teach reasoning or just when to use it? Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges.
converges: RL teaches when to activate reasoning; SAE steering shows the reasoning mechanism is a single activatable feature
Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
SAE steering closes this gap for reasoning: the identified feature IS causally active, unlike many encoded-but-unused representations
