Why might latent reasoning capture types of thinking that verbalized CoT cannot?
This note asks whether reasoning that happens silently inside a model's hidden states can capture kinds of thinking that written-out chain-of-thought (CoT) leaves out. The short answer the corpus points to: verbalization may be more of a training habit than a requirement for reasoning, and forcing thought through words can actively distort it.
The strongest clue is that reasoning seems to live in the model before any words are produced. Several lines of work show that base models already carry latent reasoning ability that minimal training merely *unlocks* rather than creates (see "Do base models already contain hidden reasoning ability?"), and that steering a single internal feature can match or beat full CoT prompting; the steered mode activates early in generation and even overrides surface instructions (see "Can we trigger reasoning without explicit chain-of-thought prompts?"). If you can trigger the reasoning by nudging an internal representation, the spoken trace starts to look like a readout, not the engine. That is made concrete by architectures that scale test-time compute purely through hidden-state iteration (depth-recurrent models, Coconut, Heima), which suggests "thinking in tokens" is a learned artifact rather than the substrate of thought (see "Can models reason without generating visible thinking tokens?").
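As a rough illustration of the two ideas above, the toy sketch below treats a hidden state as a plain list of floats. `steer` nudges the state along a feature direction, and `think_in_latent_space` applies an update repeatedly without emitting any tokens in between. All names, the additive steering rule, and the toy update are assumptions made for illustration; none of it is taken from the cited work or from any real model's API.

```python
import math


def unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def feature_activation(hidden, direction):
    """Projection of the hidden state onto the unit feature direction."""
    return sum(h * d for h, d in zip(hidden, unit(direction)))


def steer(hidden, direction, alpha):
    """Assumed steering rule: add alpha times the unit feature direction."""
    return [h + alpha * d for h, d in zip(hidden, unit(direction))]


def think_in_latent_space(state, step, n_iters):
    """Apply the same update n_iters times; nothing is decoded in between."""
    for _ in range(n_iters):
        state = step(state)
    return state


# Hypothetical hidden state and feature direction.
hidden = [1.0, 0.0, 2.0]
reasoning_direction = [0.0, 3.0, 4.0]
steered = steer(hidden, reasoning_direction, alpha=2.0)

# Toy recurrent update standing in for one pass through a recurrent block.
refined = think_in_latent_space([0.0], lambda s: [0.5 * x + 1.0 for x in s], 3)
```

In this toy setup, steering raises the feature's activation by exactly `alpha`, and extra compute is spent by increasing `n_iters` rather than by producing a longer visible trace.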
The flip side is the evidence that verbalized CoT is a *constrained* channel. A series of critiques shows that CoT mostly reproduces familiar reasoning *forms* learned in training rather than performing genuine inference. That explains why it degrades predictably off-distribution (see "Does chain-of-thought reasoning reveal genuine inference or pattern matching?" and "Does chain-of-thought reasoning actually generalize beyond training data?"), why logically *invalid* CoT exemplars perform nearly as well as valid ones (see "Does logical validity actually drive chain-of-thought gains?"), and why the gains decompose into output probability, memorization, and a genuine step-by-step reasoning component that accumulates error with each step (see "What three separate factors drive chain-of-thought performance?" and "Why does chain-of-thought reasoning fail in predictable ways?"). Latent reasoning sidesteps two of these traps: it isn't forced to imitate a surface schema, and it doesn't pay the per-step error tax of a long written chain.
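The error tax of a long written chain can be made concrete with a small calculation. The sketch below assumes every step is correct independently with the same probability, so a chain of n steps is fully correct with probability p^n. That independence assumption and the example numbers are simplifications introduced here, not results from the cited studies.

```python
import math


def chain_accuracy(per_step_accuracy, n_steps):
    """Probability that all n_steps are correct, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return per_step_accuracy ** n_steps


def max_steps_for(target_accuracy, per_step_accuracy):
    """Longest chain whose end-to-end accuracy stays at or above the target."""
    if not 0.0 < per_step_accuracy < 1.0:
        raise ValueError("per_step_accuracy must be strictly between 0 and 1")
    if not 0.0 < target_accuracy <= 1.0:
        raise ValueError("target_accuracy must be in (0, 1]")
    return math.floor(math.log(target_accuracy) / math.log(per_step_accuracy))


# Illustrative numbers only: 98% per-step accuracy over a 50-step chain.
print(round(chain_accuracy(0.98, 50), 3))  # about 0.364
print(max_steps_for(0.5, 0.98))            # 34
```

Under these assumptions, even a highly reliable step compounds into an unreliable chain once the trace gets long, which is the cost a shorter or non-verbal computation would avoid.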
There's also a bottleneck argument. Some tasks aren't bottlenecked on verbalization at all: verbose CoT actually *hurts* fine-grained multimodal perception, because the real constraint is visual attention allocation, and optimizing text tokens trains the wrong target (see "Does verbose chain-of-thought actually help multimodal perception tasks?"). More generally, reasoning quality is shaped by what training rewards: the same internal "thinking" mechanism can produce counterproductive self-doubt or productive analysis depending on how it's trained (see "Does extended thinking help or hurt model reasoning?"). Whenever the useful computation is non-verbal or fine-grained, squeezing it through words is lossy, which is exactly the gap latent reasoning could fill.
The honest caveat the corpus leaves you with: "capturing more" cuts against interpretability. The same critiques note that optimizing CoT for performance works against its readability (see "Why does chain-of-thought reasoning fail in predictable ways?"), and reasoning already degrades sharply with longer inputs, well below context limits (see "Does reasoning ability actually degrade with longer inputs?"). So latent reasoning's advantage, not being shackled to a verbal trace, is also its cost: a richer internal computation we can no longer read off the page. If you want a middle path, the cognitive-tools work is worth a look: it elicits the latent capability through modular, isolated operations rather than one long verbalized stream (see "Can modular cognitive tools unlock reasoning without training?").
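The idea of modular, isolated operations can be sketched as follows. Each tool is a separate model call that sees only its own prompt template and the payload handed to it, never the accumulated transcript. The tool names, templates, the fixed `plan`, and the `fake_model` stub are all hypothetical stand-ins; the cited work's actual tools and orchestration are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ModelFn = Callable[[str], str]  # any function mapping a prompt to a completion


@dataclass(frozen=True)
class CognitiveTool:
    name: str
    template: str  # must contain the placeholder "{payload}"

    def run(self, model: ModelFn, payload: str) -> str:
        # A fresh prompt per call: the tool sees only its template and payload.
        return model(self.template.format(payload=payload))


def solve(question: str, tools: Dict[str, CognitiveTool],
          plan: List[str], model: ModelFn) -> List[str]:
    """Run the tools named in plan in order, passing each output to the next."""
    notes = []
    payload = question
    for name in plan:
        output = tools[name].run(model, payload)
        notes.append(f"{name}: {output}")
        payload = output
    return notes


# Stub model that records every prompt it receives (hypothetical).
seen_prompts: List[str] = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"out{len(seen_prompts)}"


tools = {
    "understand": CognitiveTool("understand", "Restate the problem: {payload}"),
    "check": CognitiveTool("check", "Check this answer: {payload}"),
}
notes = solve("2+2?", tools, ["understand", "check"], fake_model)
```

Because each call is built from scratch, one operation cannot leak into another through a shared context, which is the kind of isolation a single long prompt cannot guarantee.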
Sources (12 notes)
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.
Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.
Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.
A shift cipher study decomposed CoT into three independent factors: output probability alone swings accuracy from 26% to 70%, memorization matches pre-training frequency patterns, and genuine reasoning exists but accumulates error with each step. This resolves the reason-or-memorize debate by showing LLMs do both simultaneously.
CoT guides models to pattern-match reasoning structure rather than perform genuine inference. This explains distribution-bounded failures, why structural coherence matters more than content correctness, and why performance optimizes against interpretability.
Long rationales and text-token RL help reasoning but hurt fine-grained perception tasks because the actual bottleneck is visual attention allocation, not verbalization. Standard CoT optimization trains the wrong policy target.
Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.
FLenQA shows reasoning accuracy drops from 92% to 68% at just 3000 tokens of padding, far below context window capacity. The degradation is task-agnostic, uncorrelated with language modeling performance, and persists even with chain-of-thought prompting.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.