How do verbose and concise reasoning occupy different regions in activation space?

This explores what it means for an LLM to 'spend more words' versus 'fewer words' on reasoning — and the finding that verbosity isn't scattered noise but lives along a measurable direction inside the model's internal activations, which means it can be dialed up or down.

Inside the model, the difference between long and short reasoning turns out to be geometric: verbosity lies along a distinct, roughly linear direction in activation space rather than being an emergent property of the problem. Can we steer reasoning toward brevity without retraining? shows you can extract a single steering vector from just 50 paired examples and slide a model's chain-of-thought along it, cutting length by 67% while holding accuracy, with no retraining required. That a one-dimensional 'verbosity knob' exists at all is the key clue: it suggests long and short reasoning aren't different kinds of thinking but the same computation expressed at different lengths.

The corroborating evidence is that most of those extra words weren't doing computational work in the first place. Can minimal reasoning chains match full explanations? finds that minimal reasoning chains match full explanations at 7.6% of the token cost; the other 92.4% served style and documentation, not reasoning. So if verbose and concise modes land in separate regions of activation space, it's largely because the verbose region is padded with tokens that perform readability rather than inference. This reframes 'concise reasoning' not as a compressed version of verbose reasoning but as the actual computational core with the packaging stripped away.

What makes the geometry more than a curiosity is that length has an optimum, and models drift toward it on their own. Why does chain of thought accuracy eventually decline with length? shows accuracy peaks at an intermediate length, with stronger models preferring shorter chains — and RL training naturally gravitates there without ever being told to. Does more thinking time always improve reasoning accuracy? sharpens the warning: pushing from ~1,100 to ~16K thinking tokens dropped accuracy from 87% to 70%. Verbosity past the sweet spot isn't neutral; it actively degrades. So the 'verbose region' of activation space is partly a region of overthinking, and steering toward concision is steering back toward the productive zone.

The most radical adjacent framing is that visible reasoning tokens may be a presentation layer, not the reasoning itself. Can models reason without generating visible thinking tokens? describes models scaling test-time compute through hidden-state iteration with no verbalized steps at all, suggesting verbalization is a training artifact. Do transformers hide reasoning before producing filler tokens? goes further: models trained with hidden CoT tokens can compute the correct answer in layers 1–3, then actively suppress it in the final layers to emit format-compliant filler. If the real reasoning happens in latent space and the words are downstream decoration, then 'verbose vs concise regions in activation space' measures how much a model chooses to externalize, not how hard it's thinking.

Two cross-domain threads round this out. Can models learn when to think versus respond quickly? turns the geometry into a control problem: a model that routes between extended thinking and direct answers, calibrating verbosity per-question. And Do language models sparsify their activations under difficult tasks? hints that activation geometry shifts adaptively with difficulty — hidden states sparsify on hard, unfamiliar tasks. Read together, the picture is that verbosity is a navigable axis in the model's internal space: separable, steerable, often padded, and frequently not where the thinking actually lives.

Sources 8 notes

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.
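
To make the idea of "a single vector from paired examples" concrete, here is a minimal sketch that assumes a difference-of-means construction, a common way to build steering vectors. The note above does not specify the paper's exact recipe, so the function names, the toy 3-dimensional activations, and the scaling factor `alpha` are all illustrative assumptions rather than the authors' method.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(verbose_acts: List[Vector], concise_acts: List[Vector]) -> Vector:
    """Difference of means, pointing from the verbose mean toward the concise mean."""
    mean_verbose = mean_vector(verbose_acts)
    mean_concise = mean_vector(concise_acts)
    return [c - v for c, v in zip(mean_concise, mean_verbose)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the steering direction by a factor alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical values, not taken from any model).
verbose = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
concise = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(verbose, concise)   # [-2.0, 2.0, 0.0]
steered = steer([2.0, 0.0, 2.0], v, 0.5)  # [1.0, 1.0, 2.0]
```

In a real setting the activations would be residual-stream states captured at one chosen layer for verbose and concise versions of the same problems, and `alpha` would be tuned so that length drops without hurting accuracy.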

Can minimal reasoning chains match full explanations?

Chain of Draft achieves equivalent accuracy to standard chain-of-thought on arithmetic, symbolic, and commonsense tasks while using only 7.6% of tokens. The 92.4% of removed tokens served style and documentation, not computation.

Why does chain of thought accuracy eventually decline with length?

Task accuracy peaks at an intermediate CoT length, with the optimal length increasing with task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, suggesting that brevity emerges from the reward signal rather than from any explicit instruction to be brief.
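
One simple way to look for such a peak in your own evaluation logs is to bucket responses by chain-of-thought length and compare accuracy per bucket. The sketch below is only an illustration: the record format, the bucket size, and the eight sample records are made up for demonstration and do not reproduce any result from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_length(records: Iterable[Tuple[int, bool]], bucket_size: int) -> Dict[int, float]:
    """Group (cot_token_count, is_correct) records into length buckets.

    Returns a mapping from each bucket's lower bound to its accuracy.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [num_correct, num_total]
    for length, correct in records:
        bucket = (length // bucket_size) * bucket_size
        totals[bucket][0] += int(correct)
        totals[bucket][1] += 1
    return {b: c / n for b, (c, n) in sorted(totals.items())}


def peak_bucket(curve: Dict[int, float]) -> int:
    """Lower bound of the length bucket with the highest accuracy."""
    return max(curve, key=curve.get)


# Hypothetical evaluation records: (CoT length in tokens, answer correct?).
records = [
    (40, False), (60, True),
    (120, True), (150, True),
    (220, True), (260, False),
    (330, False), (380, False),
]

curve = accuracy_by_length(records, bucket_size=100)
best = peak_bucket(curve)
```

With these made-up records the curve rises and then falls, and `best` is the 100 to 199 token bucket. On real data the buckets would need enough samples each for the comparison to mean anything.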

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.
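
The logit lens idea can be sketched in a few lines: project each layer's hidden state through the unembedding matrix and track where the correct token ranks. The toy below assumes a 3-token vocabulary and 2-dimensional hidden states with invented values, and it omits the final layer norm that real implementations usually apply before unembedding. It illustrates the technique only, not the paper's setup.

```python
from typing import List

Vector = List[float]


def logits(hidden: Vector, unembed: List[Vector]) -> List[float]:
    """One logit per vocabulary token: dot product of hidden state with each unembedding row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]


def rank_of(token_id: int, scores: List[float]) -> int:
    """Rank of a token among the scores, where 0 means it is the top prediction."""
    target = scores[token_id]
    return sum(1 for s in scores if s > target)


def logit_lens(hidden_by_layer: List[Vector], unembed: List[Vector], token_id: int) -> List[int]:
    """Rank of token_id when each layer's hidden state is decoded directly."""
    return [rank_of(token_id, logits(h, unembed)) for h in hidden_by_layer]


# Hypothetical unembedding matrix: one row per token.
unembed = [
    [1.0, 0.0],    # token 0: the correct answer
    [0.0, 1.0],    # token 1: a filler token
    [-1.0, -1.0],  # token 2: something else
]

# Hypothetical hidden states at an early, a middle, and the final layer.
hidden_by_layer = [
    [2.0, 0.5],
    [1.0, 1.5],
    [0.5, 3.0],
]

answer_ranks = logit_lens(hidden_by_layer, unembed, token_id=0)  # [0, 1, 1]
```

In this toy the answer token is the top prediction at the early layer and drops to second place later, which is the shape of the pattern the note describes: the answer stays recoverable from a lower-ranked prediction even after the filler token takes over.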

Can models learn when to think versus respond quickly?

Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
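
To compare hidden-state sparsity across tasks you need a scalar measure. The sketch below uses Hoyer sparsity, which is 0 for a vector whose entries all have equal magnitude and 1 for a vector with a single nonzero entry. This metric is an assumption chosen for illustration; the note does not say which measure the paper uses, and the example vectors are invented.

```python
import math
from typing import List


def hoyer_sparsity(x: List[float]) -> float:
    """Hoyer sparsity: (sqrt(n) - L1/L2) / (sqrt(n) - 1), ranging from 0 (dense) to 1 (one-hot)."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a nonzero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


# Hypothetical hidden states (not taken from any model).
dense_state = [1.0, 1.0, 1.0, 1.0]
sparse_state = [0.0, 0.0, 3.0, 0.0]

dense_score = hoyer_sparsity(dense_state)    # 0.0
sparse_score = hoyer_sparsity(sparse_state)  # 1.0
```

Averaging such a score over tokens and layers for easy versus hard prompts would be one way to check whether sparsity rises with difficulty in a given model.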

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a reasoning-mechanist auditing claims about activation geometry in LLMs. The question remains open: **Do verbose and concise reasoning occupy geometrically separable regions in activation space, and if so, what do those regions compute?**

What a curated library found — and when (findings span Feb 2024–Mar 2026; treat as dated claims):
• A single steering vector extracted from ~50 paired examples can cut chain-of-thought length by 67% while preserving accuracy, suggesting verbosity is a one-dimensional axis in activation space (2025-07).
• Concise reasoning chains achieve 92% token reduction vs. verbose ones at matched accuracy, implying most extra tokens perform style/readability, not inference (2024-06).
• Accuracy peaks at an intermediate reasoning length; the optimal length rises with task difficulty and falls with model capability, and RL training gravitates toward shorter chains without explicit optimization for length (2025-02).
• Models compute correct answers in early layers (1–3), then suppress or overwrite them downstream to emit format-compliant filler, suggesting verbalized reasoning is a presentation layer (2024-12).
• Pushing test-time compute beyond a threshold (from ~1,100 to ~16K tokens) degrades accuracy from 87% to 70%, so the 'verbose region' may be an overthinking zone (2025-02).

Anchor papers (verify; mind their dates):
• arXiv:2507.04742 *Activation Steering for Chain-of-Thought Compression* (2025-07)
• arXiv:2412.04537 *Understanding Hidden Computations in Chain-of-Thought Reasoning* (2024-12)
• arXiv:2502.07266 *When More is Less: Understanding Chain-of-Thought Length in LLMs* (2025-02)
• arXiv:2505.13379 *Thinkless: LLM Learns When to Think* (2025-05)

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For the steering-vector result, check whether recent work (last 6 months) has: (a) scaled it to larger, multimodal, or reasoning-specialized models; (b) discovered whether the vector is task-specific or universal; (c) tested whether it generalizes across model families. For the latent-reasoning claims, verify whether the pattern of early-layer computation followed by late-layer suppression holds in post-training-scaled reasoning models such as o1 or R1. For the accuracy-degradation threshold, ask: has that threshold itself shifted with newer architectures or test-time-compute schedulers?
(2) **Surface strongest CONTRADICTING or SUPERSEDING work from the last ~6 months.** Look for papers arguing verbosity is *not* separable; that reasoning *requires* externalization; that the steering vector captures something other than length; or that the suppression finding was an artifact of older baselines.
(3) **Propose two research questions that assume the regime may have moved:** (a) If verbosity truly is a single axis, can you steer *orthogonal* dimensions (e.g., depth vs. breadth of reasoning) independently, or is length a proxy for a deeper compositional trade-off? (b) Given that stronger models prefer shorter chains, does the activation geometry of reasoning *itself* become more compressed with scale, or does only the output representation become more efficient?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
