INQUIRING LINE

Can reasoning style be steered as a single linear direction?

This explores whether 'reasoning style' is a single tunable axis you can push along — like a volume knob — or whether style is many things at once that no one direction can capture.


The corpus splits cleanly: for some narrow properties the answer is yes, strikingly so, but for 'style' in the fuller sense it is no. Start with the strongest yes. Verbosity turns out to be a genuine linear direction in a model's internal activation space: by extracting one vector from just 50 paired long/short examples, you can cut chain-of-thought length by about two-thirds (67%) while keeping accuracy, with no retraining required (see 'Can we steer reasoning toward brevity without retraining?'). Whether the model reasons at all can also ride on a single switch: steering one 'reasoning feature' identified by a sparse autoencoder (SAE) matches or beats explicit chain-of-thought prompting across six model families, and it activates early enough to override surface instructions (see 'Can we trigger reasoning without explicit chain-of-thought prompts?'). So at least two things, how long the model thinks and whether it thinks, behave like clean linear dials.
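As a rough illustration of what extracting one vector from paired examples can mean, here is a minimal difference-of-means sketch in plain Python. It is not the cited method's implementation: the toy activations, the 8-dimensional hidden size, the choice of dimension 3 as the 'verbosity' axis, and the names `mean_vector`, `steering_vector` and `steer` are all assumptions made for illustration.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def steering_vector(long_acts, short_acts):
    """Difference of means, pointing from verbose toward concise activations."""
    mu_long = mean_vector(long_acts)
    mu_short = mean_vector(short_acts)
    return [s - l for s, l in zip(mu_short, mu_long)]


def steer(hidden, direction, alpha):
    """Shift one hidden state by alpha along the unit-normalized direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]


# Toy stand-in for real activations (assumption): 50 pairs, 8 dimensions,
# with dimension 3 arbitrarily chosen to carry "verbosity".
random.seed(0)
DIM, AXIS = 8, 3
base = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
long_acts = [[x + (2.0 if i == AXIS else 0.0) for i, x in enumerate(row)]
             for row in base]
short_acts = [[x - (2.0 if i == AXIS else 0.0) for i, x in enumerate(row)]
              for row in base]

v = steering_vector(long_acts, short_acts)   # nonzero only along AXIS
hidden = [0.5] * DIM
steered = steer(hidden, v, alpha=1.5)        # moves only the AXIS coordinate
```

In a real model the rows would be hidden states captured at a chosen layer, and the strength `alpha` would need tuning so that length drops without hurting accuracy.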

But here the picture fractures. When you look at reasoning *style* rather than reasoning *amount*, the corpus says it's plural, not a single axis. Across 22 models in game-theory settings, distinct and stable profiles emerge (minimax, trust-based, belief-anticipation), and which one wins depends on the game's structure, not on some shared 'more reasoning' direction you could slide along (see 'Do large language models use one reasoning style or many?'). Creativity fragments it further: combinational, exploratory, and transformational reasoning are argued to be genuinely separate modes, not points on one continuum, which is part of why current methods (tuned for conventional problem-solving) miss them entirely (see 'Can LLMs reason creatively beyond conventional problem-solving?').

There's also a subtler warning the corpus raises: style has hidden dimensions you don't notice until you flatten them. Post-training that optimizes for a single objective, correctness, quietly suppresses unmeasured stylistic traits like epistemic verbalization (the model voicing its uncertainty), precisely because nothing was protecting them (see 'Can post-training objectives preserve reasoning style alongside correctness?'). That's the cautionary mirror of single-direction steering: collapsing reasoning onto one measured axis can silently crush the axes you forgot to measure. And models already struggle to even *track* a person's reasoning style as it evolves, leaning on surface lexical cues, which suggests style is high-dimensional enough that recognizing it, let alone steering it as one vector, is unsolved (see 'Can models recognize how individuals reason differently?').

The synthesis: yes for length, yes for the on/off of reasoning itself; these are real linear directions you can extract cheaply and steer. No for style writ large, which the corpus repeatedly shows to be a bundle of distinct profiles and paradigms. The interesting takeaway isn't 'one direction or not'; it's that the things that *do* compress to a single direction (verbosity, reasoning-activation) are mechanical scaffolding, while the things that resist it (strategic profile, creative mode, calibrated uncertainty) are exactly where reasoning gets its character. If chain-of-thought is partly imitation of a learned form rather than fresh inference (see 'Does chain-of-thought reasoning reveal genuine inference or pattern matching?'), then a steerable 'style vector' may be steering the costume, not the actor.


Sources (7 notes)

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.
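A toy sketch of what steering a single SAE feature could look like: add a multiple of that feature's decoder direction to the hidden state and watch the feature's activation rise. The two-feature autoencoder, its identity-like weights, the index `REASONING`, and the helper names are hypothetical and do not come from the cited work.

```python
def relu(x):
    return max(0.0, x)


def encode(hidden, enc_rows, bias):
    """Feature activations of a toy sparse autoencoder: ReLU(W_enc @ h + b)."""
    return [relu(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(enc_rows, bias)]


def steer_feature(hidden, dec_rows, feature_idx, alpha):
    """Add alpha times the chosen feature's decoder direction to the hidden state."""
    direction = dec_rows[feature_idx]
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Hypothetical 2-feature SAE over a 3-dimensional hidden state.
ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # encoder rows, one per feature
DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # decoder rows, one per feature
BIAS = [0.0, 0.0]
REASONING = 1                               # assumed index of the "reasoning feature"

hidden = [0.3, -0.2, 0.7]
before = encode(hidden, ENC, BIAS)          # feature 1 starts inactive
steered = steer_feature(hidden, DEC, REASONING, alpha=2.0)
after = encode(steered, ENC, BIAS)          # feature 1 is now active
```

The point of the sketch is only that the intervention is a single additive shift along one direction; which feature to pick and how strongly to push are empirical questions.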

Do large language models use one reasoning style or many?

Analysis of 22 LLMs across behavioral game theory reveals three dominant profiles: GPT-o1 uses minimax reasoning, DeepSeek-R1 uses trust-based reasoning, and GPT-o3-mini uses belief-anticipation. Performance correlates with game structure, not raw reasoning depth.

Can LLMs reason creatively beyond conventional problem-solving?

Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.

Can post-training objectives preserve reasoning style alongside correctness?

Research shows that post-training objectives faithfully guide models toward correct answers yet simultaneously suppress unmeasured behaviors like epistemic verbalization. Single-objective optimization creates blind spots where stylistic features critical to generalization are unprotected.

Can models recognize how individuals reason differently?

LLMs struggle to anchor reasoning in temporal gameplay and adapt to evolving strategies. GPT-4o relies on surface lexical cues while DeepSeek-R1 shows early promise, but dynamic style adaptation remains largely insufficient across all models tested.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
