SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals · Psychology, Society, and Alignment

Can models recognize question difficulty before they reason?

Do reasoning language models encode implicit knowledge of problem difficulty in their hidden states, even before generating solution steps? And if so, why don't they act on this knowledge?

Synthesis note · 2026-05-18 · sourced from Reasoning Methods CoT ToT
Why does chain-of-thought reasoning fail in predictable ways? How should we allocate compute budget at inference time?

S1-Bench's probing analysis demonstrates that difficulty is already encoded in the representations of large reasoning models (LRMs). A single-layer MLP trained on the final-layer hidden state of the last token in an encoded question predicts difficulty with monotonically increasing accuracy across difficulty levels. The structure is implicit but linear: beyond fitting this minimal probe, no extra training of the model, no specialized probe architecture, and no auxiliary signal is required. The model knows.
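As a rough illustration of what "linearly decodable" means here, the sketch below fits a logistic-regression probe on synthetic vectors that stand in for last-token hidden states. Everything in it is an assumption made for illustration: the data is generated rather than taken from an LRM, difficulty is reduced to two classes (easy and hard) instead of several levels, and the function names (`make_synthetic_states`, `train_linear_probe`, `probe_p_hard`) are hypothetical. It is not the S1-Bench probing setup.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_synthetic_states(n, dim=8, shift=0.8, seed=0):
    """Synthetic stand-ins for last-token hidden states (NOT real LRM activations).

    Label 1 = hard, 0 = easy. Hard states are shifted along one fixed
    direction and easy states along its opposite, plus Gaussian noise.
    """
    rng = random.Random(seed)
    states, labels = [], []
    for i in range(n):
        y = i % 2
        sign = 1.0 if y == 1 else -1.0
        states.append([rng.gauss(0.0, 1.0) + sign * shift for _ in range(dim)])
        labels.append(y)
    return states, labels

def train_linear_probe(states, labels, epochs=200, lr=0.5):
    """Fit p(hard | h) = sigmoid(w . h + b) by full-batch gradient descent."""
    dim = len(states[0])
    n = len(states)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for h, y in zip(states, labels):
            err = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) - y
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_p_hard(w, b, h):
    """Probe's estimated probability that the question behind state h is hard."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)

def accuracy(w, b, states, labels):
    hits = sum(int((probe_p_hard(w, b, h) >= 0.5) == bool(y))
               for h, y in zip(states, labels))
    return hits / len(labels)

# Train on one synthetic sample, evaluate on a separately seeded one.
train_x, train_y = make_synthetic_states(200, seed=0)
test_x, test_y = make_synthetic_states(200, seed=1)
w, b = train_linear_probe(train_x, train_y)
print(f"held-out probe accuracy: {accuracy(w, b, test_x, test_y):.2f}")
```

With real models, the synthetic vectors would be replaced by hidden states extracted from the question encoding. The probe itself stays this small, which is the point of the argument: if so little capacity suffices, the information is already present in the representation.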

The behavioral result then contradicts this internal knowledge. On simple questions that the linear probe correctly classifies as easy, LRMs still produce redundant solution rounds, repeatedly re-verify already-correct answers, and emit higher average token entropy than necessary. The hidden-state signal that says "this is easy" is overridden during generation by exploratory behavior that says "let me check again."
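For readers who want the entropy measure made concrete, one common way to define average token entropy is the mean Shannon entropy of the next-token distributions at each generated position. The sketch below assumes that definition. Whether S1-Bench computes it in exactly this way is not stated in this note, and the function names are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_token_entropy(distributions):
    """Average entropy over the per-position distributions of one response."""
    if not distributions:
        raise ValueError("need at least one token distribution")
    return sum(token_entropy(d) for d in distributions) / len(distributions)

# A confident step (one-hot) followed by a maximally uncertain step
# over four candidate tokens.
response = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
print(mean_token_entropy(response))
```

Under this definition, a response that keeps hesitating between candidate continuations scores higher than one that commits, which matches the exploratory behavior described above.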

The authors' interpretation, and the most plausible mechanism, is that models exhibit self-doubt about their own early difficulty judgments. The model perceives that the question is simple, then second-guesses that perception, then engages in exploratory generation to compensate for the imagined possibility that its initial assessment was wrong. This is a structural failure mode: the architecture lacks a mechanism to commit to an early difficulty assessment and act on it.

The deeper insight is that LRM overthinking is not a perception failure (the model fails to recognize a simple question) but an action failure (the model recognizes the question is simple but cannot translate that recognition into terminating behavior). This distinction matters for fixes: prompt engineering for "shorter answers on easy questions" treats it as a perception problem and produces brittle results. Mechanistic fixes that route generation through the difficulty representation (for example, conditioning continued-thinking decisions on the probe output) treat it as the action problem it appears to be.
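A minimal sketch of what conditioning continued-thinking decisions on the probe output could look like follows. It is a hypothetical controller, not a method described in the source: the function names, the linear budget rule, and every threshold (`min_tokens`, `max_tokens`, `easy_threshold`) are assumptions chosen only to make the idea concrete.

```python
def thinking_budget(p_hard, min_tokens=64, max_tokens=4096):
    """Map the probe's probability of 'hard' to a reasoning-token budget.

    Hypothetical rule: interpolate linearly between min_tokens and max_tokens.
    """
    if not 0.0 <= p_hard <= 1.0:
        raise ValueError("p_hard must be a probability in [0, 1]")
    return int(min_tokens + p_hard * (max_tokens - min_tokens))

def should_continue_thinking(p_hard, tokens_used, has_candidate_answer,
                             easy_threshold=0.3):
    """Decide whether to keep generating reasoning tokens.

    Stops when the budget is spent, or when the probe rates the question
    as easy and a candidate answer already exists, so the answer is not
    re-verified.
    """
    if tokens_used >= thinking_budget(p_hard):
        return False
    if has_candidate_answer and p_hard < easy_threshold:
        return False
    return True

# Probe says "easy" and an answer is already on the table: stop.
print(should_continue_thinking(0.1, tokens_used=10, has_candidate_answer=True))
# Probe says "hard": keep going even though a candidate answer exists.
print(should_continue_thinking(0.9, tokens_used=10, has_candidate_answer=True))
```

The design choice worth noting is that the stop decision reads the difficulty signal directly instead of relying on the prompt to request brevity, which is the perception-versus-action distinction drawn above.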

The methodology generalizes. A linear probe on a hidden state is a cheap diagnostic for any property the model is suspected to track implicitly. If the probe succeeds and the behavior contradicts it, the gap localizes the failure to the perception-to-action interface, rather than to representation or capacity.
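One way to put a number on that gap, sketched under assumptions: take each question's probe output and the length of the model's response, then report the share of probe-easy questions that still received a long response. The record format, the thresholds, and the name `perception_action_gap` are hypothetical, and the sample values are made up purely to show the arithmetic.

```python
def perception_action_gap(records, easy_threshold=0.5, token_limit=200):
    """Fraction of probe-easy questions whose response exceeds token_limit.

    records: iterable of (p_hard, response_tokens) pairs, where p_hard is
    the probe's probability that the question is hard.
    Returns None when the probe rates no question as easy.
    """
    easy_lengths = [tokens for p_hard, tokens in records
                    if p_hard < easy_threshold]
    if not easy_lengths:
        return None
    overthought = sum(1 for tokens in easy_lengths if tokens > token_limit)
    return overthought / len(easy_lengths)

# Illustrative, made-up records: (probe p_hard, response length in tokens).
sample = [(0.10, 900), (0.20, 50), (0.90, 1500), (0.05, 400)]
print(perception_action_gap(sample))
```

A value near zero would mean behavior tracks the internal signal. A high value, combined with good probe accuracy, is the pattern this note describes: the information is represented but not acted on.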

Original note title

problem difficulty is linearly decodable from LRM hidden states before formal reasoning begins — yet models override this signal with exploratory overthinking suggesting architectural self-doubt