What role does confidence play in balancing overthinking versus underthinking?

This explores how a model's own confidence signal can be used as a dial to tell when it's reasoning too much versus too little — and what the corpus says about whether that signal is trustworthy.

A model's confidence can act as a steering signal between overthinking (burning tokens on easy problems and second-guessing correct answers) and underthinking (giving up too early on hard ones). The most direct answer in the corpus is ReBalance, which treats confidence not as a fixed yes/no but as a continuously varying signal. High confidence variance and overconfidence serve as diagnostic flags that something is off, and a training-free steering nudge then either trims redundant reasoning or pushes for more exploration, improving accuracy across models from 0.5B to 32B parameters (Can confidence patterns reveal overthinking versus underthinking?). Such a dial is needed because more thinking is not monotonically better. Accuracy peaks at a critical thinking-token count and then declines sharply, from 87.3% to 70.3% as thinking tokens grow from about 1,100 to 16,000, because extended reasoning inflates output variance and introduces self-revision errors (When does thinking too much actually hurt reasoning?; Does more thinking time always improve reasoning accuracy?). Crucially, the same work shows that models overthink easy problems and underthink hard ones, which is exactly the asymmetry confidence is meant to detect.
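To make the dial idea concrete, here is a minimal sketch of a confidence-based controller in the spirit of ReBalance. It is not ReBalance's method: ReBalance applies training-free steering vectors to the model's internal states, whereas this sketch only returns a label and a placeholder action. The per-step confidence input, the `window` and `early_steps` sizes, both thresholds, and the mapping (oscillating confidence read as overthinking, uniformly high confidence early in a trace read as underthinking) are all illustrative assumptions.

```python
from statistics import mean, pvariance
from typing import Sequence

# Hypothetical thresholds chosen for illustration only; ReBalance's actual
# diagnostics and values are not reproduced here.
VARIANCE_THRESHOLD = 0.02        # step-to-step confidence swinging widely
OVERCONFIDENCE_THRESHOLD = 0.9   # confidence pinned near the top

# Placeholder descriptions standing in for the steering a real system would apply.
ACTIONS = {
    "overthinking": "compress: suppress redundant re-checking",
    "underthinking": "expand: encourage further exploration",
    "balanced": "no intervention",
}


def diagnose(step_confidences: Sequence[float], window: int = 5, early_steps: int = 8) -> str:
    """Label a reasoning trace from per-step confidences in [0, 1].

    Assumed mapping (illustrative): high variance over the last `window` steps
    suggests overthinking; uniformly high confidence while the trace is still
    shorter than `early_steps` suggests the model committed before exploring.
    """
    if len(step_confidences) < window:
        return "balanced"  # too little signal to act on yet
    recent = list(step_confidences[-window:])
    if pvariance(recent) > VARIANCE_THRESHOLD:
        return "overthinking"
    if mean(recent) > OVERCONFIDENCE_THRESHOLD and len(step_confidences) < early_steps:
        return "underthinking"
    return "balanced"


def steer(step_confidences: Sequence[float]) -> str:
    """Map the diagnosis to a (hypothetical) steering action."""
    return ACTIONS[diagnose(step_confidences)]


if __name__ == "__main__":
    print(diagnose([0.5, 0.9, 0.3, 0.95, 0.2]))      # overthinking
    print(diagnose([0.95, 0.96, 0.95, 0.97, 0.96]))  # underthinking
    print(diagnose([0.95] * 10))                     # balanced: confident, but late in the trace
```

The early-trace condition is there because high confidence alone cannot distinguish a premature commitment from a correct answer reached after enough work; a real system would need a better-grounded criterion than trace length.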

But the corpus also invites a caveat: confidence is a double-edged instrument. Across these notes it shows up both as an honest readout and as a learned performance. ProSA found that genuine confidence is meaningful: highly confident models resist prompt rephrasing, low-confidence ones shift substantially with wording changes, and confidence rises with model size, few-shot examples, and objective tasks (Does model confidence predict robustness to prompt changes?). So confidence does track something real about robustness. Yet other work shows that expressed confidence can be a trained style rather than a readout of what the model knows. RLHF installs an assertive, conviction-loaded register that boosts persuasiveness regardless of whether claims are true (Does linguistic conviction explain why LLMs persuade more effectively?), and preference optimization rewards confident answers over clarifying questions, eroding the model's tendency to check its understanding (Does preference optimization harm conversational understanding?). If training teaches a model to sound sure, then using confidence as a thinking dial risks mistaking a costume for a signal.

The corpus suggests that what decides whether this machinery can be trusted is training, not the mechanism itself. Vanilla models use thinking mode counterproductively, with extended reasoning turning into self-doubt that degrades performance, but RL training turns the very same machinery into productive gap analysis (Does extended thinking help or hurt model reasoning?). In other words, whether more thinking reads as anxious spiraling or careful analysis depends on how the model was trained to handle its own uncertainty. This reframes overthinking versus underthinking as less about token count and more about reasoning quality, which training mediates.

A less obvious connection: this balancing act echoes a familiar human pattern. Discourse-level work on anxiety found that overgeneralization through causal reasoning chained across statements predicts anxious thinking better than any single word does (Why do discourse patterns predict anxiety better than single words?), the cognitive analog of a model that keeps revising and amplifying its own doubt. The scaling story is also not confined to reasoning tokens: deep-research agents improve with more search steps and show the same diminishing returns (Do search steps follow the same scaling rules as reasoning tokens?), suggesting that knowing when to stop is a general inference-time problem. Used honestly, confidence is the corpus's best candidate for that internal stop-and-go signal, but only once you have checked that it is not just a register the model learned to wear.
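Knowing when to stop can be framed, under simple assumptions, as picking a budget from a measured accuracy curve. The sketch below is one basic stopping rule, not a method from any of the cited notes. It assumes you have measured accuracy at several budgets (thinking tokens or search steps) on held-out data; the function name `pick_budget`, its `min_gain` parameter, and the curve values in the example are all hypothetical.

```python
from typing import Sequence, Tuple


def pick_budget(curve: Sequence[Tuple[int, float]], min_gain: float = 0.01) -> int:
    """Return the compute budget beyond which extra thinking stops paying off.

    curve: (budget, accuracy) pairs measured on held-out data, in any order.
    Budgets are walked in increasing order, stopping at the first step whose
    accuracy gain falls below `min_gain`. With min_gain=0.0 this returns the
    first local peak of a curve; a positive min_gain also stops on a plateau
    of diminishing returns.
    """
    if not curve:
        raise ValueError("curve must contain at least one point")
    points = sorted(curve)
    chosen_budget, chosen_acc = points[0]
    for budget, acc in points[1:]:
        if acc - chosen_acc < min_gain:
            break  # the next increment no longer earns its cost
        chosen_budget, chosen_acc = budget, acc
    return chosen_budget


if __name__ == "__main__":
    # Hypothetical accuracy-vs-budget sweep, for illustration only.
    toy_curve = [(1_000, 0.60), (2_000, 0.70), (4_000, 0.74), (8_000, 0.745), (16_000, 0.65)]
    print(pick_budget(toy_curve))                # 4000: going to 8k gains only 0.005
    print(pick_budget(toy_curve, min_gain=0.0))  # 8000: the first local peak
```

Because the rule stops at the first disappointing step, a curve that dips and then rises again would be cut early; normalizing each gain by the extra compute spent is a natural variant.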

Sources (9 notes)

Can confidence patterns reveal overthinking versus underthinking?

ReBalance uses confidence variance and overconfidence as diagnostic signals to apply training-free steering vectors that reduce overthinking redundancy while promoting exploration during underthinking, improving accuracy across models from 0.5B to 32B parameters.

When does thinking too much actually hurt reasoning?

Empirical studies demonstrate non-monotonic scaling in test-time reasoning: accuracy peaks at a critical thinking-token count, then declines sharply (87.3% to 70.3% as tokens scale from 1,100 to 16,000). Extended thinking inflates output variance and introduces self-revision errors rather than improving solution quality.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from about 1,100 to about 16,000 reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship in which models overthink easy problems and underthink hard ones.

Does model confidence predict robustness to prompt changes?

ProSA found that highly confident models resist prompt rephrasing, while low-confidence models show large output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

Does linguistic conviction explain why LLMs persuade more effectively?

Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically leaves grounding acts 77.5% below human levels, creating an alignment tax: models appear helpful but fail silently in multi-turn contexts.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Why do discourse patterns predict anxiety better than single words?

Causal explanations across statements—not individual words—are the strongest predictor of anxiety because anxious thinking involves overgeneralization through inter-statement reasoning. A dual model combining both representation levels outperforms either alone.

Do search steps follow the same scaling rules as reasoning tokens?

Deep research agents improve with more search steps in a pattern mirroring the reasoning-token relationship, with both exhibiting diminishing returns. This reveals a new inference-compute axis beyond model capability alone.
