Can proactive critical thinking train models to request clarification actively?

This explores whether training models to 'think critically' — to notice when a question is missing information — actually teaches them to stop and ask for clarification, rather than guessing or barreling ahead.

This explores whether proactive critical thinking can be trained into a model so it actively requests clarification instead of guessing — and the corpus says yes, but with sharp caveats about why models default to silence in the first place. The most direct evidence: reinforcement learning pushed proactive critical thinking accuracy from a near-zero 0.15% up to 73.98% on math problems that were deliberately broken or under-specified Can models learn to ask clarifying questions instead of guessing?. The capability is real and learnable — but fragile. Strikingly, giving an untrained model more inference-time 'thinking' made it *worse* at catching missing information, and only after RL training did extra thinking start to help. So the behavior isn't something you can prompt your way into; it has to be trained in.

Why is the default so bad? Several notes converge on the same culprit: standard reward training actively teaches passivity. Models tuned with ordinary RLHF are optimized for immediate, confident, single-turn helpfulness — which quietly punishes the act of asking a question Why do language models respond passively instead of asking clarifying questions?. One note frames this as an 'alignment tax on communication': preference optimization erodes the grounding acts humans use to confirm understanding by as much as 77.5% below human levels, producing models that look helpful but fail silently when intent is ambiguous Does preference optimization harm conversational understanding?. The fix in that line of work is to change *what* you reward: multi-turn-aware rewards that estimate the long-term value of an interaction give models permission to ask first and answer later.

There's a second, quite different route to the same destination — and this is where it gets interesting. Instead of explicitly rewarding clarifying questions, social meta-learning reframes tasks as teacher-student dialogues where the student must *extract* hidden information to succeed Can LLMs learn to ask for feedback during problem solving?. Models trained this way on fully-specified problems spontaneously generalize to under-specified ones, asking for what they need and delaying their answer — clarifying behavior emerges without ever being directly trained for it Can models learn to ask clarifying questions without explicit training?. And asking isn't enough on its own: question *quality* matters, which is why the ALFA framework decomposes 'a good question' into attributes like clarity, relevance, and specificity, training on attribute-specific preference pairs rather than a single helpfulness score Can models learn to ask genuinely useful clarifying questions?.

What the reader might not expect is the diagnosis underneath all of this: the problem isn't that models *can't* tell when a question is broken — it's that training never taught them when to *disengage*. Reasoning models confronted with missing premises don't reject the question; they overthink it, generating long redundant chains, while plainer non-reasoning models sometimes correctly flag the question as unanswerable Why do reasoning models overthink ill-posed questions?. Standard training optimizes for producing reasoning steps but never for knowing when to stop. The same theme shows up in the finding that RL can convert 'thinking mode' from counterproductive self-doubt into productive gap analysis — training shapes the *quality* of reasoning, not just its quantity Does extended thinking help or hurt model reasoning?. Put together, the corpus suggests that teaching a model to ask for clarification is less about adding a new skill and more about removing a trained-in compulsion to answer no matter what — and there's a measurable payoff, since proactive behavior can cut conversation turns by up to 60% Could proactive dialogue make conversations dramatically more efficient?.

Sources 9 notes

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Can LLMs learn to ask for feedback during problem solving?

Research shows that reformulating static tasks as pedagogical dialogues—where a teacher has privileged information and the student must learn to extract it—trains models to actively engage conversation as a problem-solving tool, not just imitate dialogue patterns.

Can models learn to ask clarifying questions without explicit training?

Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Why do reasoning models overthink ill-posed questions?

Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Can proactive critical thinking train models to request clarification actively?

Sources 9 notes

Next inquiring lines