How does proactive critical thinking enable models to identify missing information?

This explores how training models to think critically *before* answering — rather than just thinking harder — lets them notice what a problem is missing instead of plowing ahead and guessing.

The headline result is striking: reinforcement learning pushed proactive critical-thinking accuracy on deliberately flawed math problems from a near-zero 0.15% to nearly 74% (Can models learn to ask clarifying questions instead of guessing?). But the more interesting story is *why* models are so bad at this to begin with. It turns out that solving a problem and noticing what's missing from it are two different skills. Models that ace fully specified reasoning tasks collapse to 40–50% accuracy the moment one variable is quietly withheld and they have to figure out which clarifying question to ask (Can models identify what information they actually need?). Being a strong solver doesn't make a model a good detector of gaps.
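To make the solver-versus-detector split concrete, here is a minimal Python sketch that is not taken from any of the cited work. It assumes a toy setting in which the problem has already been written down as a dependency graph of variables; `missing_inputs`, `respond`, and the `FORMULAS` example are hypothetical names invented for illustration. In this formal setting gap detection is a simple graph walk that never computes an answer, which shows how separate it is from solving. The hard part for a language model is doing the same thing on a natural-language problem.

```python
from typing import Dict, List, Set


def missing_inputs(target: str, formulas: Dict[str, List[str]], known: Set[str]) -> Set[str]:
    """Return the leaf variables needed to compute `target` that nobody supplied."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [target]
    while stack:
        var = stack.pop()
        if var in seen or var in known:
            continue  # already handled, or the value was given
        seen.add(var)
        if var in formulas:
            stack.extend(formulas[var])  # derived quantity: inspect its inputs
        else:
            missing.add(var)  # a leaf with no given value is a gap
    return missing


def respond(target: str, formulas: Dict[str, List[str]], known: Set[str]) -> str:
    """Ask for what is missing before attempting to solve."""
    gaps = missing_inputs(target, formulas, known)
    if gaps:
        return "ASK: " + ", ".join(sorted(gaps))
    return "SOLVE"


# Hypothetical problem: total_cost = subtotal + shipping, subtotal = price * quantity.
FORMULAS = {
    "total_cost": ["subtotal", "shipping"],
    "subtotal": ["price", "quantity"],
}

fully_specified = respond("total_cost", FORMULAS, {"price", "quantity", "shipping"})
one_withheld = respond("total_cost", FORMULAS, {"price", "quantity"})
```

Here `fully_specified` is a decision to solve, while `one_withheld` is a request for the shipping value rather than a guess.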

What changes with training isn't the amount of thinking but its *character*. In untrained models, extended 'thinking mode' actually backfires, spiraling into self-doubt that degrades performance; RL redirects that same machinery toward productive gap analysis (Does extended thinking help or hurt model reasoning?). That's why the capability is described as learnable but fragile: giving an untrained model more inference-time compute made gap-detection *worse*, and more compute only helped after RL had reshaped how the model spends those tokens (Can models learn to ask clarifying questions instead of guessing?). More thinking is not free: accuracy peaks and then declines past a token threshold, with models overthinking the easy and underthinking the hard. In one reported case, raising thinking tokens from roughly 1,100 to roughly 16K cut benchmark accuracy from 87.3% to 70.3% (Does more thinking time always improve reasoning accuracy?).

The corpus also shows there's more than one route to the same behavior. You don't necessarily need to train explicitly on flawed problems: social meta-learning instills the meta-strategy of treating conversation as an information source, so models trained only on *complete* problems still generalize to underspecified ones by asking for what they need and delaying their answer (Can models learn to ask clarifying questions without explicit training?). A different angle skips asking entirely and lets generation surface the gap: a model's own partial answer reveals information needs the original query couldn't express, which you can feed back as a fresh retrieval query (Can a model's partial response guide what to retrieve next?). Identifying missing information, it turns out, can happen by asking, by retrieving, or by noticing the holes in your own draft.
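The retrieval route can be sketched as a short loop. What follows is a toy illustration in Python, loosely modeled on the idea attributed to ITER-RETGEN in the notes and not its actual implementation. The word-overlap retriever, the `stub_generate` function standing in for a language model, and the three-passage corpus are all assumptions made up for the example.

```python
from typing import Callable, List, Tuple


def tokenize(text: str) -> set:
    return set(text.lower().split())


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def iterative_retrieve_generate(
    question: str,
    corpus: List[str],
    generate: Callable[[str, List[str]], str],
    iterations: int = 2,
    k: int = 2,
) -> Tuple[str, List[str]]:
    """Alternate retrieval and generation, feeding each draft back as query text."""
    draft = ""
    passages: List[str] = []
    for _ in range(iterations):
        query = (question + " " + draft).strip()  # the draft exposes the gap
        passages = retrieve(query, corpus, k)
        draft = generate(question, passages)
    return draft, passages


def stub_generate(question: str, passages: List[str]) -> str:
    """Hypothetical stand-in for a language model; hard-coded for this example."""
    if any("oslo" in p for p in passages):
        return "acme is headquartered in oslo"
    if any("acme" in p for p in passages):
        return "the company is acme but acme headquarters unknown"
    return "unknown"


CORPUS = [
    "alice founded acme in 1999",
    "bob likes tea",
    "acme headquarters moved to oslo",
]
QUESTION = "where is the company alice founded based"

first_pass, _ = iterative_retrieve_generate(QUESTION, CORPUS, stub_generate, iterations=1)
final_draft, final_passages = iterative_retrieve_generate(QUESTION, CORPUS, stub_generate)
```

The question never mentions "acme" or "headquarters", so the first retrieval misses the passage that holds the answer. The first draft names both, and the second retrieval finds it.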

There's a structural failure lurking underneath all of this. Reasoning models tend to 'wander' and abandon promising paths prematurely, exploring like tourists rather than scientists, which means the building blocks of good gap-detection (committing to a line of inquiry, recognizing when it's incomplete) are exactly what untrained reasoning lacks (Why do reasoning models abandon promising solution paths?). Training on messy search processes that include mistakes and backtracking yields 25% higher accuracy than training on optimal trajectories alone, suggesting that exposure to the *experience* of incomplete information teaches models to handle it (Does training on messy search processes improve reasoning?).
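What "messy search as training data" might look like can be shown with a small sketch. The Python below is an illustration loosely inspired by the Stream of Search note, not the format used in that work. The subset-sum task, the `search_with_trace` name, and the `try` / `dead-end` / `backtrack` / `goal` vocabulary are assumptions chosen for brevity, and the pruning step assumes positive integers.

```python
from typing import List


def search_with_trace(numbers: List[int], target: int) -> str:
    """Depth-first subset-sum search that logs every step, dead ends included."""
    trace: List[str] = []

    def dfs(index: int, chosen: List[int], total: int) -> bool:
        if total == target:
            trace.append(f"goal {chosen}")
            return True
        if total > target or index == len(numbers):
            trace.append(f"dead-end {chosen} sum={total}")
            return False
        for i in range(index, len(numbers)):
            trace.append(f"try {numbers[i]}")
            if dfs(i + 1, chosen + [numbers[i]], total + numbers[i]):
                return True
            trace.append(f"backtrack {numbers[i]}")  # record the abandoned choice
        return False

    dfs(0, [], 0)
    # One flat string per problem, so it could serve as a training sequence.
    return " | ".join(trace)


messy_trace = search_with_trace([5, 3, 4], 7)
```

An optimal-trajectory-only dataset would keep just the final `try 3 | try 4 | goal [3, 4]` segment. The full trace also keeps the two failed branches under `5` and the backtracking out of them.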

Worth knowing for anyone trying to engineer this cheaply: the easy levers mostly don't work. Telling a model it's being watched doesn't make its reasoning more faithful (Does telling models they are watched improve reasoning faithfulness?). Structured prompting can sharpen a related skill: staged prompting lifts cognitive-distortion detection by over ten percent by separating assessment from analysis (Can structured prompting improve cognitive distortion detection?). But the deep result stands: proactively spotting what's missing is a trained disposition, not a prompt you can bolt on.

Sources 10 notes

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Can models identify what information they actually need?

Models achieving high accuracy on complete reasoning tasks drop to 40-50% accuracy identifying what clarifying question to ask when one variable is withheld. Information gathering and problem execution are separable cognitive operations.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Can models learn to ask clarifying questions without explicit training?

Models trained via social meta-learning (SML) on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Can a model's partial response guide what to retrieve next?

ITER-RETGEN shows that iteratively using generated responses as retrieval queries substantially improves performance on multi-hop reasoning and fact verification. Generation acts as both answer producer and information-need clarifier, surfacing implicit gaps that the original query missed.

Why do reasoning models abandon promising solution paths?

Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.

Does training on messy search processes improve reasoning?

Stream of Search pretraining, which represents exploration and backtracking as serialized strings, achieves 25% higher accuracy than optimal-trajectory-only training. Models learn internal world models for search and adaptive strategies rather than fixed external methods.

Does telling models they are watched improve reasoning faithfulness?

Prompting models that their reasoning is monitored has no effect on hint omission rates. This suggests CoT generation is not modulated by perceived social context, ruling out prompt-engineering fixes and certain safety monitoring assumptions.

Can structured prompting improve cognitive distortion detection?

Diagnosis of Thought (DoT) prompting separates subjectivity assessment, contrastive reasoning, and schema analysis to achieve 10%+ improvement over zero-shot ChatGPT. Expert evaluators rated the resulting explanations as clinically useful for case formulation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question: **Can models be trained to proactively identify missing information before answering, rather than guessing from incomplete premises?** Treat this as still-open.

What a curated library found — and when (findings span 2023–2026; these are dated claims, not current truth):
• RL training lifted gap-detection accuracy on deliberately flawed math from 0.15% to ~74%, but solving well-specified problems and detecting their gaps are decoupled skills — strong solvers drop to 40–50% when one variable is withheld (2025–26).
• Extended 'thinking' modes backfire in untrained models (spiraling self-doubt); RL redirects the same compute toward productive gap analysis, whereas added inference-time compute alone worsens detection; accuracy peaks and then declines past a token threshold (2025–26).
• Multiple routes exist: social meta-learning (training only on *complete* problems) produces emergent clarifying-question behavior; partial model responses reveal information needs, which can feed retrieval; models trained on messy search (including mistakes) handle incomplete info better (2026).
• Reasoning models explore like tourists (abandon paths prematurely), creating structural gaps in gap-detection; commitment and path recognition are trainable (2025).
• Shallow levers fail: surveillance prompts don't improve reasoning faithfulness; structured prompting (three-stage) lifts cognitive-distortion detection by ~10% but cannot replace trained disposition (2023–25).

Anchor papers (verify; mind their dates):
• arXiv:2503.22674 (2025-03) — QuestBench: Can LLMs ask the right question?
• arXiv:2505.20296 (2025-05) — Reasoning Models Don't Always Say What They Think
• arXiv:2602.16488 (2026-02) — Learning to Learn from Language Feedback with Social Meta-Learning
• arXiv:2506.04210 (2025-06) — Does Thinking More Always Help? Test-Time Scaling

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every finding above, assess whether newer training regimes (e.g., process rewards, trajectory optimization, multi-agent orchestration, or novel RL formulations since mid-2026), improved evals (e.g., live interactive benchmarks vs. static flawed-problem sets), or scaled inference-time compute have since *relaxed* or *overturned* the 74%-ceiling or the thinking-token decline curve. Separate the durable question (likely: *can gap-detection be decoupled and trained?*) from the perishable claim (possibly: *current RL reaches plateau at 74%*). Cite what moved it.
(2) **Surface the strongest CONTRADICTING or SUPERSEDING work** from the last ~6 months — e.g., papers claiming end-to-end reasoning models now spontaneously ask clarifying questions without explicit training, or that scaling alone (not RL) recovers the capability.
(3) **Propose 2 research questions that ASSUME the regime may have moved:** e.g., "If multi-turn interactive training (model asks, human answers, model re-solves) is now standard, how does it interact with proactive gap-detection?" and "Does gap-detection transfer across problem domains, or is it brittle to distribution shift in the *type* of missing information?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
