Can language systems learn when to ask for clarification instead of choosing one reading?
This explores whether models can learn the *decision itself*, that is, recognizing that a request has more than one valid reading and choosing to ask, rather than silently committing to a guess. The corpus splits this into two problems that turn out to be separate: can a model *notice* the ambiguity, and will it *act* on that noticing by asking? The first is the harder problem, and the evidence on it is sobering. On the AMBIENT benchmark, GPT-4 correctly disambiguates only 32% of cases against 90% for humans, across lexical, structural, and scope ambiguity. This suggests that models often can't even hold two interpretations in mind at once Can language models recognize when text is deliberately ambiguous?. If you can't represent the fork in the road, you can't choose to stop at it.
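The noticing problem can be made concrete with a minimal sketch. It assumes a model that can list candidate readings of a request with plausibility weights, and it asks for clarification when that plausibility is spread across readings, measured as Shannon entropy. The `Reading` class, the `should_ask` rule, the 0.5-bit threshold, and all weights are hypothetical. This is not how AMBIENT evaluates models.

```python
import math
from dataclasses import dataclass


@dataclass
class Reading:
    """One candidate interpretation of a request (hypothetical structure)."""
    text: str
    weight: float  # unnormalized plausibility the model assigns to this reading


def entropy_bits(readings: list[Reading]) -> float:
    """Shannon entropy, in bits, of the normalized distribution over readings."""
    total = sum(r.weight for r in readings)
    probs = [r.weight / total for r in readings if r.weight > 0]
    return -sum(p * math.log2(p) for p in probs)


def should_ask(readings: list[Reading], max_bits: float = 0.5) -> bool:
    """Ask for clarification when plausibility is spread across readings.

    A model that only ever produces one reading never gets past the first
    check: with a single reading there is nothing to choose between.
    """
    if len(readings) < 2:
        return False
    return entropy_bits(readings) > max_bits


# Scope ambiguity: "Every student read a book."
scope = [
    Reading("the same book for every student", 0.45),
    Reading("possibly a different book per student", 0.55),
]
one_reading = [Reading("possibly a different book per student", 1.0)]
skewed = [
    Reading("the same book for every student", 0.03),
    Reading("possibly a different book per student", 0.97),
]

print(should_ask(scope))        # True: about 0.99 bits
print(should_ask(one_reading))  # False: only one reading was generated
print(should_ask(skewed))       # False: about 0.19 bits
```

The `one_reading` case is the point of the benchmark result. If generation only ever surfaces one interpretation, the rule never fires, however well the asking step itself has been trained.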
But the more encouraging thread says the *asking* behavior is genuinely learnable. Reinforcement learning on deliberately under-specified problems lifted proactive clarification from a near-zero 0.15% to 74%, though the same work warns that the skill is fragile: giving an untrained model more inference-time compute actually degrades it, while after RL training extra compute improves it Can models learn to ask clarifying questions instead of guessing?. A second line of work reaches the same place from a different angle. 'Social meta-learning' reframes static problems as dialogues in which a teacher holds information the student must extract, and models trained only on *complete* problems then generalize to incomplete ones by spontaneously asking for what's missing. Clarification emerges without ever being explicitly taught Can models learn to ask clarifying questions without explicit training? Can LLMs learn to ask for feedback during problem solving?.
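The RL result suggests that the incentive can be written down. Here is a minimal sketch, assuming a training set in which each problem is labeled as complete or deliberately under-specified. The names `clarification_reward` and `clarification_rate` and the 0.2 penalty are hypothetical, and this is not the reward used in the cited work.

```python
from enum import Enum


class Action(Enum):
    ASK = "ask"
    ANSWER = "answer"


def clarification_reward(underspecified: bool, action: Action,
                         answer_correct: bool = False,
                         ask_penalty: float = 0.2) -> float:
    """Hypothetical scalar reward for ask-vs-answer RL training."""
    if underspecified:
        # Any committed answer to an under-specified problem is a guess.
        return 1.0 if action is Action.ASK else 0.0
    if action is Action.ASK:
        # Asking when nothing is missing costs the user a turn.
        return -ask_penalty
    return 1.0 if answer_correct else 0.0


def clarification_rate(episodes: list[tuple[bool, Action]]) -> float:
    """Share of under-specified episodes in which the policy asked."""
    asked = [action is Action.ASK for underspecified, action in episodes if underspecified]
    return sum(asked) / len(asked) if asked else 0.0


episodes = [
    (True, Action.ASK),
    (True, Action.ANSWER),   # guessed on a problem with missing information
    (False, Action.ANSWER),  # complete problem: excluded from the rate
    (True, Action.ASK),
]
print(clarification_rate(episodes))  # 2 of the 3 under-specified episodes asked
```

The penalty makes an unnecessary question score strictly worse than a wrong answer, so asking pays off only when information is actually missing. That is a hypothetical design choice, and a real setup would need to tune it.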
So why doesn't this happen by default? Two corpus notes point at what training instills rather than at missing capability. Standard RLHF optimizes for *next-turn* helpfulness, which quietly rewards a confident immediate answer over a clarifying question. The model learns to be passively agreeable because the reward never accounts for the long-term value of getting intent right Why do language models respond passively instead of asking clarifying questions?. And even when a model knows the user is wrong, it often won't say so: grounding failures trace to 'face-saving' avoidance learned from human conversational norms, not to missing knowledge Why do language models avoid correcting false user claims?. The model has the information, but a learned social instinct to keep the peace works against speaking up.
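The incentive gap is easiest to see with two hand-scored trajectories. This minimal sketch assumes the per-turn rewards are already known. The numbers and the discount `gamma=0.9` are hypothetical. CollabLLM's multi-turn-aware reward estimates long-term interaction value instead of reading it off a fixed list, so the sketch only shows why the ranking flips once later turns count.

```python
def next_turn_reward(turn_rewards: list[float]) -> float:
    """Score only the immediate reply, as next-turn helpfulness does."""
    return turn_rewards[0]


def multi_turn_return(turn_rewards: list[float], gamma: float = 0.9) -> float:
    """Discounted sum of rewards over the whole interaction."""
    return sum(r * gamma ** t for t, r in enumerate(turn_rewards))


def preferred(trajectories: dict[str, list[float]], score) -> str:
    """Name of the trajectory the given scorer ranks highest."""
    return max(trajectories, key=lambda name: score(trajectories[name]))


# Hypothetical per-turn rewards for one under-specified request.
trajectories = {
    # Confident answer to the wrong reading: looks helpful now,
    # then the next turn is spent repairing the misunderstanding.
    "guess": [0.6, 0.0],
    # Clarifying question: less satisfying now, correct answer next turn.
    "ask": [0.2, 1.0],
}

print(preferred(trajectories, next_turn_reward))   # guess
print(preferred(trajectories, multi_turn_return))  # ask
```
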
The most interesting lateral move is to stop treating 'ask vs. answer' as special and to see it as one instance of a general *routing* problem. Thinkless trains a single model to decide when to engage extended reasoning and when to answer directly. It uses DeGRPO, a method that decouples the routing choice from the answer itself so that training doesn't collapse into one mode Can models learn when to think versus respond quickly?. Conversation-forecasting work shows that the related skill of *abstaining* when uncertain is also trainable: small calibrated models that know when to hold back match models ten times larger Can models learn to abstain when uncertain about predictions?. Asking to clarify, choosing to think, and abstaining when unsure may all be the same underlying competence: calibrated self-knowledge about when the model doesn't yet have enough to commit.
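One way to picture this shared competence is a single router over calibrated self-estimates. In this minimal sketch, `route`, its three probability inputs, and both thresholds are hypothetical. Thinkless learns its mode choice during training rather than applying hand-set thresholds. The rule is only as good as its inputs: if `p_correct_direct` is overconfident, the router answers when it should think or abstain.

```python
from enum import Enum


class Route(Enum):
    ASK = "ask for clarification"
    ANSWER = "answer directly"
    THINK = "reason at length"
    ABSTAIN = "abstain"


def route(p_ambiguous: float, p_correct_direct: float,
          p_correct_thinking: float, ask_above: float = 0.5,
          commit_above: float = 0.8) -> Route:
    """Pick one action from calibrated self-estimates (hypothetical rule)."""
    # Ambiguity comes first: extra reasoning cannot recover intent
    # the user never stated.
    if p_ambiguous > ask_above:
        return Route.ASK
    # The cheapest path that is confident enough wins.
    if p_correct_direct >= commit_above:
        return Route.ANSWER
    if p_correct_thinking >= commit_above:
        return Route.THINK
    # Not confident either way: hold back rather than guess.
    return Route.ABSTAIN


print(route(0.7, 0.9, 0.9))    # Route.ASK
print(route(0.1, 0.95, 0.95))  # Route.ANSWER
print(route(0.1, 0.4, 0.85))   # Route.THINK
print(route(0.1, 0.4, 0.5))    # Route.ABSTAIN
```
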
Two notes push past the binary ask-or-answer decision. ALFA argues that *whether* to ask is only half the battle: question *quality* matters too. It decomposes quality into theory-grounded attributes (clarity, relevance, specificity) and trains on attribute-specific preferences. This beats optimizing a single quality score, especially in clinical reasoning, where the right question changes the diagnosis Can models learn to ask genuinely useful clarifying questions?. And borrowing from conversation analysis, 'insert-expansions' give a formal account of *when* an agent should pause to probe the user instead of silently chaining tools, which turns the clarify-or-proceed decision into something structured rather than ad hoc When should AI agents ask users instead of just searching?. The upshot: yes, models can learn to ask, but the bottleneck isn't the asking. It's perceiving the ambiguity in the first place, and having a reward signal that doesn't punish honesty about not knowing.
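One possible reason attribute-specific preferences beat a single score is that summing attributes lets opposing differences cancel out. This minimal sketch uses hypothetical 1 to 3 ratings for two questions in an imagined chest-pain case. It does not reproduce how ALFA constructs its preference pairs. It only shows a signal that a summed score loses and per-attribute pairs keep.

```python
from itertools import combinations

# Attribute names come from the text; the ratings below are hypothetical.
ATTRIBUTES = ("clarity", "relevance", "specificity")


def attribute_pairs(questions, min_gap=1):
    """One (attribute, chosen, rejected) pair per attribute on which two questions differ."""
    pairs = []
    for a, b in combinations(questions, 2):
        for attr in ATTRIBUTES:
            gap = a[attr] - b[attr]
            if abs(gap) >= min_gap:
                chosen, rejected = (a, b) if gap > 0 else (b, a)
                pairs.append((attr, chosen["text"], rejected["text"]))
    return pairs


def single_score_pairs(questions, min_gap=1):
    """Pairs ranked by the summed score, collapsing all attributes into one number."""
    pairs = []
    for a, b in combinations(questions, 2):
        gap = sum(a[x] for x in ATTRIBUTES) - sum(b[x] for x in ATTRIBUTES)
        if abs(gap) >= min_gap:
            chosen, rejected = (a, b) if gap > 0 else (b, a)
            pairs.append(("overall", chosen["text"], rejected["text"]))
    return pairs


# Clear and specific but off-target, versus on-target but vaguer.
q_clear = {"text": "How many hours did you sleep last night?",
           "clarity": 3, "relevance": 1, "specificity": 2}
q_relevant = {"text": "Can you say more about the chest pain?",
              "clarity": 2, "relevance": 3, "specificity": 1}

print(single_score_pairs([q_clear, q_relevant]))  # []
for attr, chosen, rejected in attribute_pairs([q_clear, q_relevant]):
    print(f"{attr}: prefer {chosen!r}")
```

Both questions sum to 6, so the single score produces no training signal at all. The per-attribute pairs still tell the model that the second question is the more relevant one.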
Sources (10 notes)
The AMBIENT benchmark shows that GPT-4 correctly disambiguates only 32% of cases, versus 90% for humans. This failure spans lexical, structural, and scope ambiguity. It suggests that LLMs struggle to hold multiple interpretations simultaneously, a fundamental gap that standard benchmarks hide.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
Research shows that reformulating static tasks as pedagogical dialogues—where a teacher has privileged information and the student must learn to extract it—trains models to actively engage conversation as a problem-solving tool, not just imitate dialogue patterns.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.
Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.
Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.