Can LLMs learn to ask clarifying questions instead of guessing?
This explores whether models can be trained to recognize when they're missing information and ask for it — rather than barreling ahead with a confident guess — and what's actually known about how that behavior gets instilled.
This explores whether LLMs can be trained to pause and ask when a problem is underspecified, instead of guessing. The short answer the corpus gives is yes — and the more interesting finding is *how*. The most striking result is that this behavior can emerge without being explicitly taught: models trained via social meta-learning on fully-specified problems still generalize to underspecified ones, spontaneously asking for the missing pieces and delaying their answer Can models learn to ask clarifying questions without explicit training?. The trick is reframing the task: instead of static question-answer pairs, training becomes a dialogue where a teacher holds privileged information and the student has to learn to extract it, so conversation itself becomes a problem-solving tool rather than a pattern to imitate Can LLMs learn to ask for feedback during problem solving?.
But asking *a* question isn't the same as asking a *good* one. The ALFA framework shows that 'question quality' isn't a single dial — it decomposes into theory-grounded attributes like clarity, relevance, and specificity, and training on attribute-specific preference pairs beats training on a single quality score. This matters most in high-stakes domains like clinical reasoning, where the right clarifying question directly changes the diagnosis Can models learn to ask genuinely useful clarifying questions?.
What makes this worth caring about is the failure mode on the other side. Models don't just fail to ask — they actively accommodate bad premises. The FLEX benchmark found that LLMs accept false presuppositions baked into a question even when they demonstrably know the correct fact (Mistral rejected them only 2.44% of the time), meaning a buried wrong assumption pulls the model toward going along with it more than its own knowledge pulls it toward pushing back Why do language models accept false assumptions they know are wrong?. So 'guessing' isn't only about missing information — it's about a default disposition to answer rather than interrogate the question. The premature-answering reflex is the thing clarifying-question training is fighting against.
Here's the part you might not expect: this connects to a deeper structural split in how these models work. Several notes describe a 'comprehension without competence' pattern — models can articulate the right principle but fail to execute it, with explanation and action running on functionally disconnected pathways Can language models understand without actually executing correctly? Can LLMs understand concepts they cannot apply?. Asking a clarifying question requires *noticing* that you lack what you need — a kind of self-monitoring that sits uncomfortably close to the same epistemic gap that produces hallucination and premise-insensitivity in the first place What do language models actually know?. Knowing-that-you-don't-know is its own skill, not a free byproduct of knowing things.
If you want the design takeaway: clarifying-question behavior seems less like a feature you bolt on and more like a *meta-strategy* you can train into the model's stance toward conversation — and it lives in the same family as the work on structured critical questions, where forcing a model to check its warrants and implicit premises catches reasoning failures that plain chain-of-thought waves through Can structured argument prompts make LLM reasoning more rigorous?. The thread running through all of it: the cure for guessing isn't more reasoning — it's teaching the model to treat its own uncertainty as information worth acting on.
Sources 8 notes
Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
Research shows that reformulating static tasks as pedagogical dialogues—where a teacher has privileged information and the student must learn to extract it—trains models to actively engage conversation as a problem-solving tool, not just imitate dialogue patterns.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
LLMs achieve high fidelity in capturing language patterns yet show systematic, structurally specific failures—hallucination, reasoning collapse, and premise-sensitivity. The gap between statistical tracking and real knowledge is measurable and unavoidable.
Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.