SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation

Can models identify what information they actually need?

When a reasoning task is missing a key piece of information, can language models recognize what's absent and ask the right clarifying question? QuestBench tests this capability directly.

Synthesis note · 2026-02-22 · sourced from Reasoning Logic Internal Rules

QuestBench formalizes a capability that real-world deployment requires but benchmarks ignore: when a task is underspecified, can the model identify what information is missing and ask the right clarifying question?

The benchmark presents reasoning tasks (logic, planning, math) where exactly one piece of information is withheld. The model must select the correct clarification question from multiple options. The key finding: while current models excel on math variants (GSM-Q, GSME-Q), they achieve only 40-50% accuracy on Logic-Q and Planning-Q.
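The sketch below makes the math-variant task format concrete. It assumes, hypothetically, that a word problem can be reduced to a dependency map from each derived quantity to the inputs it needs. The item, `DERIVATIONS`, `derivable`, `gold_questions`, and `accuracy` are illustrative names, not QuestBench's code or data format. Under that assumption, the correct clarification question is the candidate that makes the target computable once it is supplied, and accuracy is the fraction of items where the selected option matches it.

```python
from typing import Dict, List, Set

# Hypothetical GSM-Q-style item (not taken from QuestBench):
# "Apples cost $2 each and a bag costs $1. How much does the whole purchase cost?"
# The number of apples is withheld. Each derived quantity lists the inputs it needs.
DERIVATIONS: Dict[str, List[str]] = {
    "apple_cost": ["price_per_apple", "num_apples"],
    "total_cost": ["apple_cost", "bag_fee"],
}
KNOWN: Set[str] = {"price_per_apple", "bag_fee"}
CANDIDATES: List[str] = ["num_apples", "num_oranges", "store_name"]


def derivable(known: Set[str], derivations: Dict[str, List[str]]) -> Set[str]:
    """Forward-chain: keep adding any quantity whose inputs are all available."""
    closure = set(known)
    changed = True
    while changed:
        changed = False
        for qty, inputs in derivations.items():
            if qty not in closure and all(i in closure for i in inputs):
                closure.add(qty)
                changed = True
    return closure


def gold_questions(known: Set[str], candidates: List[str],
                   derivations: Dict[str, List[str]], target: str) -> List[str]:
    """Candidates whose answer would make the target computable."""
    return [c for c in candidates if target in derivable(known | {c}, derivations)]


def accuracy(predictions: List[str], golds: List[str]) -> float:
    """Fraction of items where the selected question matches the gold question."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


if __name__ == "__main__":
    print("total_cost" in derivable(KNOWN, DERIVATIONS))                 # False
    print(gold_questions(KNOWN, CANDIDATES, DERIVATIONS, "total_cost"))  # ['num_apples']
    print(accuracy(["num_apples", "num_oranges"], ["num_apples", "num_apples"]))  # 0.5
```

The sketch treats arithmetic steps as purely structural dependencies and ignores numeric values. That is enough to show why only one candidate question unblocks the target while the others are distractors.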

The critical insight is the separability result: models that solve the fully specified version of a problem still fail to identify the right question when one variable is missing. Problem solving and information gathering are distinct cognitive operations. The ability to reason correctly when all inputs are present does not transfer to recognizing which input is absent.

This extends the note "Why do reasoning models overthink ill-posed questions?" from a complementary angle. That note documents the BEHAVIORAL response to missing information (overthinking, redundant self-doubt). This note documents the DIAGNOSTIC failure: models cannot even identify what is missing, let alone respond appropriately. Together they describe a two-part deficit:

  1. Cannot detect what information is needed (QuestBench)
  2. Cannot disengage when information is absent (missing premises overthinking)

The connection to the note "Can language models recognize when text is deliberately ambiguous?" is structural: both involve recognizing that the current input is insufficient for a definitive answer. Ambiguity recognition asks "is this input multiply interpretable?" while information gathering asks "is this input incomplete?" Both require meta-reasoning about the input rather than reasoning within it.

Formalizing the task as a constraint satisfaction problem (CSP) with missing variable assignments is useful: it defines information gathering as identifying the minimal necessary question, which is a well-defined optimization target. This distinguishes the problem from subjective clarification tasks where multiple valid questions exist.
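The CSP framing can be sketched directly. This is a minimal sketch that assumes a toy propositional problem with Boolean variables and constraints written as Python predicates. The variables `A`, `B`, `C`, `D`, `G`, the rules, and every function name are hypothetical, not QuestBench's encoding. The target counts as underspecified when consistent completions disagree on its value. A question about a single variable counts as sufficient when every possible answer to it pins the target down. Brute-force enumeration keeps the logic transparent, but it is exponential in the number of free variables, so it only suits tiny examples.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Set

Assignment = Dict[str, bool]
Constraint = Callable[[Assignment], bool]

# Hypothetical Logic-Q-style problem over Boolean variables (illustrative only).
VARS: List[str] = ["A", "B", "C", "D", "G"]
CONSTRAINTS: List[Constraint] = [
    lambda s: s["G"] == (s["A"] and s["B"]),  # G holds exactly when A and B hold
    lambda s: (not s["C"]) or s["D"],          # distractor rule: C implies D
]


def completions(variables: List[str], constraints: List[Constraint],
                known: Assignment) -> Iterator[Assignment]:
    """Yield every full assignment that extends `known` and satisfies all constraints."""
    free = [v for v in variables if v not in known]
    for values in product([False, True], repeat=len(free)):
        full = {**known, **dict(zip(free, values))}
        if all(c(full) for c in constraints):
            yield full


def target_values(variables: List[str], constraints: List[Constraint],
                  known: Assignment, target: str) -> Set[bool]:
    """Values the target can still take given the known assignments."""
    return {s[target] for s in completions(variables, constraints, known)}


def is_determined(variables: List[str], constraints: List[Constraint],
                  known: Assignment, target: str) -> bool:
    return len(target_values(variables, constraints, known, target)) == 1


def sufficient_questions(variables: List[str], constraints: List[Constraint],
                         known: Assignment, target: str) -> List[str]:
    """Unknown variables whose answer, whatever it turns out to be, fixes the target."""
    if is_determined(variables, constraints, known, target):
        return []  # the task is already fully specified; nothing needs to be asked
    result = []
    for v in variables:
        if v in known or v == target:
            continue
        answers = {s[v] for s in completions(variables, constraints, known)}
        if all(is_determined(variables, constraints, {**known, v: a}, target)
               for a in answers):
            result.append(v)
    return result


if __name__ == "__main__":
    print(is_determined(VARS, CONSTRAINTS, {"A": True}, "G"))          # False
    print(sufficient_questions(VARS, CONSTRAINTS, {"A": True}, "G"))   # ['B']
    print(sufficient_questions(VARS, CONSTRAINTS, {"A": False}, "G"))  # []
```

With `A` known to be true, only asking about `B` resolves `G`. The rule linking `C` and `D` never affects the target, so those questions are the kind of plausible but irrelevant options a model has to pass over.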

Original note title

solving well-specified reasoning problems is insufficient for identifying missing information in underspecified tasks