Can LLMs learn to ask for feedback during problem solving?

Explores whether language models can be trained to actively solicit corrective feedback mid-conversation rather than committing to single-turn answers. This matters because it could bridge the gap between fluent chat and genuine conversational learning.

Synthesis note · 2026-05-18 · sourced from Training Fine Tuning

LLMs often struggle to learn from corrective feedback within a conversation. They rarely solicit feedback proactively, even when faced with ambiguity, and their dialogues feel static and one-sided compared to human conversation. The paper "Learning to Learn from Language Feedback with Social Meta-Learning" takes inspiration from how children learn: through social meta-learning (SML), the process of learning how to learn from others. It operationalizes this as a finetuning methodology for LLMs.

The methodology converts static tasks into interactive social learning problems. A math problem, normally framed as "produce a solution," becomes a pedagogical dialogue: a "student" model attempts to generate the solution over the course of a conversation, and a "teacher" model provides guidance. The student is the model being trained. The teacher can be a frozen instance of the same model or a stronger model. Critically, the teacher has access to privileged information — the correct answer or a verifier's output — that creates an information asymmetry the student must learn to exploit.
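
The reformulation described above can be sketched as a small episode loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `Task`, `teacher_reply`, `bisecting_student`, and `run_episode` are hypothetical names, the "problem" is a hidden integer rather than a real math task, and the student is a scripted policy standing in for the model being trained. The point it shows is the information asymmetry: only the teacher reads `task.answer`, and the student can succeed only by using the teacher's feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


@dataclass
class Task:
    question: str
    answer: int  # privileged information: read by the teacher only


def teacher_reply(task: Task, attempt: str) -> str:
    """Toy teacher: uses the hidden answer to give feedback without revealing it."""
    try:
        guess = int(attempt)
    except ValueError:
        return "Please give a number."
    if guess == task.answer:
        return "Correct."
    return "Too low." if guess < task.answer else "Too high."


def bisecting_student(history: List[Turn]) -> str:
    """Toy student (stand-in for the trained model): narrows a range using feedback."""
    lo, hi = 1, 100
    last_guess = None
    for speaker, text in history:
        if speaker == "student":
            last_guess = int(text)
        elif text == "Too low.":
            lo = last_guess + 1
        elif text == "Too high.":
            hi = last_guess - 1
    return str((lo + hi) // 2)


def run_episode(
    task: Task,
    student: Callable[[List[Turn]], str],
    max_turns: int = 8,
) -> Tuple[List[Turn], bool]:
    """Turn a static task into a dialogue: the student attempts, the teacher responds."""
    history: List[Turn] = [("teacher", task.question)]
    for _ in range(max_turns):
        attempt = student(history)
        history.append(("student", attempt))
        feedback = teacher_reply(task, attempt)
        history.append(("teacher", feedback))
        if feedback == "Correct.":
            return history, True
    return history, False


if __name__ == "__main__":
    task = Task("I am thinking of an integer between 1 and 100. What is it?", 37)
    transcript, solved = run_episode(task, bisecting_student)
    for speaker, text in transcript:
        print(f"{speaker}: {text}")
    print("solved:", solved)
```

In a real setup the two callables would wrap model calls, and the teacher would be either a frozen copy of the student or a stronger model, as the note describes.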

The conversational reformulation does work that single-turn training cannot. It makes the student responsible for soliciting useful information from the teacher rather than producing a complete answer in one shot. It also creates problems that are solvable through dialogue but unsolvable in a single turn, which exposes the model to challenges beyond its in-context capability and rewards the conversational skill rather than the raw answer skill.
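
To make "solvable through dialogue but unsolvable in a single turn" concrete, here is a toy sketch of an episode-level reward. It assumes a hidden integer in 1..100, a teacher who only says whether a guess is too low or too high, and a binary reward for solving the task within a turn budget. The function names and the binary reward are illustrative assumptions, not details taken from the paper.

```python
from typing import Optional


def turns_to_solve(answer: int, max_turns: int) -> Optional[int]:
    """Turns a bisecting student needs to find `answer`, or None if the budget runs out."""
    lo, hi = 1, 100
    for turn in range(1, max_turns + 1):
        guess = (lo + hi) // 2
        if guess == answer:
            return turn
        if guess < answer:   # teacher feedback: "too low"
            lo = guess + 1
        else:                # teacher feedback: "too high"
            hi = guess - 1
    return None


def episode_reward(answer: int, max_turns: int) -> float:
    """Assumed reward: 1.0 if the whole conversation reaches the answer, else 0.0."""
    return 1.0 if turns_to_solve(answer, max_turns) is not None else 0.0


def success_rate(max_turns: int) -> float:
    """Average reward over every possible hidden answer in 1..100."""
    return sum(episode_reward(a, max_turns) for a in range(1, 101)) / 100


if __name__ == "__main__":
    for budget in (1, 3, 7):
        print(f"turn budget {budget}: success rate {success_rate(budget):.2f}")
```

With a budget of one turn the student can only guess blindly, so almost every task goes unrewarded; with enough turns the same policy solves every task. The reward is attached to the conversation as a whole, which is what lets training favor information-seeking behavior over one-shot answering.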

This is structurally distinct from standard supervised finetuning (SFT) on multi-turn dialogues. SFT teaches the model to imitate dialogue patterns; SML teaches the model the meta-skill of using dialogue as a problem-solving resource. The difference shows up at test time: SFT-trained models reproduce conversational style, while SML-trained models actively engage in the conversation to extract the information they need.

The implication for chat AI design: the gap between "fluent multi-turn responder" and "effective conversational learner" is bridged by training procedures that treat conversation as the learning environment rather than as a surface style to imitate. Single-turn benchmarks select for the former; SML-style training selects for the latter.

Related concepts in this collection 4

Why does teacher-student information asymmetry enable learning signals? What role does privileged answer access play in making social meta-learning training work? Without asymmetric information, can a conversation between teacher and student function as pedagogy or only as parallel speculation?
same paper, the mechanism that makes SML training informative
Can models learn to ask clarifying questions without explicit training? Do language models trained only on fully-specified problems spontaneously develop the ability to ask for missing information when facing underspecified tasks? This tests whether conversational problem-solving strategies emerge from meta-learning rather than direct instruction.
same paper, the generalization payoff
Can structured argument prompts make LLM reasoning more rigorous? Does requiring language models to explicitly check warrants, backing, and rebuttals—rather than reasoning freely—improve reasoning quality and catch failures that standard step-by-step prompting misses?
adjacent: another approach using structured questioning to improve reasoning
Why do models fail at asking good questions during interaction? When models must actively seek information through questions rather than receive it passively, they struggle dramatically. This explores why GPT-4o plateaus at 35% accuracy and whether training or prompting can fix the underlying deficit.
adjacent: separates the problem-solving skill from the question-asking skill
