SYNTHESIS NOTE
Conversational AI and Personalization · Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation

Can LLMs learn to ask for feedback during problem solving?

Explores whether language models can be trained to actively solicit corrective feedback mid-conversation rather than committing to single-turn answers. This matters because it could bridge the gap between fluent chat and genuine conversational learning.

Synthesis note · 2026-05-18 · sourced from Training Fine Tuning

LLMs often struggle to learn from corrective feedback within a conversation. They rarely solicit feedback proactively, even when a problem is ambiguous, and their dialogues feel static and one-sided compared with human conversation. The paper "Learning to Learn from Language Feedback with Social Meta-Learning" takes inspiration from how children learn through social meta-learning (SML), the process of learning how to learn from others, and operationalizes this as a finetuning methodology for LLMs.

The methodology converts static tasks into interactive social learning problems. A math problem, normally framed as "produce a solution," becomes a pedagogical dialogue: a "student" model attempts to generate the solution over the course of a conversation, and a "teacher" model provides guidance. The student is the model being trained. The teacher can be a frozen instance of the same model or a stronger model. Critically, the teacher has access to privileged information — the correct answer or a verifier's output — that creates an information asymmetry the student must learn to exploit.
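The reformulation can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's implementation: the `StaticTask`, `Dialogue`, and `run_pedagogical_dialogue` names, the hypothetical `FINAL:` marker the student uses to commit to an answer, the turn cap, and the toy number-guessing student and teacher that stand in for real LLM calls. It shows only the structure described above: the student sees the question and the conversation so far, the teacher additionally sees the privileged answer, and the episode ends when the student commits to the correct answer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (role, text); role is "student" or "teacher"

FINAL = "FINAL:"  # hypothetical marker the student uses to commit to an answer


@dataclass
class StaticTask:
    question: str
    answer: str  # privileged information: shown to the teacher, never to the student


@dataclass
class Dialogue:
    task: StaticTask
    turns: List[Message] = field(default_factory=list)
    solved: bool = False


# Hypothetical interfaces. The student sees the question and the dialogue so far;
# the teacher also sees the privileged answer (the information asymmetry).
Student = Callable[[str, List[Message]], str]
Teacher = Callable[[str, str, List[Message]], str]


def parse_final(text: str) -> Optional[str]:
    """Return the answer the student committed to, or None if it only asked or reasoned."""
    if FINAL in text:
        return text.split(FINAL, 1)[1].strip()
    return None


def run_pedagogical_dialogue(task: StaticTask, student: Student, teacher: Teacher,
                             max_turns: int = 4) -> Dialogue:
    """Reframe a static task as a conversation with at most max_turns student turns."""
    dlg = Dialogue(task)
    for _ in range(max_turns):
        msg = student(task.question, dlg.turns)
        dlg.turns.append(("student", msg))
        if parse_final(msg) == task.answer:
            dlg.solved = True
            break
        # The teacher uses the privileged answer to guide, without revealing it.
        dlg.turns.append(("teacher", teacher(task.question, task.answer, dlg.turns)))
    return dlg


# Toy stand-ins for real models: a guessing task the student cannot solve in one
# shot, but can solve by using the teacher's "too high" / "too low" feedback.
def toy_student(question: str, turns: List[Message]) -> str:
    lo, hi, last_guess = 0, 100, 0
    for role, text in turns:
        if role == "student":
            last_guess = int(parse_final(text) or 0)
        elif text == "too low":
            lo = last_guess + 1
        elif text == "too high":
            hi = last_guess - 1
    return f"{FINAL} {(lo + hi) // 2}"


def toy_teacher(question: str, answer: str, turns: List[Message]) -> str:
    guess = parse_final(turns[-1][1])
    if guess is None:
        return "Commit to a number and I will tell you which way to go."
    return "too low" if int(guess) < int(answer) else "too high"


task = StaticTask("Guess the integer in [0, 100].", "37")
print(run_pedagogical_dialogue(task, toy_student, toy_teacher, max_turns=1).solved)  # False
dlg = run_pedagogical_dialogue(task, toy_student, toy_teacher, max_turns=4)
print(dlg.solved, len(dlg.turns))  # True 5
```

With a single turn the toy student cannot succeed, because the answer is reachable only through the teacher's feedback; with a few turns it converges by using that feedback. In a real setup the student would be the model being finetuned and the teacher a frozen copy of it or a stronger model.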

The conversational reformulation does work that single-turn training cannot. It makes the student responsible for soliciting useful information from the teacher instead of producing a complete answer in one shot. It also creates problems that are solvable through dialogue but not in a single turn, which exposes the model to challenges beyond its in-context capability and rewards conversational skill rather than raw answer skill.
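One way to operationalize "solvable through dialogue but not in a single turn" is to estimate the student's success rate with and without a conversation budget and keep only the tasks that fall in the gap. This is a sketch under stated assumptions: the source does not say tasks are filtered this way, and the `Rollout` callable, the thresholds, and the toy success table are all hypothetical. The toy uses probabilities of exactly 0 or 1 so its result is deterministic; a real rollout would be a stochastic student-teacher dialogue.

```python
import random
from typing import Callable, Dict, List

# Hypothetical rollout interface: run one student-teacher dialogue capped at
# max_turns student turns and report whether the student reached the answer.
Rollout = Callable[[str, int, random.Random], bool]


def success_rate(rollout: Rollout, task_id: str, max_turns: int, n: int,
                 rng: random.Random) -> float:
    """Monte Carlo estimate of how often the student solves task_id within max_turns."""
    return sum(rollout(task_id, max_turns, rng) for _ in range(n)) / n


def dialogue_gap_tasks(rollout: Rollout, task_ids: List[str], n: int = 16,
                       max_turns: int = 6, single_turn_max: float = 0.1,
                       dialogue_min: float = 0.5, seed: int = 0) -> List[str]:
    """Keep tasks the student rarely solves alone but often solves through dialogue.

    The thresholds are illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)
    kept = []
    for task_id in task_ids:
        p_single = success_rate(rollout, task_id, 1, n, rng)
        p_dialogue = success_rate(rollout, task_id, max_turns, n, rng)
        if p_single <= single_turn_max and p_dialogue >= dialogue_min:
            kept.append(task_id)
    return kept


# Toy success probabilities keyed by turn budget (hypothetical data).
TOY_SUCCESS: Dict[str, Dict[int, float]] = {
    "already_solvable": {1: 1.0, 6: 1.0},
    "needs_dialogue": {1: 0.0, 6: 1.0},
    "out_of_reach": {1: 0.0, 6: 0.0},
}


def toy_rollout(task_id: str, max_turns: int, rng: random.Random) -> bool:
    # rng.random() is in [0, 1), so p = 1.0 always succeeds and p = 0.0 never does.
    return rng.random() < TOY_SUCCESS[task_id][max_turns]


print(dialogue_gap_tasks(toy_rollout, list(TOY_SUCCESS)))  # ['needs_dialogue']
```

Filtering on the gap, rather than on single-turn accuracy alone, concentrates training on episodes where success depends on how the student uses the conversation.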

This is structurally distinct from standard supervised fine-tuning on multi-turn dialogues. SFT teaches the model to imitate dialogue patterns; SML teaches the model the meta-skill of using dialogue as a problem-solving resource. The difference shows up at test time: SFT-trained models reproduce conversational style; SML-trained models actively engage the conversation to extract information they need.

The implication for chat AI design: the gap between a "fluent multi-turn responder" and an "effective conversational learner" can be bridged by training procedures that treat conversation as the learning environment rather than as a surface format. Single-turn benchmarks select for the former; SML-style training selects for the latter.


Original note title

social meta-learning teaches LLMs to learn from language feedback by converting static tasks into interactive pedagogical dialogues