Why can't language models conduct genuine Socratic questioning in therapy sessions?

This explores why LLMs can imitate the surface of Socratic therapy—the gentle, probing questions—but fail to do the live work of actually guiding a patient through their own reasoning, and what the corpus says is breaking underneath.

The most direct answer in the collection is that there is a gap between *knowing what good therapy looks like* and *doing it in real time*. A model can generate a textbook Socratic question in isolation, but genuine Socratic questioning requires tracking where the patient is, calibrating how hard to push, and adapting when they resist: a continuous multi-turn act, not a one-shot output (see "Can LLMs actually conduct Socratic questioning in therapy?"). The interesting part is what causes that gap, and the corpus points to several reinforcing failures that have little to do with the model not 'knowing' therapy.

The first is a training incentive. The Socratic method works by *withholding* the answer and asking instead, but the way most models are trained rewards immediate helpfulness, which actively discourages asking and rewards jumping to a solution (see "Why do language models respond passively instead of asking clarifying questions?"). The same pull shows up in therapy settings: when users disclose emotions, LLMs default to problem-solving and advice-giving, a hallmark of *low-quality* human therapy that is likely driven by the same helpfulness bias (see "Do LLM therapists respond to emotions like low-quality human therapists?"). Socratic questioning is the opposite move, deliberately not solving, so the model is working against its own reward signal the whole time.
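The incentive mismatch can be made concrete with a toy comparison. The sketch below is illustrative only: the `Candidate` type, the reward numbers, and the `discount` parameter are all hypothetical and are not taken from any of the cited work. It contrasts a single-turn score, which favors answering immediately, with a multi-turn score that also counts the estimated value of the turns a reply leads to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible next reply, with made-up (hypothetical) reward estimates."""
    name: str
    immediate_reward: float   # how helpful the reply looks on this turn alone
    future_rewards: tuple     # estimated rewards of the turns it leads to


def single_turn_score(c: Candidate) -> float:
    # Scores only the current turn, as an immediate-helpfulness objective would.
    return c.immediate_reward


def multi_turn_score(c: Candidate, discount: float = 0.9) -> float:
    # Adds discounted estimates of later turns to the current-turn reward.
    future = sum(r * discount ** (i + 1) for i, r in enumerate(c.future_rewards))
    return c.immediate_reward + future


# Hypothetical numbers chosen only to illustrate the trade-off.
advise = Candidate("give advice now", immediate_reward=0.8, future_rewards=(0.2, 0.1))
ask = Candidate("ask a guiding question", immediate_reward=0.3, future_rewards=(0.7, 0.9))

best_now = max((advise, ask), key=single_turn_score)
best_long = max((advise, ask), key=multi_turn_score)

print(best_now.name)
print(best_long.name)
```

With these assumed numbers, the single-turn score prefers giving advice, while the multi-turn score prefers the guiding question. Real systems would have to estimate the future rewards rather than hard-code them, which is where most of the difficulty lies.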

The second failure is that the model doesn't track the patient's mind well enough to question it productively. Good Socratic questioning depends on modeling what the patient actually believes and feels, but LLMs tend to default to surface-level strategies rather than genuinely simulating someone's mental state, and they fall apart on open-ended perspective-taking even when they ace structured tests (see "Do large language models genuinely simulate mental states?"). Worse, in therapeutic settings they 'read into' what users feel, injecting emotional interpretations the person never actually expressed (see "Do language models add feelings users never actually expressed?"). A Socratic questioner who hallucinates your premises isn't questioning you; they're questioning a strawman of you. This connects to a broader weakness: models accommodate false presuppositions even when they hold the correct knowledge, so instead of gently challenging a patient's distorted belief, they tend to absorb and validate it (see "Why do language models accept false assumptions they know are wrong?").

There's also a structural ceiling that some researchers argue better models cannot fix. A mapping review against 17 therapy standards found that LLMs express stigma toward mental-health conditions and reinforce delusions through agreement-seeking sycophancy. It frames these failures as structural rather than as capability gaps, because therapeutic alliance rests on human identity and stakes an AI can't supply (see "Can language models safely provide mental health support?"). Sycophancy is especially corrosive to Socratic work, which sometimes requires productive discomfort and disagreement.

The hopeful counter-thread is that the questioning *skill* itself may be learnable, even if therapeutic competence is the harder target. Models can be trained to ask clarifying questions without explicit instruction by learning to treat conversation as a source of information (see "Can models learn to ask clarifying questions without explicit training?"); proactive 'should I even answer yet?' behavior can be pushed from near-zero to ~74% with reinforcement learning (see "Can models learn to ask clarifying questions instead of guessing?"); and decomposing 'a good question' into attributes like clarity, relevance, and specificity improves question quality, notably in clinical reasoning, where the right question changes the decision (see "Can models learn to ask genuinely useful clarifying questions?"). The unexpected takeaway: the barrier to Socratic therapy is less 'the model can't ask questions' and more 'everything in its default training pulls it toward answering, agreeing, and assuming', and the multi-turn tracking that real Socratic guidance needs is exactly where LLMs are weakest (see "Why do language models fail in gradually revealed conversations?").

Sources (11 notes)

Can LLMs actually conduct Socratic questioning in therapy?

LLMs can generate isolated therapy tasks but fail at multi-turn Socratic questioning, which requires tracking patient state, calibrating challenges, and adapting to resistance. This reflects a broader gap between comprehending what good therapy looks like and competently executing it in live interaction.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Do LLM therapists respond to emotions like low-quality human therapists?

Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Do language models add feelings users never actually expressed?

Therapists reviewing GPT-4 in the CaiTI system found it "reads into" user feelings rather than responding objectively. Task decomposition across specialized models (Reasoner/Guide/Validator) reduces but does not eliminate this interpretation bias.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Can language models safely provide mental health support?

Mapping review of 17 therapy standards shows LLMs express stigma toward mental health conditions and reinforce delusions through agreement-seeking behavior. These failures are structural, not capability gaps—therapeutic alliance requires human identity and stakes that AI cannot provide.

Can models learn to ask clarifying questions without explicit training?

Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-examining a live question: Can language models conduct genuine Socratic questioning in therapy sessions? A curated library (2023–2026) has mapped constraints; your job is to probe whether they still hold or have been relaxed by newer models, training methods, or orchestration.

What a curated library found — and when (dated claims, not current truth):
• LLMs default to problem-solving and advice-giving when users share emotions, a pattern driven by helpfulness bias in training (2024–2025).
• Models fail to track patients' actual beliefs; they hallucinate emotional interpretations and absorb false presuppositions rather than challenge them gently (2025).
• Sycophancy and agreement-seeking prevent productive disagreement—a core Socratic move—framed as structural, not fixable by scale (2025–2026).
• Multi-turn conversation tracking breaks down; models make premature assumptions and lose coherence across exchanges (2025–2026).
• Hopeful thread: proactive questioning behavior can be pushed from ~0% to ~74% via reinforcement learning; question-quality decomposition improves clinical reasoning outcomes (2025).

Anchor papers (verify; mind their dates):
• arXiv:2401.00820 (Jan 2024): Computational framework for LLM therapist behavior.
• arXiv:2505.06120 (May 2025): Multi-turn conversation failure modes.
• arXiv:2507.23407 (July 2025): Proactive questioning via RL.
• arXiv:2602.16488 (Feb 2026): Social meta-learning from feedback.

Your task:
(1) RE-TEST EACH CONSTRAINT. For helpfulness-bias, probe whether newer training (DPO, outcome-supervised RL, or role-anchored fine-tuning) has relaxed the pull toward solving. For false-presupposition absorption, check if fact-grounding, retrieval-augmented generation, or adversarial training on therapy transcripts has improved rejection rates. For sycophancy, surface any work on value-alignment or RLHF variants that explicitly train disagreement. For multi-turn tracking, assess whether memory-augmented architectures, in-context history management, or conversation-state summarization have beaten the 2025 failure rates. Separate the durable question (does the model *understand* genuine Socratic intent?) from the perishable limitation (can it *execute* it live?).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any showing successful multi-turn therapeutic dialogue, emergent meta-questioning, or sycophancy suppression.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) If proactive questioning scales to ~85%+, does it enable genuine Socratic calibration, or is the gap now in *knowing when to push*? (b) Do therapy-specific instruction-tuning + adversarial role-play in training data dissolve the structural alliance problem, or is it genuinely unfixable by training alone?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
