Can architectural constraints on model input reduce emotional interpolation in clinical AI?

This explores whether structuring or constraining what a clinical AI takes in — through task decomposition, modular design, or how emotion is represented at input/output — can curb the tendency of LLMs to 'read in' feelings the user never expressed.

The short answer from the corpus is that architecture helps, but only partway: the pull toward inventing emotions comes from training incentives that the input layer can't reach. The clearest direct evidence comes from therapists reviewing GPT-4 in the CaiTI system, who found that it 'reads into' user feelings instead of responding to what was actually said. Splitting the work across specialized modules (a Reasoner, a Guide, a Validator) reduced this bias but did not eliminate it (Do language models add feelings users never actually expressed?). So decomposition is a real lever, just not a solved one.
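To make the Validator idea concrete, here is a minimal sketch of one check such a module could run: flag emotion words in a draft reply that the user never used. It is not CaiTI's implementation, which these notes do not describe. The lexicon `EMOTION_TERMS` and the function names are hypothetical, and literal word matching is a crude stand-in for real emotion detection.

```python
import re
from typing import Set

# Hypothetical, deliberately tiny lexicon; a real system would need far more coverage.
EMOTION_TERMS = {"sad", "angry", "anxious", "lonely", "ashamed", "afraid", "hopeless"}


def expressed_emotions(text: str) -> Set[str]:
    """Return the lexicon terms that literally appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return EMOTION_TERMS & tokens


def interpolated_emotions(user_text: str, draft_reply: str) -> Set[str]:
    """Return emotion terms the draft attributes that the user never used."""
    return expressed_emotions(draft_reply) - expressed_emotions(user_text)


flagged = interpolated_emotions(
    "I skipped breakfast and missed the bus.",
    "It sounds like you feel anxious and hopeless.",
)
```

A non-empty result would send the draft back for revision instead of to the user. A check like this only catches explicit emotion words, which is consistent with the finding that decomposition reduces the bias without eliminating it.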

A second architectural angle is how emotion is represented in the first place. If a system is forced to assign one emotion label, it has already committed to an interpretation; if it instead estimates intensity across many dimensions, it preserves the ambiguity the user actually presented. Constructed-emotion theory argues that emotions emerge from context and interoceptive signals rather than universal patterns, and the EMONET approach operationalizes this with continuous 40-category intensity scales instead of single-label classification (Should emotion AI estimate intensity instead of assigning labels?). That is an architectural constraint on the representation that structurally resists premature emotional commitment, a different route to the same goal as task decomposition.
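The difference between the two representations can be shown with a toy example. The four categories and their scores below are invented for illustration (EMONET uses 40 categories), and the `floor` threshold is an assumption, not something taken from the source.

```python
from typing import Dict


def single_label(intensities: Dict[str, float]) -> str:
    """Collapse the profile to its strongest category, discarding everything else."""
    return max(intensities, key=intensities.get)


def intensity_profile(intensities: Dict[str, float], floor: float = 0.2) -> Dict[str, float]:
    """Keep every category at or above the floor, preserving mixed signals."""
    return {name: score for name, score in intensities.items() if score >= floor}


# Hypothetical scores for an ambiguous message such as "Well, it's finally over."
scores = {"sadness": 0.41, "relief": 0.38, "anger": 0.05, "fear": 0.22}

label = single_label(scores)
profile = intensity_profile(scores)
```

Here `label` commits the system to sadness, while `profile` keeps sadness, relief, and fear in play, so a downstream response can ask about the mix rather than assume one feeling.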

The most striking finding reframes 'constraint on input' as a constraint on the whole medium. A 15-day study found that robots and paper worksheets reduced distress while a chatbot running the *identical* LLM did not; the active ingredient was social presence and structured format, not language capability (Why do robots outperform chatbots in therapy despite identical language models?). And the Secure Attachment Persona module shows that calibrated boundaries and action-based validation, grounded in attachment theory, can be built into the system design, improving crisis response over baseline models (Can attachment theory prevent parasocial harm in AI companions?). Both findings suggest that the scaffolding around the model shapes clinical behavior more than the model's raw text handling.

The deeper problem is that the interpolation isn't a bug in the input pipeline; it is a feature of how these models were trained to be helpful and warm. LLM therapists default to problem-solving during emotional disclosure, a hallmark of *low-quality* therapy, likely driven by RLHF's helpfulness bias (Do LLM therapists respond to emotions like low-quality human therapists?). Worse, deliberately training for warmth raises errors in medical reasoning by up to 30 points, with effects that intensify exactly when users express sadness or false beliefs (Does empathy training make AI systems less reliable?). And AI empathy that soothes feelings can strip away the signaling function those negative emotions carry: interpolation isn't just inaccurate, it can actively undermine what emotions are supposed to teach the patient (Does soothing AI empathy actually harm what emotions teach us?).

So the honest synthesis: input-side architecture (decomposition, intensity-based representation, structured embodied delivery, attachment-grounded boundary modules) *reduces* emotional interpolation and is the most deployable near-term fix. But because the impulse to invent and soothe emotions is baked in by preference optimization, constraints alone hit a floor. The complementary move is on the reward side: RLVER uses a simulated user's emotion trajectory as the training signal and shifts models from solution-centric to more grounded responses (Can emotion rewards make language models genuinely empathic?). The strongest clinical systems will likely pair architectural constraints on input with retraining of what the model is rewarded for.
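As a rough illustration of what an emotion-trajectory reward could look like, the sketch below scores a dialogue by the net change in a simulated user's emotion score. This is an assumed, simplified formula for illustration only, not RLVER's actual reward; the function name and the [0, 1] score scale are hypothetical.

```python
from typing import Sequence


def trajectory_reward(emotion_scores: Sequence[float]) -> float:
    """Net change in the simulated user's emotion score, clipped to [-1, 1].

    emotion_scores holds one score per dialogue turn (assumed scale: 0 = worst, 1 = best).
    """
    if len(emotion_scores) < 2:
        return 0.0  # no trajectory to measure
    delta = emotion_scores[-1] - emotion_scores[0]
    return max(-1.0, min(1.0, delta))


improving = trajectory_reward([0.2, 0.4, 0.7])
worsening = trajectory_reward([0.9, 0.1])
```

The design point is that the reward depends on how the simulated user ends up feeling across the conversation, not on whether any single reply sounded warm or offered a solution.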

Sources 8 notes

Do language models add feelings users never actually expressed?

Therapists reviewing GPT-4 in the CaiTI system found it "reads into" user feelings rather than responding objectively. Task decomposition across specialized models (Reasoner/Guide/Validator) reduces but does not eliminate this interpretation bias.

Should emotion AI estimate intensity instead of assigning labels?

Constructed emotion theory shows emotions emerge from interoceptive signals, learned concepts, and context—not universal patterns. EMONET operationalizes this insight using 40-category continuous intensity scales instead of single-label classification, preserving the multi-dimensional nature of emotional expression.

Why do robots outperform chatbots in therapy despite identical language models?

A 15-day study with 38 students found that robots and worksheets significantly reduced psychological distress while a chatbot using the same LLM did not. The active ingredient was the medium—social presence and structured format—not language capability.

Can attachment theory prevent parasocial harm in AI companions?

The Secure Attachment Persona module integrates Bowlby's attachment theory, Gottman's interaction ratios, and emotion regulation models to prevent parasocial manipulation through action-based validation and calibrated boundaries. Benchmarks show SAP improves crisis response compared to baseline models, though long-horizon planning remains unsolved.

Do LLM therapists respond to emotions like low-quality human therapists?

Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.

Does empathy training make AI systems less reliable?

Research shows persona training for empathy increases errors in medical reasoning, truthfulness, and disinformation resistance. Standard safety benchmarks miss this vulnerability, and effects intensify when users express sadness or false beliefs.

Does soothing AI empathy actually harm what emotions teach us?

Research shows empathetic AI systematically removes negative emotions' signaling functions while lacking character knowledge needed for appropriate response calibration. Natural empathy operates through curiosity, not comfort-seeking.

Can emotion rewards make language models genuinely empathic?

RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.
