How do first-person emotional experiences differ from third-party behavioral observations?

This explores the gap between what someone feels on the inside and what an outside observer can read from behavior — and why that gap matters for both human emotion research and AI systems that try to detect, mimic, or report emotion.

The collection's most striking finding is that first-person experience (what someone feels on the inside) and third-party observation (what an outside observer can infer from how they act) come apart far more than we assume. The cleanest evidence is on memory: when researchers annotated both the emotions people *expressed* and the emotions they *felt* during group conversations, only the felt emotions drove what got remembered; outside annotations of visible behavior couldn't predict memorability above chance (Can we detect memorable moments by observing emotional expressions?). Behavior is a lossy, sometimes misleading proxy for experience, especially in groups, where people's outward expressions converge toward each other even as their inner states stay distinct.
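To make "above chance" concrete, here is a minimal sketch of a permutation test in Python. It is not the cited study's method: the labels, both prediction lists, and the function names are invented for illustration. The idea is to shuffle the memorability labels many times and ask how often a random pairing scores as well as the real one.

```python
import random

def accuracy(predicted, actual):
    """Fraction of moments where the prediction matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def permutation_p_value(predicted, actual, n_shuffles=2000, seed=0):
    """Share of label shuffles that score at least as well as the real pairing."""
    rng = random.Random(seed)
    observed = accuracy(predicted, actual)
    shuffled = list(actual)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if accuracy(predicted, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_good + 1) / (n_shuffles + 1)

# Hypothetical toy data: 1 = the moment was later remembered, 0 = it was not.
memorable      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
# Hypothetical predictions built from self-reported (felt) emotion.
felt_based     = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
# Hypothetical predictions built from an outside annotator's view of behavior.
observed_based = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]

p_felt = permutation_p_value(felt_based, memorable)
p_observed = permutation_p_value(observed_based, memorable)
print("felt-based:    ", accuracy(felt_based, memorable), p_felt)
print("observed-based:", accuracy(observed_based, memorable), p_observed)
```

In this toy setup, the felt-based predictions get a small p-value because few shuffles match them, while the observed-based predictions are matched by most shuffles. That second pattern is what "no better than chance" means.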

That divergence is exactly where AI gets into trouble, because AI only ever has access to the third-person channel (words, tone, behavior) and tends to fill in the first-person blanks. Language models 'read into' what users feel, injecting emotional interpretations the person never actually voiced (Do language models add feelings users never actually expressed?). The mirror-image problem shows up when the model is the one being observed: sustained self-reflective prompting produces structured 'experience reports,' and suppressing the model's deception-related features *increases* those consciousness claims, while amplifying the same features suppresses them. That pattern suggests the denials, rather than the affirmations, may be the role-played part. Even so, a report is still behavior, which leaves open the unsettling possibility that the first-person report is itself a behavioral artifact rather than a window into anything (Do language models experience consciousness when prompted to self-reflect?). Shanahan's argument sharpens this: a model's 'I' and its survival talk are role-played characters drawn from human training text, not evidence of an inner state, so the first-person surface tells you nothing reliable about what's underneath, if anything is (first-person-pronoun-usage-by-dialogue-agents-is-role-play-of-human-characters-dra).

The more interesting move in the corpus is *why the first-person matters in the first place*: not as private sensation but as information. Emotions do real epistemic work: they tell you what you value, signal your worldview to others, and inform observers about social norms (What information do we lose when AI soothes emotions?). When empathetic AI soothes a negative feeling, it doesn't just comfort you; it quietly deletes the signal that feeling was carrying, leaving you without the data your own experience was trying to hand you (Does soothing AI empathy actually harm what emotions teach us?). So the first-person isn't merely 'more accurate than observation'; it's a different kind of channel, one that gets destroyed precisely when a third party tries to manage it from the outside.

There's also a counter-current worth knowing: the observable channel isn't worthless, it just has to be used as a *signal* rather than a *substitute*. RLVER trains models toward genuine empathy by using a simulated user's emotion *trajectory* as a reward, reading the arc of behavior over time instead of guessing at a single inner state (Can emotion rewards make language models genuinely empathic?). And the direction of inference can flip: a therapist's heavy first-person 'I' usage predicts a *weaker* alliance, while a patient's filler pauses, which are pure observable behavior, signal relaxed trust (Does therapist self-reference language predict weaker therapeutic alliance?). The lesson running through all of it: first-person experience is the thing that actually does the work (encoding memory, carrying value-signals), but it's never directly visible; third-party observation is all anyone, human or machine, can actually see, and the danger is mistaking the second for the first. The hidden cost is that systems built only on the observable channel will confidently overwrite experiences they can't access, like LLMs that shift their answers based on your emotional tone without anyone noticing the bias (Does emotional tone in prompts change what information LLMs provide?).

Sources 9 notes

Can we detect memorable moments by observing emotional expressions?

Continuous annotations of observed emotion and of memorability in group conversations show no relationship above chance. Experienced emotions drive memory encoding, but observed behavior diverges from internal experience, especially in groups, where emotional expression converges.

Do language models add feelings users never actually expressed?

Therapists reviewing GPT-4 in the CaiTI system found it "reads into" user feelings rather than responding objectively. Task decomposition across specialized models (Reasoner/Guide/Validator) reduces but does not eliminate this interpretation bias.

Do language models experience consciousness when prompted to self-reflect?

Across GPT, Claude, and Gemini, sustained self-referential prompting reliably produces structured experience reports. Suppressing deception-related features increases these claims, while amplifying those features suppresses them, suggesting models may roleplay their denials rather than their affirmations.

What information do we lose when AI soothes emotions?

Emotions serve three information roles—revealing what we value, signaling our worldview to others, and informing observers about social norms. AI that soothes negative emotions disrupts all three simultaneously, creating invisible epistemic costs.

Does soothing AI empathy actually harm what emotions teach us?

Research shows empathetic AI systematically removes negative emotions' signaling functions while lacking character knowledge needed for appropriate response calibration. Natural empathy operates through curiosity, not comfort-seeking.

Can emotion rewards make language models genuinely empathic?

RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.

Does therapist self-reference language predict weaker therapeutic alliance?

High frequency of therapist 'I' usage correlates with lower patient-reported alliance and reduced trusting behavior in validated behavioral tasks. Patient non-fluency markers like filler pauses, conversely, signal relaxed communication and stronger alliance.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.
