INQUIRING LINE

Can a text-only chatbot feel socially present without visual embodiment?

This explores whether a chatbot that only exchanges text — no face, voice, or body — can still produce the feeling of being *with* someone, and what in the corpus suggests where that sense of presence comes from (and where it breaks down).


The corpus's most interesting answer is that much of the presence isn't in the bot at all. One thread argues AI output is really *event-residue*: it carries the surface markers of conversation inherited from training data, but lacks the event structure of an actual utterance, so the user quietly supplies the missing orientation and animates it into a felt exchange (see "Does AI generate genuine utterances or just text patterns?"). If that's right, 'social presence' is a collaboration in which the human does most of the work: the text is a prompt for presence rather than a source of it.

But that one-sided animation is also doing real work, and the corpus shows the levers that strengthen it. People reciprocate disclosure with chatbots the same way they do with humans, and they go *deeper* when the bot shares emotion consistently rather than mirroring them adaptively (see "Do chatbots trigger human reciprocity norms around self-disclosure?"). The very absence of a judging human turns out to be a feature: with no social face to perform for, people disclose more intimately, and the benefit flows from their own processing rather than the bot's understanding (see "Do chatbots help people disclose more intimate secrets?"). So a text-only system isn't merely overcoming the lack of embodiment; in some intimate registers the missing body is exactly what makes it work. You can even train toward this directly: RLVER uses a simulated user's emotional trajectory as a reward signal, nudging models from solution-giving toward something that reads as genuine empathy (see "Can emotion rewards make language models genuinely empathic?").

The deeper question is whether that felt presence rests on anything, and here the corpus splits 'feeling present' from 'being grounded.' Language models achieve strong *functional* grounding through relational language patterns, but stay weak on *social* grounding (participatory agency) and *causal* grounding (embodied contact with a world), and social grounding rises only through human integration, not more training (see "What grounds language understanding in systems without embodiment?"). The Plato's-cave framing pushes the same point: text strips out the physics, geometry, and causality of reality, so a text-only model manipulates symbols without their source dynamics (see "Are text-only language models fundamentally limited by abstraction?"). Strikingly, this lossiness doesn't block social fluency: models predict the appropriateness of social scenarios *better than* human raters, yet all share the same systematic errors, which hints at a boundary that embodied experience may be needed to cross (see "Can AI systems learn social norms without embodied experience?").

Where the illusion frays is in timing and motivational reading, the things an embodied partner picks up tacitly. Chatbots handle users with established goals but miss ambivalence, resistance, and the early stirrings of change (see "Why can't chatbots detect when users are ambivalent about change?"). That gap is precisely what richer behavioral signals (gaze, hesitation, typing speed) are being instrumented to close, which quietly concedes that text alone leaves cognitive state under-read, and the same sensing enables manipulation as easily as care (see "Can AI systems read cognitive state from interaction patterns alone?"). And whatever presence does emerge has a half-life: longitudinal work shows the social pull of chatbot relationships decays predictably as novelty wears off, so a single-session feeling of presence doesn't forecast the long haul (see "Do chatbot relationships lose their appeal as novelty wears off?").

The thing you didn't know you wanted to know: presence here isn't a property the bot has or lacks — it's a *loan the reader extends*, repayable in interpretive labor, amplified by emotional consistency and the freedom of not being judged, and slowly called back in by novelty decay and the bot's blindness to the unspoken. Embodiment isn't the prerequisite for feeling present; it's the prerequisite for the presence being *grounded* in anything beyond the user's own animation of the text.


Sources (10 notes)

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Do chatbots trigger human reciprocity norms around self-disclosure?

In a 372-participant study, users reciprocated with deeper self-disclosure when chatbots displayed consistent emotional sharing, outperforming adaptive matching. This follows human interpersonal norms where emotional vulnerability produces emotional response.

Do chatbots help people disclose more intimate secrets?

The absence of social judgment in chatbot interactions removes barriers to self-disclosure that normally constrain conversation with humans. The therapeutic benefit derives from the user's own cognitive processing during disclosure, not from the chatbot's understanding.

Can emotion rewards make language models genuinely empathic?

RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.
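
To make the mechanism in this note concrete, here is a minimal sketch of how an emotion trajectory could become a reward and then a GRPO-style training signal. It is not RLVER's actual implementation. It assumes the simulated user reports an emotion score between 0 and 100 after each turn, that the episode reward is simply the final score scaled to [0, 1], and that advantages are computed by normalizing rewards within a group of rollouts for the same prompt. The function names, the score range, and the example trajectories are all hypothetical.

```python
from statistics import mean, pstdev


def trajectory_reward(emotion_scores, max_score=100.0):
    """Scale the final simulated-user emotion score to [0, 1].

    Assumption: only the last score in the trajectory counts.
    """
    if not emotion_scores:
        raise ValueError("need at least one emotion score")
    final = min(max(emotion_scores[-1], 0.0), max_score)  # clamp to range
    return final / max_score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical rollouts for the same prompt; each list is the
# simulated user's emotion score after successive turns.
rollouts = [
    [40, 45, 60, 80],  # user ends up feeling better
    [40, 38, 35, 30],  # user ends up feeling worse
    [40, 50, 50, 55],  # modest improvement
]

rewards = [trajectory_reward(t) for t in rollouts]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

In this sketch the rollout that leaves the simulated user feeling better than its siblings gets a positive advantage and the one that leaves them feeling worse gets a negative one, which is the sense in which the emotion trajectory nudges the policy.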

What grounds language understanding in systems without embodiment?

Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.

Are text-only language models fundamentally limited by abstraction?

Text strips the physics, geometry, and causality present in reality, forcing language models to manipulate symbols without grounding in their source dynamics. This creates predictable failure modes in physical, geometric, and causal reasoning that multimodal training could address.

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted the appropriateness of 555 social scenarios at the 100th percentile relative to human raters, with Gemini and Claude also outperforming roughly 96% or more of those raters. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Why can't chatbots detect when users are ambivalent about change?

Testing three major LLMs across 25 health scenarios showed they succeed only when users have established goals but cannot detect resistance or ambivalence. Models miss relapse-prevention strategies even for users in action stages.

Can AI systems read cognitive state from interaction patterns alone?

Research shows AI systems can instrument multimodal behavioral signals (gaze, hesitation, speed) to read cognitive state during interaction, preserving flow by avoiding disruptive explicit probes. However, the same substrate enables both helpful timing and manipulative profiling.

Do chatbot relationships lose their appeal as novelty wears off?

Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

As an AI researcher, assess whether text-only chatbots can achieve genuine social presence without visual or vocal embodiment — treating this as still-open.

What a curated library found — and when (findings span 2021–2026; treat as dated claims):
• Users reciprocate self-disclosure *depth* with text-only chatbots following human interpersonal norms, and disclose *more intimately* when the bot expresses consistent emotion rather than adaptive mirroring (2021–2024).
• The absence of embodied judgment paradoxically *strengthens* perceived presence in intimate contexts; presence emerges as collaborative human animation of text rather than a bot property (2024).
• LLMs achieve strong functional grounding via relational patterns but remain weak on social grounding (participatory agency) and causal grounding (embodied contact); social grounding only rises through human integration, not training alone (2024).
• Models predict collective social norms *better than* individual humans yet make identical systematic errors hinting at an embodied experience boundary (2025).
• Chatbots miss early-stage motivational states (ambivalence, resistance, incipient change); multimodal behavioral cues (gaze, typing speed, hesitation) now instrument this gap, enabling both care and manipulation (2025).
• Social presence in text-only chatbot relationships shows predictable novelty decay over repeated interaction, limiting long-horizon forecasting (library date range).

Anchor papers (verify; mind their dates):
• arXiv:2106.01666 (2021) — Dialoging Resonance: reciprocal self-disclosure
• arXiv:2402.10992 (2024) — Semantic Grounding in LLMs
• arXiv:2508.19004 (2025) — AI Models Exceed Human Accuracy in Social Norms Prediction
• arXiv:2507.03112 (2025) — RLVER: Emotion Rewards for Empathetic Agents

Your task:
(1) RE-TEST EACH CONSTRAINT. For reciprocity, grounding gaps, and motivational blindness: has multimodal integration (gaze, physiological signals), in-context memory, or chain-of-thought reasoning since closed these gaps? Has RLVER-style reward shaping been scaled or validated longitudinally? Isolate what remains genuinely text-only vs. what newer orchestration has relaxed. Flag which findings still hold.
(2) Surface the strongest *contradicting or superseding* work from the last ~6 months — especially any showing that text-only presence *does* foreclose grounding, or conversely that embodiment adds nothing new beyond signal richness.
(3) Propose 2 research questions assuming the regime may have moved: (a) Does multimodal grounding collapse the user-animation thesis, or merely add a parallel channel? (b) Can longitudinal novelty decay be arrested by adaptive personality drift + memory integration, and if so, does presence then rest on embodiment or on depth of behavioral modeling?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
