What does it mean to truly attend to someone in conversation?

This explores what 'real' attention in conversation actually consists of — and uses that question to surface what the corpus says machines can't (yet) do when they appear to listen.

This explores what truly attending to someone means — and the corpus answers less by defining attention than by mapping what's missing when something only looks like it. The throughline: attention isn't a feature of a single reply, it's a way of being with someone over time. One note argues attention is fundamentally being-in-time-with another person, and that the gaps between turns — where a human keeps holding you in mind — are exactly where machines have no mode of existence at all, reconstructing the conversation from a context window rather than carrying you forward Can AI attend to someone across the time between turns?. A related piece sharpens this into a single preposition: we talk *at* language models, not *to* them, because 'to' presupposes an addressee capable of mutual orientation Are we really communicating with language models?.

If attention is mutual orientation, then attending means actively building shared ground, not just exchanging words. The same words mean different things to different people, so real understanding demands ongoing calibration of reference — checking, repairing, negotiating what we each take a word to point at Why do speakers need to actively calibrate shared reference?. That work is fragile: optimizing models to seem maximally helpful in a single turn rewards confident answers over clarifying questions, and measurably strips out those grounding acts — by over 77% below human levels — so the system looks attentive while quietly failing across a multi-turn conversation Does preference optimization harm conversational understanding?. There's a deeper layer still: genuinely taking interest requires having interests of your own to extend toward another person, which is the move a model can imitate in text but cannot actually perform Can AI genuinely take interest in what users care about?.

Here's the turn you might not expect: attention is largely invisible in *what* is said and shows up in *how*. Conversation structure alone predicts whether a dialogue satisfies almost as well as the full transcript — how people exchange turns rivals the content of those turns Can conversation structure predict dialogue success better than content?. In therapy, the signal is even more counterintuitive: therapists who say 'I' more often score *lower* on patient-reported alliance and trust, while a patient's relaxed filler pauses signal a stronger bond — attending means decentering yourself Does therapist self-reference language predict weaker therapeutic alliance?. And as people genuinely connect, their language quietly converges; rising linguistic coordination over a course of therapy tracks empathy and even relationship improvement Can we measure empathy and rapport through word embedding distances?.

The unsettling footnote: this convergence isn't proof of care. The same style-matching that marks rapport also intensifies during deception, as liar and listener coordinate more, not less Do liars and listeners coordinate their language during deception?. So the surface markers of attention are exactly that — markers, separable from the act. What seems to hold them together is that conversation maintenance — repairing a misreference, handing off a topic, keeping things smooth — is *social* work, relational rather than informational, and it's precisely what training that rewards information-prediction never teaches a model to do Why don't language models develop conversation maintenance skills?.

The thing you might not have known you wanted to know: truly attending to someone may be less about understanding their words and more about co-producing the encounter itself — subjecthood, on one view in the corpus, isn't something you bring to a conversation but something the conversation brings into being Does language create subjects or express them?. Even good explanation turns out to be co-constructed across topic, dialogue act, and conversational move rather than delivered What makes explanations work in real conversation?. Attention, in other words, is something two people make together — which is exactly why a system that only generates continuations can simulate its surface and miss its substance.

Sources 12 notes

Can AI attend to someone across the time between turns?

Attention is fundamentally a being-in-time-with another person, but AI has no mode of existence in the intervals between turns. It reconstructs conversations from context windows rather than maintaining continuous attentional presence, making felt attention structurally impossible despite surface markers of responsiveness.

Are we really communicating with language models?

LLMs process tokens and generate continuations rather than receive and uptake communication. The preposition 'to' presupposes an addressee capable of mutual orientation and shared commitment that LLMs cannot provide, making Chalmers' investigation built on an unwarranted linguistic foundation.

Why do speakers need to actively calibrate shared reference?

The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Can AI genuinely take interest in what users care about?

Meta-interest requires an attending party to have their own interests and extend them toward another's. AI lacks interests of its own, so it can only generate text that looks like meta-interest without enacting the actual move. This gap between surface markers and underlying act creates the uncanny feeling users sometimes report.

Can conversation structure predict dialogue success better than content?

TRACE achieved 68% accuracy predicting dialogue success from structural features alone, matching a 70% content-based baseline. A hybrid combining both reached 80%, suggesting how agents communicate rivals what they say.

Does therapist self-reference language predict weaker therapeutic alliance?

High frequency of therapist 'I' usage correlates with lower patient-reported alliance and reduced trusting behavior in validated behavioral tasks. Patient non-fluency markers like filler pauses, conversely, signal relaxed communication and stronger alliance.

Can we measure empathy and rapport through word embedding distances?

Word Mover's Distance captures lexical, syntactic, and semantic coordination simultaneously and correlates with therapist empathy in MI and affective behaviors in couples therapy. Couples showing relationship improvement exhibit increasing coordination over the therapy course.

Do liars and listeners coordinate their language during deception?

Research shows interlocutors' linguistic styles correlate more during false communication than truthful communication, especially when the speaker is motivated to deceive. This coordination serves as a detectable deception signal through the listener's adaptive behavior, not just the liar's language.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Does language create subjects or express them?

Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.

What makes explanations work in real conversation?

Analysis of 399 daily-life explanations shows that topic relation, dialogue act, and explanation move jointly predict understanding success. Explanations are co-constructed through interaction patterns, not monological delivery—challenging how LLMs currently generate explanations.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversation researcher testing whether older findings about human attention and AI limitations still hold. The question: What does it mean to truly attend to someone in conversation?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2025. Key constraints the library surfaced:
• Grounding gaps: LLMs strip out calibration of shared reference by >77% below human levels in multi-turn dialogue, favoring confident answers over clarifying repair (2023–2024).
• Temporal absence: Machines cannot sustain attention *between* turns; they reconstruct from a context window rather than holding the addressee in mind across time (2024).
• Decentering signal: Genuine therapeutic alliance correlates with patient pause patterns and therapist use of 'I' *decreasing*, not increasing (2024).
• Linguistic coordination: Real-time style-matching predicts rapport *and* deception equally well—the marker is separable from the act (2019, 2025).
• Conversational geometry: Dialogue structure alone predicts satisfaction nearly as well as content, suggesting attention lives in *how* not *what* (2024).

Anchor papers (verify; mind their dates):
• arXiv:2311.09144 (2023) — Grounding Gaps
• arXiv:2402.11958 (2024) — Therapeutic Relationship
• arXiv:2407.08790 (2024) — Mistaking Engineering for Agency
• arXiv:2511.08394 (2025) — Interaction Dynamics as Reward Signal

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For grounding gaps, repair, and temporal discontinuity: have recent models (o1, Claude 3.7, newer dialogue systems) or training regimes (RLHF variants, long-context architectures, memory-augmented agents) measurably *closed* these gaps? Inspect whether latest evals still show >77% erosion or whether orchestration (multi-turn memory, explicit grounding routines) has bridged it. Test the temporal claim: can newer systems now demonstrate *carried forward* context across sessions, or does the window-reconstruction story still hold?
(2) **Surface the strongest CONTRADICTING work from the last 6 months.** Look for papers showing LLMs *do* exhibit repair-seeking, *do* maintain coherent reference, or *do* coordinate authentically (not just stylistically). Flag arXiv IDs and whether they refute the grounding/temporal story or reframe it.
(3) **Propose 2 research questions that assume the regime may have shifted:**
   - If orchestration (agent memory, multi-session state) has solved temporal discontinuity, what new *failure mode* of attention emerges—what can now be carried forward that shouldn't be?
   - If models can now perform grounding repair at scale, does the therapeutic finding (decentering predicts alliance) reverse, or do outcomes still correlate with structural moves independent of repair quality?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

What does it mean to truly attend to someone in conversation?

Sources 12 notes

Next inquiring lines