What does it mean to truly attend to someone in conversation?
This explores what 'real' attention in conversation actually consists of — and uses that question to surface what the corpus says machines can't (yet) do when they appear to listen.
The corpus answers this less by defining attention than by mapping what is missing when something only looks like attention. The throughline: attention isn't a feature of a single reply; it's a way of being with someone over time. One note argues that attention is fundamentally being-in-time-with another person, and that the gaps between turns, where a human keeps holding you in mind, are exactly where machines have no mode of existence at all: they reconstruct the conversation from a context window rather than carrying you forward (Can AI attend to someone across the time between turns?). A related piece sharpens this into a single preposition: we talk *at* language models, not *to* them, because 'to' presupposes an addressee capable of mutual orientation (Are we really communicating with language models?).
If attention is mutual orientation, then attending means actively building shared ground, not just exchanging words. The same words mean different things to different people, so real understanding demands ongoing calibration of reference: checking, repairing, and negotiating what each of us takes a word to point at (Why do speakers need to actively calibrate shared reference?). That work is fragile. Optimizing models to seem maximally helpful in a single turn rewards confident answers over clarifying questions, and it measurably strips out those grounding acts, leaving them 77.5% below human levels, so the system looks attentive while quietly failing across a multi-turn conversation (Does preference optimization harm conversational understanding?). There's a deeper layer still: genuinely taking interest requires having interests of your own to extend toward another person, a move a model can imitate in text but cannot actually perform (Can AI genuinely take interest in what users care about?).
Here's the turn you might not expect: attention is largely invisible in *what* is said and shows up in *how*. Conversation structure alone predicts whether a dialogue succeeds almost as well as its content does (68% versus 70% accuracy in one study), so how turns are exchanged rivals what those turns contain (Can conversation structure predict dialogue success better than content?). In therapy, the signal is even more counterintuitive: therapists who say 'I' more often score *lower* on patient-reported alliance and trust, while a patient's relaxed filler pauses signal a stronger bond. Attending, on this evidence, means decentering yourself (Does therapist self-reference language predict weaker therapeutic alliance?). And as people genuinely connect, their language quietly converges: linguistic coordination tracks therapist empathy, and couples whose relationships improve show rising coordination over the course of therapy (Can we measure empathy and rapport through word embedding distances?).
The unsettling footnote: this convergence isn't proof of care. The same style-matching that marks rapport also intensifies during deception, as liar and listener coordinate more, not less (Do liars and listeners coordinate their language during deception?). So the surface markers of attention are exactly that: markers, separable from the act. What seems to hold marker and act together is conversation maintenance: repairing a misreference, handing off a topic, keeping things smooth. That is *social* work, relational rather than informational, and it's precisely what training that rewards information prediction never teaches a model to do (Why don't language models develop conversation maintenance skills?).
The least expected claim: truly attending to someone may be less about understanding their words and more about co-producing the encounter itself. On one view in the corpus, subjecthood isn't something you bring to a conversation but something the conversation brings into being (Does language create subjects or express them?). Even good explanation turns out to be co-constructed across topic, dialogue act, and explanation move rather than simply delivered (What makes explanations work in real conversation?). Attention, in other words, is something two people make together, which is exactly why a system that only generates continuations can simulate its surface and miss its substance.
Sources (12 notes)
Attention is fundamentally a being-in-time-with another person, but AI has no mode of existence in the intervals between turns. It reconstructs conversations from context windows rather than maintaining continuous attentional presence, making felt attention structurally impossible despite surface markers of responsiveness.
LLMs process tokens and generate continuations rather than receive and take up communication. The preposition 'to' presupposes an addressee capable of mutual orientation and shared commitment, which LLMs cannot provide, so Chalmers' investigation rests on an unwarranted linguistic foundation.
The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax: models appear helpful but fail silently in multi-turn contexts.
Meta-interest requires an attending party to have their own interests and extend them toward another's. AI lacks interests of its own, so it can only generate text that looks like meta-interest without enacting the actual move. This gap between surface markers and underlying act creates the uncanny feeling users sometimes report.
TRACE achieved 68% accuracy predicting dialogue success from structural features alone, nearly matching a 70% content-based baseline. A hybrid combining both reached 80%, suggesting that how agents communicate rivals what they say.
High frequency of therapist 'I' usage correlates with lower patient-reported alliance and reduced trusting behavior in validated behavioral tasks. Patient non-fluency markers like filler pauses, conversely, signal relaxed communication and stronger alliance.
Word Mover's Distance captures lexical, syntactic, and semantic coordination simultaneously, and it correlates with therapist empathy in motivational interviewing (MI) and with affective behaviors in couples therapy. Couples showing relationship improvement exhibit increasing coordination over the course of therapy.
Research shows interlocutors' linguistic styles correlate more during false communication than truthful communication, especially when the speaker is motivated to deceive. This coordination serves as a detectable deception signal through the listener's adaptive behavior, not just the liar's language.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.
Analysis of 399 daily-life explanations shows that topic relation, dialogue act, and explanation move jointly predict understanding success. Explanations are co-constructed through interaction patterns, not monological delivery—challenging how LLMs currently generate explanations.