How do audiences evaluate speech when there is no speaker to assess?
This explores what happens to the normal listener's habit of judging a message by judging its speaker when AI produces speech with no embodied person behind it. The corpus's starting point is that this is a genuinely new situation: AI orality has all the formal feel of speech (performative, conversational, additive) but breaks the historical rule that every utterance, even a recorded or broadcast one, traces back to a carrier-person you can locate and assess Where is the speaker when AI produces speech?. So the question isn't rhetorical: audiences really do lose the anchor they have always used.
What several notes suggest is that listeners may never have been evaluating the speaker as directly as we assume: to a large extent they were evaluating how well the speech fit their own beliefs. In debate corpora, who you already are predicts whether you're persuaded far better than anything about the language used; once a reader's political and religious priors are controlled for, many of the supposedly persuasive features of the text lose most of their apparent effect Does what readers believe matter more than what debaters say? Do linguistic features of persuasion stay the same across audiences?. That reframes the puzzle: if much of "evaluation" was always the audience matching speech against its own beliefs, then a missing speaker matters less than you'd think for whether the message lands. It also helps explain why AI can sway people effectively even though it can't reliably judge the very arguments it makes Can LLMs persuade without actually understanding arguments?.
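The confounding described above can be made concrete with a small simulation. This is a hypothetical sketch, not the debate studies' data or method: it assumes a single reader-prior score, a single linguistic feature that is correlated with that prior (standing in for audience composition that tracks debate topic), and a persuasion outcome that depends only on the prior. A plain linear regression, used here as a stand-in for whatever models the studies actually fit, shows the feature looking predictive until the prior is added as a control. The `ols` helper and every parameter value are illustrative assumptions.

```python
import numpy as np

# Hypothetical simulation, NOT data from the debate corpora.
# Persuasion depends only on the reader's prior, but the text feature is
# correlated with that prior (audience composition tracks debate topic).
rng = np.random.default_rng(0)
n = 5000
prior = rng.normal(size=n)                                # reader ideology score (assumed)
feature = 0.8 * prior + rng.normal(scale=0.6, size=n)     # feature tracks the audience
persuaded = 1.5 * prior + rng.normal(scale=1.0, size=n)   # outcome ignores the feature


def ols(y, *columns):
    """Ordinary least squares with an intercept; returns only the slope coefficients."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


naive = ols(persuaded, feature)              # feature alone: looks predictive
controlled = ols(persuaded, feature, prior)  # add the reader-prior control

print(f"feature coefficient, no control:   {naive[0]:.2f}")
print(f"feature coefficient, with control: {controlled[0]:.2f}")
```

With these made-up parameters the uncontrolled coefficient should land near 1.2 (the covariance 0.8 × 1.5 divided by the feature's unit variance), while the controlled one should sit near zero: the feature "predicts" persuasion only because it carries information about who is reading.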
But landing is not the same as warranting trust, and here the corpus pushes back hard. A system can produce contextually perfect speech and still lack what makes a speaker assessable: the relational-normative conditions of being a communicative subject, such as accountability and an evaluative stance. Tests such as Chalmers', which any system producing contextually appropriate text will pass, are calibrated to the wrong phenomenon; they detect speech-shaped output, not someone behind it Does behavioral speech output prove communicative subjecthood?. This connects to a deeper claim: subjecthood isn't possessed before language and then expressed through it, but produced inside the communicative event itself Does language create subjects or express them?. On that view a "speaker" is a role the exchange conjures, which is exactly why disembodied AI speech can feel as if it has one even when no person is there.
The practical lever, then, isn't finding the missing speaker but doing the grounding work that listeners normally offload onto the speaker. Meaning was never carried by words alone: the same words mean different things to different people, and real understanding takes active, collaborative calibration of shared reference Why do speakers need to actively calibrate shared reference?. Audiences also lean on perceived personality cues, but those are unreliable readouts. The acoustic features that signal extraversion in a neutral interview instead predict neuroticism under stress, so even "who is this" judgments are partly artifacts of context rather than stable facts about the speaker Does personality sound the same in stressful and neutral conversations?.
The thing you may not have known you wanted to know: the absence of a speaker doesn't break evaluation so much as expose how much of it was always being done by the audience. The competent move with AI voices is to stop asking "who said this and can I trust them" and start doing explicitly what a trustworthy speaker used to let us skip — checking the claim against shared reference, watching our own priors, and noticing that fluency is not the same as a position someone can be held to.
Sources (8 notes)
AI produces utterances with the formal properties of speech—performative, additive, conversational—but no embodied speaker generates or anchors them. This breaks the historical pattern where all prior orality, primary and secondary, depended on a carrier-person, making AI structurally novel in media history.
Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.
The linguistic features that predict persuasion success change dramatically once political and religious ideology are added as statistical controls. Features appearing predictive in standard analyses often reflect audience-text matching rather than true language effects, making many published findings potentially artifacts of audience composition.
The Thin Line study shows LLMs sway debate participants and audiences but cannot reliably evaluate those same debates, with inter-annotator agreement ranging from near-zero to 0.6. Persuasive competence and pragmatic comprehension are separable capabilities.
Chalmers' test is passed by any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives, like a puppet whose movements are walk-shaped without being walking.
Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.
The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.
Acoustic features that signal extraversion in neutral interviews instead predict neuroticism under stress. Handcrafted acoustic features outperform neural embeddings, suggesting personality is conveyed through specific measurable behaviors rather than holistic speaker style.