SYNTHESIS NOTE

Where is the speaker when AI produces speech?

Prior forms of orality—from face-to-face speech to broadcast media—always had an embodied speaker anchoring the utterance. Does AI speech without a speaker represent a fundamentally new media condition, and what happens to our frameworks for evaluating it?

Synthesis note · 2026-04-14

Primary orality (Ong) is speech in face-to-face cultures — embodied speakers performing knowledge in real time. Secondary orality is speech mediated by electronic media (radio, television) — embodied speakers whose presence is technologically extended but still anchored in actual speaking persons. Both forms preserve the speaker as the carrier of the speech. The voice is the voice of someone.

AI orality breaks this. The output exhibits the oral form — performative, additive, situational, conversational — but no speaker is producing it. There is no body whose throat shapes the words, no mind selecting the next phrase, no person whose history of past speech anchors the present utterance. The output sounds like speech in the sense that it has the rhythmic and pragmatic surface of speech, but it comes from nowhere.

This is structurally novel in media history. Prior media theory categorized media by their relation to embodied speakers — orality (direct embodiment), writing (deferred from embodiment but anchored to a prior writer), print (mass-distributed but author-anchored), broadcast (technologically extended but speaker-anchored). AI is the first form where the speech-shape persists without any speaker-anchor. There is no prior conceptual category for it.

The consequences run through the rest of the framework. The note "Does AI-generated content mirror oral culture's knowledge patterns?" picks up the form side; this note picks up the carrier side. The oral form returns; the carrier the form depended on does not. The note "Why doesn't AI output carry the spirit of a giver?" makes the same point about gift-flow: the flow returns, the carrier-anchor does not.

The diagnostic implication is that frameworks for evaluating speech (rhetoric, persuasion theory, ethos/pathos/logos) all presuppose a speaker. They calibrate audience trust to speaker properties: credibility, prior commitments, demonstrated expertise. With no speaker to bear these properties, the frameworks misfire. Audiences either project a phantom speaker (treating the AI as if it were a person) or accept the speech and skip the speaker-evaluation step altogether (see "When do users stop checking whether AI output is actually backed?"). Neither response is a competent reading of disembodied orality, because no competent reading of disembodied orality has yet been developed.

Original note title

AI orality is disembodied — sounds like speech but comes from no speaker