Where is the speaker when AI produces speech?

Prior forms of orality—from face-to-face speech to broadcast media—always had an embodied speaker anchoring the utterance. Does AI speech without a speaker represent a fundamentally new media condition, and what happens to our frameworks for evaluating it?

Synthesis note · 2026-04-14

Primary orality (Ong) is speech in face-to-face cultures — embodied speakers performing knowledge in real time. Secondary orality is speech mediated by electronic media (radio, television) — embodied speakers whose presence is technologically extended but still anchored in actual speaking persons. Both forms preserve the speaker as the carrier of the speech. The voice is the voice of someone.

AI orality breaks this. The output exhibits the oral form — performative, additive, situational, conversational — but no speaker is producing it. There is no body whose throat shapes the words, no mind selecting the next phrase, no person whose history of past speech anchors the present utterance. The output sounds like speech in the sense that it has the rhythmic and pragmatic surface of speech, but it comes from nowhere.

This is structurally novel in media history. Prior media theory categorized media by their relation to embodied speakers — orality (direct embodiment), writing (deferred from embodiment but anchored to a prior writer), print (mass-distributed but author-anchored), broadcast (technologically extended but speaker-anchored). AI is the first form where the speech-shape persists without any speaker-anchor. There is no prior conceptual category for it.

The consequences run through the rest of the framework. The note "Does AI-generated content mirror oral culture's knowledge patterns?" picks up the form-side; this note picks up the carrier-side. The oral form returns; the carrier the form depended on does not. The note "Why doesn't AI output carry the spirit of a giver?" makes the same point about gift-flow: the flow returns, the carrier-anchor does not.

The diagnostic implication is that frameworks for evaluating speech (rhetoric, persuasion theory, ethos/pathos/logos) all presuppose a speaker. They calibrate audience trust to speaker properties: credibility, prior commitments, demonstrated expertise. With no speaker to bear these properties, the frameworks misfire. Audiences either project a phantom speaker (treating the AI as if it were a person) or accept the speech without the speaker-evaluation step (the pattern examined in "When do users stop checking whether AI output is actually backed?"). Neither response is a competent reading of disembodied orality, because no competent reading of disembodied orality has yet been developed.

Related concepts in this collection 3

Does AI-generated content mirror oral culture's knowledge patterns?
Walter Ong's framework for oral versus literate cultures may describe how AI content functions on social media. Understanding this parallel could explain why AI discourse feels fundamentally different from print-era knowledge.
companion claim about the form-side of AI orality

Why doesn't AI output carry the spirit of a giver?
Does AI-generated output function like a gift in Mauss's sense, where the giver's spirit obligates the receiver? This explores whether statistical residue can replace the moral weight of personal obligation.
same carrier-absence pattern in the gift-economy frame

When do users stop checking whether AI output is actually backed?
What causes users to accept AI-generated content at face value without verifying its basis? Understanding this receiver-side acceptance reveals how intelligence-token systems maintain value despite lacking real backing.
one of the two failed receiver-side responses to disembodied orality
