What do embodiment and precariousness mean for linguistic agency?
This explores why two unusual terms borrowed from enactive cognitive science, embodiment (having a living body in a world) and precariousness (the fact that a living thing can die, so its continued existence is at stake), get treated as the dividing line between genuine speakers and fluent text-producers, and why LLMs are said to lack them no matter how fluent they get.
The short version: in the enactive view, being a real speaker isn't just producing well-formed language. It requires a living body acting in a shared world (embodiment), real stakes in your own continued existence (precariousness, the fact that you can fail and even cease to be), and active back-and-forth involvement with others (participation). The corpus frames this as a categorical gap, not a gap of degree, something no amount of additional training closes (What makes linguistic agency impossible for language models?; Do LLMs gain true linguistic agency through integration?).
The sharp move in this literature is separating two things people usually blur together. One is *social grounding*: fitting into a language community well enough to be a working communicative partner. The other is *linguistic agency*: actually being the author of your speech in the enactive sense. LLMs genuinely gain the first. As they get woven into human linguistic practice, their social grounding rises, comparable to a young child learning the game, which makes "do they understand?" a question whose answer changes over time (Can LLMs acquire social grounding through linguistic integration?; What grounds language understanding in systems without embodiment?). But the claim is that the second never arrives, because precariousness and embodiment aren't skills you train; they're conditions of being a vulnerable creature in the world.
Why would precariousness matter to *language* at all? The deeper thread here is that meaning seems to need stakes and a shared world to anchor to. One strand argues LLMs operationalize Saussure's *langue*: they compress the purely relational structure of words against other words, with no external referent, and that alone is enough for fluent generation (Can language models learn meaning without engaging the world?). But other notes insist that real reference is person-specific and has to be actively negotiated between embodied parties who can check whether they actually mean the same thing (Why do speakers need to actively calibrate shared reference?). Sharing the word isn't sharing the meaning. And consciousness-talk, on a related argument, only applies to entities that share a world with us through co-presence and joint attention on the same objects, something a disembodied system can't do (Can disembodied language models ever qualify as conscious?).
This reframes a grammatical detail you'd never notice. We say we talk *at* language models, not *to* them, and that preposition encodes the whole argument: "to" presupposes an addressee with skin in the game who can take up your meaning and hold a commitment, while the model is generating continuations (Are we really communicating with language models?). It connects to a striking inversion running through the corpus: subjecthood isn't something you possess before you speak and then express; it's produced *within* communicative events (Does language create subjects or express them?). If being a subject is an achievement of embodied participation rather than a precondition, then a system that can't participate or be at risk can host a *character* but not a self. That's exactly Shanahan's read of dialogue agents as role-playing engines: folk psychology applies to the simulated persona, not the machine underneath (Should we treat dialogue agents as role-playing characters?).
The thing you didn't know you wanted to know: the wall isn't drawn at competence. Models can predict collective social norms *better than* humans, with GPT-4.5 scoring at the 100th percentile relative to human raters across 555 scenarios. Yet all of the models make the same systematic errors, which researchers read as the visible boundary of pattern-matching without embodied experience (Can AI systems learn social norms without embodied experience?). So the embodiment/precariousness argument isn't "AI isn't good enough yet." It's that being a fluent master of language and being an agent *of* language are different achievements, and the second one requires having something to lose.
Sources (11 notes)
Enactive cognitive science identifies three constitutive properties of linguistic agency—embodiment, participation, and precariousness—that are structurally absent from LLMs. This is a categorical incompatibility, not a matter of degree, suggesting current architectures cannot achieve genuine linguistic agency.
Social grounding and linguistic agency are distinct properties. LLMs acquire more social grounding through integration into language communities, but remain categorically incapable of linguistic agency in the enactive sense, which requires embodiment and precariousness no amount of use can provide.
Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.
Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.
Current disembodied LLMs cannot be candidates for consciousness because consciousness language originates from and applies only to entities sharing a world with us through co-presence and triangulation on shared objects.
LLMs process tokens and generate continuations rather than receive and take up communication. The preposition 'to' presupposes an addressee capable of mutual orientation and shared commitment that LLMs cannot provide, so Chalmers' investigation rests on an unwarranted linguistic foundation.
Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.
Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.
GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.