How does enactive theory define language differently than computational linguistics?
This explores how enactive cognitive science treats language as a living activity, something embodied agents *do* together, versus how computational linguistics treats it as relational structure learnable from text alone. The two camps aren't weighing the same claim and finding it more or less true; they define language as a fundamentally different kind of object.

For the computational view, language is a self-contained web of relations. The cleanest statement of this in the corpus is the claim that LLMs operationalize Saussure's *langue*: they learn meaning by compressing the relational structure of text, and they need no external referents and no embodied grounding to produce fluent, culturally situated discourse [Can language models learn meaning without engaging the world?]. On this account, language is a system whose terms are defined by their differences from one another, and that system is exactly what a next-token predictor captures.
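To make "meaning as relational structure" concrete, here is a minimal sketch that swaps in a far simpler technique than an LLM: count-based distributional vectors over a hypothetical five-sentence corpus. Each word is represented only by the words that appear near it, and nothing in the program refers to an actual cat, dog, or sun. The names `CORPUS`, `STOPWORDS`, `context_vectors`, and `cosine` are illustrative assumptions, not anything taken from the sources.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus: the program only ever sees strings, never animals or suns.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
    "the sun warmed the stone",
]
STOPWORDS = {"the"}  # dropped from contexts so it doesn't dominate the counts


def context_vectors(sentences, window=2):
    """Represent each word only by counts of its neighbours within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i and words[j] not in STOPWORDS:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vecs = context_vectors(CORPUS)
print(round(cosine(vecs["cat"], vecs["dog"]), 2))  # 1.0: identical contexts
print(round(cosine(vecs["cat"], vecs["sun"]), 2))  # 0.0: no shared contexts
```

Because *cat* and *dog* occur in the same contexts, the program treats them as interchangeable, while *sun* shares none of their contexts. That is the Saussurean idea in miniature: each term's value comes from its position among the others, with no referent involved.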
Enactive theory rejects the idea that language *is* that system. It defines language as a form of agency, and agency has constitutive conditions a relational model can't satisfy: embodiment (a body with a stake in the world), participation (acting within a community of other agents), and precariousness (the possibility of failing, so that outcomes matter) [What makes linguistic agency impossible for language models?]. The key word is *constitutive*: these aren't features that make language better, they're what make it language at all. So the disagreement is categorical, not a matter of degree. No amount of training or scale moves a system from the relational-structure side to the linguistic-agency side, because the missing ingredients are architectural and existential, not informational [Do LLMs gain true linguistic agency through integration?].
A useful bridge between the two definitions is the distinction between *grounding* and *agency*. LLMs can have strong functional grounding (they handle language patterns well) while lacking full social grounding, meaning participatory standing among other speakers, and causal grounding, meaning contact with an environment through a body [What grounds language understanding in systems without embodiment?]. This reframes the debate: the computational picture amounts to the claim that functional grounding is enough for something to count as language, while the enactive picture insists that the social and causal kinds are what language is *for*. It also explains why the corpus can say both that an LLM gains some social grounding by being woven into human language communities and that it still never crosses into linguistic agency [Do LLMs gain true linguistic agency through integration?].
The split shows up again in what an utterance even *does*. Under the computational definition, producing text means generating strings from a probability distribution; under the enactive one, using language means addressing and relating to another agent. The two share a surface form but are different operations, differing in what produces the output, what it accomplishes socially, and what a listener should do with it [Are language models and human speakers doing the same thing?]. Neuroscience gives this a physical edge: formal linguistic competence (grammar, fluency) and functional competence (using language to think and act in the world) run on neurologically distinct systems, and next-token prediction only ever exercises the formal one [Are language models developing real functional competence or just formal competence?].
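A minimal sketch of "generating strings from a probability distribution", assuming a bigram model: a deliberately crude stand-in for an LLM's neural next-token predictor, not how any real LLM is built. It counts which word follows which in a hypothetical toy corpus, then samples one word at a time in proportion to those counts. The corpus, the function names, and the `<s>`/`</s>` boundary tokens are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus used only for illustration.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
]


def train_bigrams(sentences):
    """Count how often each token follows each other token.

    <s> and </s> are assumed sentence-boundary markers.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=20):
    """Sample tokens one at a time from P(next | previous) until </s> or max_len."""
    token, out = "<s>", []
    for _ in range(max_len):
        options = counts[token]
        # Draw the next token in proportion to how often it followed `token`.
        token = rng.choices(list(options), weights=list(options.values()))[0]
        if token == "</s>":
            break
        out.append(token)
    return " ".join(out)


model = train_bigrams(CORPUS)
print(generate(model, random.Random(0)))  # output depends on the seed
```

Every step of `generate` is a draw from P(next word | previous word). Nothing in the loop represents an addressee or what the utterance is meant to accomplish, which is precisely the part the enactive definition treats as constitutive.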
The gap may nonetheless be smaller than it first looks, and that is the part worth carrying away. Borrowing Habermas's observer/participant distinction, the corpus notes that from the *outside* humans and LLMs look categorically different, but from *within* a shared conversation both draw on the same symbolic substrate, which makes the difference structural rather than absolute [Do humans and LLMs differ fundamentally or just superficially?]. Still, the enactive line in its strongest form ties genuine language to sharing a world through co-presence and triangulation on common objects, the same condition some argue is required even to be a *candidate* for consciousness [Can disembodied language models ever qualify as conscious?]. So the real fault line isn't grammar or fluency, where the machines already perform well; it's whether meaning lives in the relations between words or in the relations between agents who have something at stake.
Sources (8 notes)
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
Enactive cognitive science identifies three constitutive properties of linguistic agency—embodiment, participation, and precariousness—that are structurally absent from LLMs. This is a categorical incompatibility, not a matter of degree, suggesting current architectures cannot achieve genuine linguistic agency.
Social grounding and linguistic agency are distinct properties. LLMs acquire more social grounding through integration into language communities, but remain categorically incapable of linguistic agency in the enactive sense, which requires embodiment and precariousness no amount of use can provide.
Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.
LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.
Neuroscience evidence shows next-token prediction produces formal linguistic competence but not functional competence, because functional understanding requires integration of diverse brain networks beyond language circuits that the prediction objective never activates.
Applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.
Current disembodied LLMs cannot be candidates for consciousness because consciousness language originates from and applies only to entities sharing a world with us through co-presence and triangulation on shared objects.