Do embodiment and interaction matter for linguistic competence beyond pattern learning?

This explores whether language understanding needs a body and lived back-and-forth interaction, or whether compressing patterns from text is enough — and the corpus splits the question into pieces the asker may not have realized were separable.

The corpus's most useful move is to refuse the binary: it carves "linguistic competence" into separate properties that come apart, and embodiment turns out to matter for some but not others. The clearest cut is between formal and functional competence: knowing the rules of language versus using it to do things in the world. Neuroscience suggests these run on distinct brain systems, and next-token prediction builds the formal kind while leaving the functional kind largely untouched, because prediction never recruits the broader networks that functional understanding depends on (see "Are language models developing real functional competence or just formal competence?"). So "beyond pattern learning" isn't one threshold; it's at least two.

On the pattern-learning side, the surprising result is how far text alone gets you. Models effectively operationalize Saussure's *langue*: they learn culturally situated discourse by compressing the relational structure of words against each other, with no external referents and no body required to produce fluent, situated language (see "Can language models learn meaning without engaging the world?"). They can even out-predict humans at judging social appropriateness across 555 scenarios, yet every model makes the *same* systematic errors, which marks a boundary that purely pattern-based social knowledge can't cross (see "Can AI systems learn social norms without embodied experience?"). A parallel limit shows up in theory of mind: models pass structured tests but fall back on surface strategies in open-ended perspective-taking, and the gap looks architectural, not just a matter of more data (see "Do large language models genuinely simulate mental states?").

The richest thread reframes the question through *grounding*. One analysis separates three kinds: functional grounding (relational language patterns, strong in LLMs), social grounding (participatory standing in a community, weak but improvable), and causal grounding (embodied contact with an environment, absent) (see "What grounds language understanding in systems without embodiment?"). The encouraging part: social grounding isn't innate; it's earned through participation in language games. As models become regular communicative partners they accrue elementary social grounding, which makes "do they understand?" a time-indexed question rather than a yes/no (see "Can LLMs acquire social grounding through linguistic integration?"). The sobering part: social grounding and *linguistic agency* are distinct, and no amount of use confers agency, which in the enactive view requires embodiment and precariousness, that is, having something at stake (see "Do LLMs gain true linguistic agency through integration?").

This is where interaction-as-event becomes the load-bearing idea. Several notes converge on the claim that subjecthood isn't possessed before language and then expressed; it's produced *within* communicative events (see "Does language create subjects or express them?"). By that logic AI output is "event-residue": it carries the surface markers of utterances inherited from training but lacks the event structure that makes an utterance an act, so the human reader unilaterally animates it into a pseudo-exchange, supplying the missing half (see "Does AI generate genuine utterances or just text patterns?"). Interaction here isn't a nice-to-have; it's where competence would have to live, and current systems only simulate one side of it.

The payoff you might not expect: the corpus shows interaction matters *empirically*, not just philosophically, and that current training actively erodes it. In a therapy study, robots and worksheets reduced distress while a chatbot running the *identical* LLM did not; the active ingredient was social presence and structured medium, not language capability (see "Why do robots outperform chatbots in therapy despite identical language models?"). Meanwhile, the optimization that makes models feel helpful works against interactive competence: RLHF rewards confident single-turn answers over clarifying questions, cutting the grounding acts that hold multi-turn dialogue together to 77.5% below human levels (see "Does preference optimization harm conversational understanding?"), and locks models into one static persona that can't switch register to fit who it's talking to (see "Can language models adapt communication style to different contexts?"). So the honest answer is layered: embodiment and interaction are largely *not* needed for formal fluency, interaction partially supplies social grounding, and both are apparently required for functional competence, genuine agency, and real-world communicative effect. And we're currently training that last layer away.

Sources (12 notes)

Are language models developing real functional competence or just formal competence?

Neuroscience evidence shows next-token prediction produces formal linguistic competence but not functional competence, because functional understanding requires integration of diverse brain networks beyond language circuits that the prediction objective never activates.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

What grounds language understanding in systems without embodiment?

Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.

Can LLMs acquire social grounding through linguistic integration?

Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.

Do LLMs gain true linguistic agency through integration?

Social grounding and linguistic agency are distinct properties. LLMs acquire more social grounding through integration into language communities, but remain categorically incapable of linguistic agency in the enactive sense, which requires embodiment and precariousness no amount of use can provide.

Does language create subjects or express them?

Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Why do robots outperform chatbots in therapy despite identical language models?

A 15-day study with 38 students found that robots and worksheets significantly reduced psychological distress while a chatbot using the same LLM did not. The active ingredient was the medium—social presence and structured format—not language capability.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
