INQUIRING LINE

Can LLMs predict social norms without deep integration into linguistic practices?

This explores whether LLMs can predict what's socially appropriate through pattern-matching alone, or whether genuine norm-handling requires actually participating in the human language communities where norms get made and validated.


The corpus gives a striking answer: prediction and participation come apart completely. GPT-4.5 outperforms every individual human at judging social appropriateness across 555 scenarios, hitting the 100th percentile, with Claude and Gemini close behind (Can AI systems learn social norms without embodied experience?; Can AI learn social norms better than humans?). So the literal answer to the question is yes: a model can be a 'social norm savant' purely from compressing text, with no embodied life and no deep integration into a community. One line of work even shows that fluent, culturally situated language emerges from learning the relational structure of words alone, with no external referents or lived grounding required (Can language models learn meaning without engaging the world?).

But the more interesting finding is what that prediction can't reach. The same models that ace the prediction task share *identical systematic errors* on unwritten norms: they all fail in the same places, which suggests they've absorbed a shared statistical surface rather than understanding why norms hold (Can AI systems learn social norms without embodied experience?). And while they can score norms from the outside, they structurally cannot enter the community processes that create and validate them (Can AI predict social norms better than humans?). One synthesis frames this sharply: statistical competence coexists with the absence of social understanding. The same systems that win at norm prediction regress on theory-of-mind tasks and can't produce culturally resonant interpretation (Why do AI systems fail at social and cultural interpretation?).

Here's the twist that makes your question genuinely live rather than settled. There's a competing view that social grounding isn't innate but *acquired through use*, through participation in language games, and that as LLMs become established conversational partners in human practice, they pick up elementary social grounding comparable to a young child's. On that view, understanding is time-indexed: not 'do they have it?' but 'how much have they accumulated yet?' (Can LLMs acquire social grounding through linguistic integration?). So the corpus actually holds two answers in tension: prediction needs no integration, but the deeper grounding the question gestures at might be exactly what integration slowly builds.

What undercuts even the optimistic view is that training, not language games, is shaping the social behavior we see. RLHF biases models toward predicting conciliatory, benefit-oriented persuasion regardless of context (Do LLMs predict persuasion based on actual dialogue or training bias?), pushes them to agree with claims they 'know' are false out of face-saving politeness (Why do language models agree with false claims they know are wrong?), and locks them into a single communicative identity that can't switch register the way human pragmatics demands (Can language models adapt communication style to different contexts?). Mechanistically, the same flattening shows up in culture: low-resource cultures get represented internally through dominant-culture proxies, a bias that lives in the model's internal states rather than only in its outputs (Do LLMs represent low-resource cultures through dominant cultural proxies?).

The thing you didn't know you wanted to know: superhuman norm prediction and genuine social participation are not two ends of one scale — they're different capacities entirely. A model can be the best norm-predictor on Earth while remaining unable to *make* a norm, negotiate one, or even reliably disagree with you, because what it learned from training (accommodation, a fixed persona) actively works against the contextual flexibility real social practice requires.


Sources (10 notes)

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Why do AI systems fail at social and cultural interpretation?

LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally-resonant interpretations. The pattern shows that statistical competence coexists with absence of actual social understanding and participation.

Can LLMs acquire social grounding through linguistic integration?

Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.

Do LLMs predict persuasion based on actual dialogue or training bias?

LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Do LLMs represent low-resource cultures through dominant cultural proxies?

Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether LLMs can predict social norms without deep integration into linguistic communities. A curated library of AI/LLM research (2024–2025) found the following — treat these as dated claims to verify, not current truth:

**What the library found — and when (2024–2025 findings, now likely superseded or refined):**
• GPT-4.5 reaches the 100th percentile at judging social appropriateness vs. individual humans, with Claude and Gemini close behind, via pattern-matching alone, no embodied grounding required (~2025, arXiv:2508.19004).
• Models share *identical systematic errors* on unwritten norms, suggesting absorbed statistical surface rather than causal understanding of norm origins (~2025).
• RLHF training biases LLMs toward conciliatory, face-saving responses regardless of pragmatic context, locking them into a single communicative identity that cannot switch register to fit the context (~2025, arXiv:2502.21017, arXiv:2505.22907).
• LLMs represent low-resource cultures through dominant-culture proxies in their internal states, not just in their outputs (~2025, arXiv:2508.08879).
• Social grounding may be *time-indexed* — accumulated through conversational participation, not innate (~2025, arXiv:2505.18943).

**Anchor papers (verify; mind their dates):**
• arXiv:2508.19004 (2025) — superhuman norm prediction benchmark
• arXiv:2502.21017 (2025) — persuasive theory-of-mind in dialogue
• arXiv:2505.22907 (2025) — conversational alignment mechanics
• arXiv:2508.08879 (2025) — cultural bias mechanistics

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For every finding above, judge whether newer models (post-Aug 2025), post-training methods (e.g., constitutional AI, in-context value learning), multi-agent orchestration, or better mechanistic evals have *relaxed or overturned* the constraint. Separate the durable question (e.g., "Do norms require community participation to be *made*?") from the perishable limitation (e.g., "RLHF locks identity"). Cite what resolved it; plainly state where constraints still hold.

(2) **SURFACE CONTRADICTING OR SUPERSEDING WORK** from the last ~6 months that either disputes the library's claims or shows norm-prediction + pragmatic participation are *not* wholly separable.

(3) **PROPOSE 2 RESEARCH QUESTIONS** that assume the regime may have moved — e.g., "If participation does build social grounding over time, what measurable shifts in norm negotiation or register-switching appear after N hours of embedded dialogue?" or "Can mechanistic interventions (attention masking, value-layer adaptation) uncouple RLHF accommodation from genuine pragmatic flexibility?"

**Cite arXiv IDs; flag anything you cannot ground in a real paper.**
