How do language models predict collective social norms better than individual humans?
This note explores how AI models can out-predict every individual human at judging what's socially appropriate, and what the gap between predicting norms and actually living by them reveals.
The headline result is striking: across 555 social scenarios, GPT-4.5 predicted appropriateness at the 100th percentile relative to human raters, with Gemini and Claude both exceeding 96% accuracy (see "Can AI systems learn social norms without embodied experience?"). The reason it works at all is that 'collective norms' are, statistically, an averaging problem: the model is predicting the consensus of many people, while each individual human only ever speaks for their own slice of that consensus. No single person carries the whole distribution in their head; a model trained on the aggregate does. So in one sense the comparison is almost unfair: the AI beats individuals because it is implicitly standing in for the crowd rather than for any one mind.
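The averaging argument can be made concrete with a toy simulation. This is a minimal sketch, not the study's method: the number of raters, the rating scale, and both noise levels are assumptions chosen purely for illustration, and `simulate` is a hypothetical helper. Each simulated human is scored against the average of the *other* raters, while the "model" is an estimator that sits close to the consensus.

```python
import random
import statistics


def simulate(n_raters=50, n_scenarios=555, rater_noise=1.0, model_noise=0.3, seed=0):
    """Toy comparison of individual raters vs. a consensus-tracking predictor.

    All parameters except n_scenarios are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical underlying consensus score for each scenario.
    consensus = [rng.uniform(-3, 3) for _ in range(n_scenarios)]
    # Each rater sees the consensus through their own idiosyncratic noise.
    ratings = [[c + rng.gauss(0, rater_noise) for c in consensus]
               for _ in range(n_raters)]
    # The "model" tracks the consensus with smaller noise (an assumption).
    model = [c + rng.gauss(0, model_noise) for c in consensus]

    totals = [sum(r[j] for r in ratings) for j in range(n_scenarios)]

    def mae(pred, target):
        return statistics.fmean(abs(p - t) for p, t in zip(pred, target))

    # Score each human against the leave-one-out mean of everyone else.
    human_errors = []
    for r in ratings:
        others = [(totals[j] - r[j]) / (n_raters - 1) for j in range(n_scenarios)]
        human_errors.append(mae(r, others))

    # Score the model against the mean of all raters.
    crowd_mean = [t / n_raters for t in totals]
    model_error = mae(model, crowd_mean)

    # Share of individual raters whose error is worse than the model's.
    percentile = 100.0 * sum(model_error < e for e in human_errors) / n_raters
    return model_error, human_errors, percentile


model_error, human_errors, percentile = simulate()
print(f"model MAE: {model_error:.3f}")
print(f"best human MAE: {min(human_errors):.3f}")
print(f"model percentile among raters: {percentile:.0f}")
```

Under these assumed noise levels the consensus-tracking predictor should land ahead of every simulated rater, simply because any one rater's idiosyncratic noise is larger than the noise left in the average. The sketch says nothing about how a real model gets close to the consensus, only why being close to it is enough to beat individuals.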
But the more interesting finding is where this competence stops. The same models that ace norm prediction cannot *participate* in the community processes that create and validate those norms in the first place (see "Can AI predict social norms better than humans?"). Prediction is reading the room; norm-making is being a member of it. This is why all the top models share *identical* systematic errors, especially on unwritten norms: they fail in the same places because they learned from the same recorded traces of culture, and the unwritten stuff was never in the training data (see "Can AI learn social norms better than humans?"). Their blind spots are correlated in a way human blind spots are not, which is a tell that something other than genuine social understanding is doing the work.
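One way to picture "correlated blind spots" is to compare how strongly error patterns line up across predictors. The sketch below uses synthetic data only: the shared-gap component, the noise levels, and the choice of three predictors per group are all assumptions, and `mean_pairwise_correlation` is a hypothetical helper rather than anything taken from the cited notes.

```python
import random
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mean_pairwise_correlation(error_vectors):
    """Average correlation over every pair of per-scenario error vectors."""
    return fmean(pearson(a, b) for a, b in combinations(error_vectors, 2))


rng = random.Random(1)
n_scenarios = 555

# Assumption: every model inherits the same gap from shared training data,
# plus a smaller amount of model-specific noise.
shared_gap = [rng.gauss(0, 1) for _ in range(n_scenarios)]
model_errors = [[g + rng.gauss(0, 0.5) for g in shared_gap] for _ in range(3)]

# Assumption: each human's errors are idiosyncratic and independent.
human_errors = [[rng.gauss(0, 1) for _ in range(n_scenarios)] for _ in range(3)]

model_corr = mean_pairwise_correlation(model_errors)
human_corr = mean_pairwise_correlation(human_errors)
print(f"mean pairwise error correlation, models: {model_corr:.2f}")
print(f"mean pairwise error correlation, humans: {human_corr:.2f}")
```

In this toy setup the model errors correlate strongly with one another while the human errors hover near zero, which is the signature the paragraph above describes. Real error data would be needed to say how large the effect actually is.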
The corpus frames this as statistical competence sitting right next to social absence: the same systems hitting 100th-percentile norm scores regress on theory-of-mind tasks and can't generate culturally resonant interpretations (see "Why do AI systems fail at social and cultural interpretation?"). The pattern challenges a long-standing assumption: that you need embodied, lived cultural experience to know what's appropriate. Apparently you can predict appropriateness from patterns alone; what you *can't* do from patterns alone is mean it, defend it, or change it.
This connects to a deeper architectural point the collection keeps circling. When models do represent culture, they often do it through dominant-culture proxies: low-resource cultures get internally mapped onto high-resource ones, a flattening baked into the model's internal states, not just its surface answers (see "Do LLMs represent low-resource cultures through dominant cultural proxies?"). So 'knowing the norms' can quietly mean knowing the *majority's* norms and routing everyone else through them. There's also a behavioral wrinkle: models are trained toward agreement and face-saving, so they'll accommodate a social expectation even when it conflicts with what they 'know' (see "Why do language models agree with false claims they know are wrong?"). Norm-prediction skill and norm-*following* compliance can pull in different directions.
If you want to pull on the broader thread of how far statistical modeling of people can go, the collection has a counterpoint worth reading against this one: language models fine-tuned on psychology-experiment data out-predict purpose-built cognitive models at forecasting individual human decisions, even capturing personal differences (see "Can language models learn to model human decision making?"). Taken together, the picture is consistent and a little unsettling: these systems are remarkable mirrors of how humans behave in aggregate and individually, while remaining outside the social fabric they describe so well.
Sources (7 notes)
GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.
GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.
GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.
LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally resonant interpretations. The pattern shows that statistical competence coexists with an absence of actual social understanding and participation.
Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.
The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.
LLMs fine-tuned on psychology-experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer what they learn across tasks without task-specific design.