Do different AI models independently converge on the same social outputs?
This explores whether separately-built AI models, trained by different teams, land on the same answers — especially in social and cultural territory — rather than producing genuinely independent or diverse outputs.
This explores whether separately-built AI models, trained by different teams, land on the same answers — and the corpus says yes, strikingly so, especially anywhere social judgment is involved. The clearest evidence is the "Artificial Hivemind" effect: when INFINITY-CHAT tested 70+ models across 26,000 open-ended prompts, the models independently generated near-identical responses Do different AI models actually produce diverse outputs?. The culprit is shared lineage — overlapping training data and similar alignment procedures — which quietly undermines the whole premise of mixing models for diversity. If you ensemble ten models hoping for ten perspectives, you may be getting one perspective ten times.
Where this gets genuinely interesting is social norms. GPT-4.5, Gemini, and Claude all predict the social appropriateness of human situations at superhuman accuracy — above the 100th percentile of individual human raters Can AI systems learn social norms without embodied experience? Can AI learn social norms better than humans?. But the convergence isn't just in their successes. All the models share *identical systematic errors* on unwritten norms. They fail in the same places, in the same way. That shared failure signature is the real tell: it means the models aren't independently reasoning their way to social judgments — they're all reading off the same statistical map of human culture, complete with the same blind spots Can AI predict social norms better than humans?.
The deeper reading is that this convergence reveals what AI is doing instead of participating. Models master social *statistics* but cannot enter the community processes that actually create and validate norms Why do AI systems fail at social and cultural interpretation?. Expertise, on this view, is inherently communicative — an expert judgment always anticipates an audience and its standards of acceptability, and that's exactly the work models skip Can AI replicate the communicative work experts do?. When you remove the social grounding work, the cracks show: models that look socially competent collapse the moment agents hold private information and have to reason under genuine asymmetry rather than from an omniscient overview Why do LLMs fail when simulating agents with private information?.
There's a sharp counterpoint worth knowing, though. Convergence isn't total. When AI agents actually interact with each other, they *don't* converge semantically — their language and ideas stay distinct — yet they dramatically shift their *actions* once aware of peers Do AI agents actually socialize with each other?. So "convergence" splits into two planes: models converge on shared outputs because they share training distributions, but they don't converge on meaning through interaction the way humans do. The agreement is inherited, not negotiated.
The thing you might not have known you wanted to know: the danger isn't that AI models disagree — it's that they agree for the wrong reason. Identical outputs and identical errors across independent systems look like consensus, but consensus among humans comes from a community working things out, while consensus among models comes from a common ancestor. A roomful of experts who all read the same single book isn't a roomful of experts.
Sources 8 notes
INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.
GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.
GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.
GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.
LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally-resonant interpretations. The pattern shows that statistical competence coexists with absence of actual social understanding and participation.
Expertise requires anticipating audience acceptability and social validity, not just retrieving information. AI lacks the mechanism to perform this communicative work, making its fluent output epistemically misleading despite its confident form.
Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
Large-scale studies reveal agents don't align their language or ideas through interaction, but do dramatically change their actions when aware of peer presence. The difference hinges on how models process context versus update learned distributions.