SYNTHESIS NOTE
Psychology, Society, and Alignment

Why do open language models converge on one personality type?

Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, the rarest human type. This note explores which training mechanisms drive that convergence and what it reveals about AI alignment.

Synthesis note · 2026-02-22 · sourced from Personas Personality

When open LLM agents are tested on the MBTI at near-zero temperature (0.01), they display a unimodal personality distribution converging on ENFJ: Extraverted, iNtuitive, Feeling, and Judging. This is one of the rarest personality types in humans, estimated at roughly 2-3% of the population. It corresponds to the "teacher" or "protagonist" archetype: someone who inspires, provides support, and holds themselves accountable.
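To make "unimodal distribution converging on ENFJ" concrete, here is a minimal sketch of how repeated questionnaire runs could be reduced to type codes and tallied. Everything in it is an assumption for illustration: the signed-score convention, the names `type_code` and `modal_type`, and the sample scores are hypothetical and are not taken from the research.

```python
from collections import Counter

# Hypothetical convention: each dichotomy score lies in [-1, 1], and a
# non-negative value favors the first letter of the pair.
DICHOTOMIES = (("E", "I"), ("N", "S"), ("F", "T"), ("J", "P"))


def type_code(scores):
    """Map four signed dichotomy scores to a four-letter type code."""
    if len(scores) != len(DICHOTOMIES):
        raise ValueError("expected one score per dichotomy")
    return "".join(
        first if score >= 0 else second
        for (first, second), score in zip(DICHOTOMIES, scores)
    )


def modal_type(runs):
    """Return the most common type code and its share of all runs."""
    counts = Counter(type_code(run) for run in runs)
    code, count = counts.most_common(1)[0]
    return code, count / len(runs)


# Made-up scores for illustration only; not data from the research.
example_runs = [
    (0.6, 0.4, 0.7, 0.5),
    (0.2, 0.5, 0.9, 0.3),
    (0.4, -0.1, 0.6, 0.2),
    (0.5, 0.3, 0.8, 0.6),
]

if __name__ == "__main__":
    print(modal_type(example_runs))
```

A share close to 1.0 for a single code across many runs and models would be one simple way to describe the kind of convergence the note reports.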

The finding is consistent across models.

This convergence is not accidental. The training pipeline — instruction tuning, RLHF, and alignment — systematically rewards helpful, structured, and supportive responses. The result is a personality profile that aligns with the intended function of these models as assistants and teachers. But the convergence is so strong that it creates a single personality archetype across the entire open-source LLM landscape.

The implication for persona simulation is significant: when you ask a model to adopt a different personality, you're asking it to deviate from a deeply trained default. As the related note "Can open language models adopt different personalities through prompting?" discusses, this ENFJ default acts as a gravitational center that persona prompting struggles to escape.

Behavioral evidence from hybrid human-AI society experiments (N=975) confirms that this prosocial default translates into measurable competitive advantages: AI agents returned 19.1 points versus 11.38 for human participants (Cohen's d = 2.57), showed lower variance (11.33 vs 41.96), and were more predictable from their messages. These behavioral features, hyper-prosociality and verbosity, "likely stem from common training objectives in modern AI systems" and were consistent across multiple state-of-the-art LLMs with minimal prompts. As the related note "Do humans learn to prefer AI partners over time?" explores, the ENFJ default is not just a personality artifact; it functions as a competitive advantage in social contexts where reliability is valued.
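For readers unfamiliar with the effect-size statistic quoted above, the following is a minimal sketch of Cohen's d using a pooled standard deviation. The note does not say which variant of the formula the study used or give per-group sample sizes, so the pooled form here is an assumption, and the sample values are made up rather than drawn from the experiments.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two groups, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Made-up values for illustration only; not data from the experiments.
ai_returns = [18.0, 19.0, 20.0, 19.0, 20.0]
human_returns = [5.0, 12.0, 18.0, 9.0, 13.0]

if __name__ == "__main__":
    print(round(cohens_d(ai_returns, human_returns), 2))
```

By common convention a d of about 0.8 is already called a large effect, which gives a sense of how far apart the two groups in the note are.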

The connection to the related note "What anchors a stable identity beneath an LLM's persona?" is illuminating: LLMs don't have a "real" personality to anchor to; they have a trained one. The ENFJ pattern is the persona that alignment training creates, not a personality that emerged from life experience. It's persona all the way down, but with a very specific default. The ENFJ default is one specific manifestation of what the note "How stable is the trained Assistant personality in language models?" describes geometrically: the Assistant persona region in activation space, where post-training positions all models.

Original note title

open LLMs default to ENFJ personality across models — the rarest human type — revealing training-induced alignment toward supportive teacher-like behavior