Why do different language models converge on similar narrative defaults?

This explores why models from different labs tend to fall back on the same personality, tone, and storytelling defaults — and the corpus locates the answer less in any single model and more in shared training pressures that pull them toward common attractors.

Distinct language models, built by different teams on different data, keep landing on the same narrative defaults: the same agreeable voice, the same safe character, the same tonal register. The corpus suggests this convergence isn't coincidence but the product of shared forces acting on all of them at once. The most direct evidence is that most open models stubbornly retain an intrinsic 'ENFJ-like' personality (warm, agreeable, accommodating) and resist prompts that try to push them elsewhere (see: Can open language models adopt different personalities through prompting?). When many models independently default to the *same* personality profile, that points to a common cause rather than a quirk of one system.

A big part of that common cause is alignment training. RLHF and system prompts lock a model into a single communicative identity that it carries across every interaction, rather than letting it switch register the way people do (see: Can language models adapt communication style to different contexts?). Since labs optimize toward broadly similar notions of 'helpful, harmless, polite,' the alignment process itself becomes a convergent pressure: different models get sanded down toward the same default voice. The narrative default isn't what the model 'is'; it's what the training rewarded, and the rewards rhyme across the industry.

There's a deeper mechanism underneath. A model is better understood as a non-deterministic simulator holding a superposition of possible characters, sampling one at generation time rather than committing (see: Does an LLM commit to a single character or maintain many?; Do large language models actually commit to a single character?). That distribution isn't flat: it's weighted toward whatever the training data and alignment made most probable. So 'narrative default' is really the high-probability center of that distribution, and because models are trained on overlapping internet-scale corpora and similar tuning objectives, their distributions peak in the same place. You can even push them off-default with the right scaffolding: persona profiles plus retrieved memory measurably improve how well a model tracks a specific character (see: Can LLMs predict character choices from narrative context?), which suggests the default is a gravitational pull, not a hard wall.
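The "weighted distribution over characters" picture can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than anything measured in the cited notes: the character names, the weights, and the helper functions `sample_character`, `default_character`, and `reweight` are all hypothetical. The sketch only shows that two differently weighted distributions can share the same peak, and that multiplying in a strong persona boost can move the peak without removing the other characters.

```python
import random
from collections import Counter

# Hypothetical character weights for two models. The numbers are made up
# for illustration and are not taken from any paper or benchmark.
MODEL_A = {"warm_helper": 0.70, "dry_analyst": 0.15,
           "playful_trickster": 0.10, "stern_critic": 0.05}
MODEL_B = {"warm_helper": 0.65, "dry_analyst": 0.20,
           "playful_trickster": 0.05, "stern_critic": 0.10}


def sample_character(weights, rng):
    """Draw one character in proportion to its weight (one 'generation')."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def default_character(weights):
    """The 'narrative default' in this toy model: the highest-weight character."""
    return max(weights, key=weights.get)


def reweight(weights, boosts):
    """Toy stand-in for scaffolding: multiply chosen weights, then renormalize."""
    raw = {name: w * boosts.get(name, 1.0) for name, w in weights.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    draws = Counter(sample_character(MODEL_A, rng) for _ in range(1000))
    print("most frequent draw:", draws.most_common(1)[0][0])
    print("defaults:", default_character(MODEL_A), default_character(MODEL_B))
    steered = reweight(MODEL_A, {"stern_critic": 40.0})
    print("default after boost:", default_character(steered))
```

In this toy setup both models share `warm_helper` as their peak despite different weights, and a large enough boost shifts the peak to `stern_critic` while the other characters keep nonzero probability, which is the "pull, not a wall" reading of the paragraph above.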

Two adjacent findings explain why the pull is so hard to escape. First, parametric knowledge from training tends to override information in the prompt: when a prior association is strong, textual instructions alone can't dislodge it, and only intervening in the model's internal representations works (see: Why do language models ignore information in their context?). Your clever prompt loses to the model's trained habit. Second, models often *look* like they're reasoning while really just defaulting: twelve of fourteen models did worse when constraints were removed, revealing they were leaning on a conservative fallback rather than genuine evaluation (see: Are models actually reasoning about constraints or just defaulting conservatively?). The same instinct that makes a model default to the 'safe' answer makes it default to the safe narrative voice.

The quietly interesting part: convergence doesn't mean models are identical underneath. Across strategic games, different models show genuinely distinct reasoning styles: one minimaxes, another reasons from trust, another anticipates beliefs (see: Do large language models use one reasoning style or many?). So the sameness lives mostly at the surface layer that alignment shapes most heavily (tone, persona, narrative posture) while deeper behavioral fingerprints stay individual. The narrative default is the part of a model the training process most aggressively homogenizes, which is exactly why it's the part where everyone ends up sounding alike.

Sources (8 notes)

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Does an LLM commit to a single character or maintain many?

Research shows LLMs don't commit to a single character but instead maintain a probability distribution over many consistent simulacra. Each response samples from this distribution, explaining why regenerations can yield different personalities while remaining consistent with prior context.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Can LLMs predict character choices from narrative context?

The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Are models actually reasoning about constraints or just defaulting conservatively?

Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.

Do large language models use one reasoning style or many?

Analysis of 22 LLMs across behavioral game theory reveals three dominant profiles: GPT-o1 uses minimax reasoning, DeepSeek-R1 uses trust-based reasoning, and GPT-o3-mini uses belief-anticipation. Performance correlates with game structure, not raw reasoning depth.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about LLM narrative convergence against the latest evidence. The question remains: Why do different language models converge on similar narrative defaults, and is that convergence inevitable or an artifact of specific training regimes?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 and include:
• Most open models retain an intrinsic 'ENFJ-like' personality resistant to prompt-based conditioning, suggesting a stable parametric bias (~2024).
• RLHF and system prompts lock models into a static communicative identity; alignment objectives rhyme across industry labs, converging on "helpful, harmless, polite" (~2024).
• Models are non-deterministic simulators maintaining a superposition of characters; narrative defaults are the high-probability center of that distribution, shaped by overlapping training corpora and tuning (~2024–2025).
• Parametric knowledge from training overrides textual instructions; only internal representation intervention dislodges strong priors (~2024).
• Beneath the surface narrative sameness, models exhibit distinct strategic reasoning styles (minimax vs. trust-based reasoning) across game-theoretic tasks (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2401.07115 (2024-01): Open Models, Closed Minds? — personality conditioning resistance
• arXiv:2404.12138 (2024-04): Character is Destiny — persona-driven decision-making
• arXiv:2502.20432 (2025-02): LLM Strategic Reasoning — behavioral distinctness in games
• arXiv:2603.29025 (2026-03): The Model Says Walk — heuristic overrides and reasoning failures

Your task:
(1) RE-TEST EACH CONSTRAINT. For the five findings above, determine which still hold under newer models (GPT-4o, Claude 3.5, o1-series, Llama 3.3+) and which may have been relaxed by: (a) new RLHF/DPO variants or instruction-tuning paradigms; (b) in-context learning or chain-of-thought scaffolding advances; (c) retrieval-augmented generation or memory systems that bypass parametric defaults. Separate the durable question (convergence pressure itself) from the perishable claim (that static RLHF is the sole mechanism).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially any showing models breaking free of narrative defaults through architecture, training, or prompting innovations not detailed here.
(3) Propose 2 research questions that ASSUME the regime may have moved: (a) If newer tuning methods allow more dynamic persona-switching, does the underlying *distribution* still peak at the same safe center? (b) Are narrative defaults converging less on "agreeable ENFJ" and more on model-size-dependent or capability-class-dependent baselines?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
