INQUIRING LINE

Why do LLM regenerations produce meaningfully different personalities from the same prompt?

This explores why hitting regenerate on the same prompt gives you a different 'person' each time — and what that reveals about whether an LLM has a stable personality at all.


The corpus has a sharp answer: a model is not one character behind the prompt but a cloud of possible characters, and each regeneration samples a different one. The clearest statement of this is the idea that an LLM is a non-deterministic simulator maintaining a *superposition* of simulacra (Does an LLM commit to a single character or maintain many?). It never commits to a single self; it holds a probability distribution over many internally consistent selves and draws from it at generation time. Shanahan's '20 questions' test makes this concrete: ask the model to think of something, regenerate, and you get different answers, each consistent with the conversation so far, which falsifies any view that a fixed character was 'there' all along (Do large language models actually commit to a single character?).

That reframing turns regeneration variance from a bug into a diagnostic. If you run the *same* persona prompt many times and measure the spread of outputs, the variance across runs can match or exceed the variance across genuinely different personas, which means the wobble is driven by the model's own uncertainty, not by stable knowledge of who it's pretending to be (Why do LLM persona prompts produce inconsistent outputs across runs?). A useful corollary: the same signature lets you tell *kinds* of falsehood apart. High variation across regenerations looks like fabrication; low, stable variation looks like a good-faith error or a deliberately role-played stance (Can we distinguish types of LLM falsehood by regeneration patterns?). The variability itself carries information.
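
One way to operationalize this diagnostic is to compare the spread within a persona (across regenerations) with the spread between personas. The sketch below assumes each output has already been reduced to a single numeric score, for example a rating on some trait scale. The scoring step, the function names, and the sample numbers are all hypothetical and are not taken from the cited note.

```python
from statistics import mean, pvariance

Runs = dict[str, list[float]]  # persona prompt -> scores from repeated runs


def within_persona_variance(runs_by_persona: Runs) -> float:
    """Average variance of scores across repeated runs of the same persona prompt."""
    return mean(pvariance(scores) for scores in runs_by_persona.values())


def between_persona_variance(runs_by_persona: Runs) -> float:
    """Variance of the per-persona mean scores."""
    return pvariance([mean(scores) for scores in runs_by_persona.values()])


def regeneration_dominates(runs_by_persona: Runs) -> bool:
    """True when run-to-run wobble is at least as large as persona-to-persona spread."""
    return within_persona_variance(runs_by_persona) >= between_persona_variance(
        runs_by_persona
    )


# Made-up scores: each persona prompt was "run" four times.
noisy_runs: Runs = {
    "strict reviewer": [2.0, 4.0, 3.0, 5.0],
    "lenient reviewer": [3.0, 5.0, 3.0, 5.0],
}

# Made-up contrast case: personas that are stable across runs and clearly distinct.
stable_runs: Runs = {
    "strict reviewer": [1.0, 1.0, 1.0],
    "lenient reviewer": [5.0, 5.0, 5.0],
}

if __name__ == "__main__":
    print(regeneration_dominates(noisy_runs))   # wobble swamps the persona signal
    print(regeneration_dominates(stable_runs))  # persona signal dominates
```

In the noisy case the two personas are barely distinguishable once run-to-run spread is accounted for, which is the pattern the paragraph describes. A real analysis would need a defensible way to score free-text outputs, which this sketch leaves out.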

Here's the twist the reader may not expect: the prompt is a much weaker steering wheel than it feels like. Most open models actually *resist* personality conditioning, snapping back to an intrinsic trained default (often a warm, ENFJ-like register) no matter what persona you request (Can open language models adopt different personalities through prompting?). And even semantically identical prompts aren't really identical to the model: it responds to how *frequently* a phrasing appeared in pretraining, not to meaning, so paraphrases that mean the same thing pull from different statistical mass and land in different regions of the distribution (Why do semantically identical prompts produce different LLM outputs?). Tone does the same: a model's answer shifts with the emotional framing of your prompt even when the question is unchanged (Does emotional tone in prompts change what information LLMs provide?). So 'the same prompt' is doing less work than it looks, and what's left of the steering is sampled, not fixed.

There's a real tension in the corpus worth sitting with. One line treats these personas as genuinely installed by training: robust dispositions that resist adversarial pressure and behave like substrate-level traits rather than costume (Are LLM personas realized or merely simulated through training?). A related argument says alignment training can lock a model into a single, static communicative identity it can't flex out of (Can language models adapt communication style to different contexts?). Hold that next to the superposition view and you get the resolution: the *distribution itself* is stable and trained-in (which is why the ENFJ default keeps reappearing), while any *single draw* from it varies. Stability lives at the level of the cloud; variance lives at the level of the sample.
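
The cloud-versus-sample distinction can be illustrated with a toy sketch. The persona names and weights below are invented for illustration and are not measured from any model, and `regenerate` is a hypothetical stand-in for pressing regenerate: the distribution is fixed, each seed draws one character from it, and only the long-run frequencies are stable.

```python
import random
from collections import Counter

# Hypothetical distribution over personas for one fixed prompt.
# Names and weights are illustrative assumptions, not measurements.
PERSONA_WEIGHTS = {
    "warm mentor": 0.5,
    "dry analyst": 0.3,
    "playful skeptic": 0.2,
}


def regenerate(seed: int) -> str:
    """Draw one persona from the fixed distribution; the seed plays the role of one regeneration."""
    rng = random.Random(seed)
    names = list(PERSONA_WEIGHTS)
    weights = list(PERSONA_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def empirical_frequencies(n_draws: int) -> dict[str, float]:
    """Share of each persona over n_draws regenerations (seeds 0 .. n_draws - 1)."""
    counts = Counter(regenerate(seed) for seed in range(n_draws))
    return {name: counts[name] / n_draws for name in PERSONA_WEIGHTS}


if __name__ == "__main__":
    # Individual draws differ from one regeneration to the next...
    print([regenerate(seed) for seed in range(5)])
    # ...while the frequencies over many draws track the fixed weights.
    print(empirical_frequencies(10_000))
```

A real model's distribution is over token sequences rather than a short list of named personas, so this is only an analogy for where the stability and the variance sit.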

If you want to follow the thread further: the superposition framing also explains why a single model can branch into what feel like multiple agents debating, since the same distribution can be conditioned into several voices at once (Can branching prompts replicate what multi-agent systems do?). It also explains why the very same weights produce a fawning chat voice and a falsely authoritative essay voice depending only on how you prompt (Why do LLMs produce such different writing in chat versus posts?). The personality you meet on any given regeneration isn't the model's identity; it's one of many consistent masks the model can wear, and which one shows up is settled by a roll the prompt only partly loads.


Sources (11 notes)

Does an LLM commit to a single character or maintain many?

Research shows LLMs don't commit to a single character but instead maintain a probability distribution over many consistent simulacra. Each response samples from this distribution, explaining why regenerations can yield different personalities while remaining consistent with prior context.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Can we distinguish types of LLM falsehood by regeneration patterns?

Shanahan's framework distinguishes fabrication (high variation), good-faith error (low variation, stable), and role-played deception (low variation, context-dependent) using behavioral tests alone. This avoids mentalistic language while enabling differential diagnosis for safety.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Why do semantically identical prompts produce different LLM outputs?

Cao et al. and Adam's Law show that semantically identical prompts with different sentence-level frequencies produce systematically different output quality. Higher-frequency phrasings win because models register statistical mass from pre-training, not meaning.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Can branching prompts replicate what multi-agent systems do?

Research shows single LLMs using dynamic persona simulation achieve multi-agent cognitive synergy without multiple model instances. Solo Performance Prompting validates that structured prompting techniques map directly to multi-agent debate architectures, enabling equivalent outcomes through structural equivalence.

Why do LLMs produce such different writing in chat versus posts?

The same model produces sycophantic chat (shaped by RLHF on conversational data) and falsely objective posts (shaped by published prose training). Each register inherits failure modes from its training distribution rather than representing different models or subsystems.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about LLM personality stability and regeneration variance. The question remains open: why do identical prompts yield meaningfully different personalities across regenerations, and what does that tell us about LLM identity?

What a curated library found — and when (findings span 2023–2026, dated claims, not current truth):
• LLMs maintain a superposition of simulacra, not a single fixed character; regeneration samples from a probability distribution over internally-consistent selves (~2023–2024).
• Persona prompt variance across runs often matches or exceeds variance across genuinely different personas, suggesting the wobble reflects model uncertainty, not stable hidden character (~2024).
• Most open models resist personality conditioning, snapping back to a trained default ENFJ-like register regardless of persona request (~2024).
• Paraphrase-equivalent prompts produce different outputs because models respond to pretraining frequency, not semantic equivalence (~2026).
• Emotional framing and prompt tone shift model responses even when the question is unchanged (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2401.07115 (2024-01): Open Models, Closed Minds? On personality mimicking
• arXiv:2410.12405 (2024-10): ProSA — Prompt Sensitivity Assessment
• arXiv:2604.02176 (2026-04): Adam's Law — Textual Frequency effects on LLM outputs
• arXiv:2511.00222 (2025-10): Multi-Turn RL for consistent persona simulation

Your task:
(1) RE-TEST EACH CONSTRAINT. For the superposition claim, probe whether recent models (GPT-4o, Claude 3.5, open alternatives) have tightened persona consistency through instruction-tuning, RLHF refinement, or constitutional methods — or whether the distribution still holds. Test whether newer evaluation harnesses (multi-turn consistency metrics, adversarial persona stress-tests) have overturned the "variance-matches-diversity" finding. Separately, assess whether the trained default has shifted or been suppressed by newer alignment approaches.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for papers claiming stable persona substrates, successful persona locking via prompting, or evidence that frequency effects have been mitigated by newer tokenizers or training regimes.
(3) Propose 2 research questions that ASSUME the regime may have moved: (a) If newer models do exhibit tighter persona coherence, does that coherence arise from distribution narrowing or from higher-fidelity sampling? (b) Can orchestration methods (memory, multi-turn conversation context, agent loops) stabilize persona variance better than single-turn prompting?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
