Why do LLM persona prompts produce inconsistent outputs across runs?
Can language models reliably simulate different social perspectives through persona prompting, or does their run-to-run variance indicate they lack stable group-specific knowledge? This matters for whether LLMs can approximate human disagreement in annotation tasks.
A persistent challenge in natural language inference (NLI) annotation is that human annotators genuinely disagree, not from error, but because the same sentence carries different readings for people with different social positions, ideological backgrounds, or domain expertise. One proposed solution is to instruct LLMs to simulate different annotator personas and generate a distribution of labels that reflects human disagreement.
The approach fails for a specific reason: LLM outputs under persona prompting are not stable enough across runs to be meaningful as persona simulations. When the same persona prompt ("respond as a conservative rural voter", "respond as a medical professional") is run multiple times on the same input, the variance in the output distribution across runs is comparable to or larger than the variance across different personas. This means model uncertainty is dominating persona-specific knowledge — the spread in outputs reflects what the model doesn't confidently know, not what different social groups actually think differently.
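The comparison described above, run-to-run spread within one persona versus spread across personas, can be made concrete with a small sketch. This is not the method used in the underlying study. The label set, the function names, the choice of total variation distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NLI label set; swap in whatever labels the annotation task uses.
LABELS = ("entailment", "neutral", "contradiction")


def label_distribution(labels):
    """Convert a list of sampled labels into a probability vector over LABELS."""
    counts = Counter(labels)
    return [counts[label] / len(labels) for label in LABELS]


def total_variation(p, q):
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mean_pairwise_distance(dists):
    """Average total variation distance over all unordered pairs (needs >= 2 items)."""
    pairs = list(combinations(dists, 2))
    return sum(total_variation(p, q) for p, q in pairs) / len(pairs)


def stability_report(runs_by_persona):
    """Return (within_persona, between_persona) spread.

    runs_by_persona maps a persona name to a list of runs; each run is the list
    of labels sampled in that run. Assumes >= 2 personas and >= 2 runs each.
    """
    # Within: how far apart are repeated runs of the *same* persona prompt?
    within = sum(
        mean_pairwise_distance([label_distribution(run) for run in runs])
        for runs in runs_by_persona.values()
    ) / len(runs_by_persona)
    # Between: how far apart are personas once their runs are pooled?
    pooled = [
        label_distribution([label for run in runs for label in run])
        for runs in runs_by_persona.values()
    ]
    between = mean_pairwise_distance(pooled)
    return within, between


# Made-up data, for illustration only: two personas, two runs each.
toy_runs = {
    "persona_a": [["entailment"] * 4, ["neutral"] * 4],
    "persona_b": [["neutral"] * 4, ["entailment"] * 4],
}
within, between = stability_report(toy_runs)
print(f"within-persona spread:  {within:.2f}")
print(f"between-persona spread: {between:.2f}")
```

On this deliberately extreme toy data the within-persona spread is 1.00 and the between-persona spread is 0.00: each persona flips its answer between runs, so pooling the runs makes the two personas indistinguishable. A persona prompt that anchored the output would show the opposite pattern, with small within-persona spread and larger between-persona spread.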
This is a different diagnosis from simply "LLMs don't know what different groups believe." The more precise claim is: even if the model has relevant group-specific information, it is not stably retrievable under the persona prompt. The persona acts more like a temperature modifier (loosening the output distribution) than a grounding anchor (fixing the output to a specific knowledge domain).
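To make the "temperature modifier" analogy tangible, the following sketch shows what raising the softmax temperature does to a label distribution: the ranking of labels stays the same while the probability mass spreads out. It only illustrates the analogy. It does not claim that persona prompts literally change a sampling temperature, and the logits are invented numbers.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats; larger means a more spread-out distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Invented logits for (entailment, neutral, contradiction), illustration only.
logits = [2.0, 1.0, 0.0]
sharp = softmax(logits, temperature=1.0)
loose = softmax(logits, temperature=2.0)

print("T=1.0:", [round(x, 3) for x in sharp], "entropy", round(entropy(sharp), 3))
print("T=2.0:", [round(x, 3) for x in loose], "entropy", round(entropy(loose), 3))
```

The top-ranked label is the same at both temperatures, but the higher-temperature distribution has higher entropy. That is the behaviour the note attributes to an unstable persona prompt: more varied outputs without any new information about which label a given group would choose.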
The implication for NLI research methodology is significant: persona-based annotation simulation cannot substitute for actual diverse human annotation panels. The goal was to cheaply approximate human annotation disagreement distributions; the actual output approximates model uncertainty distributions, which have a different shape and origin.
This connects to "Why do language models fail confidently in specialized domains?": both findings point to the same underlying gap, namely that LLMs produce confidently framed outputs even when their underlying representations are uncertain or thin. In overconfidence, the model is wrong and certain; in persona instability, the model is uncertain and presents that uncertainty as if it were persona variance.
The broader implication for "Why do readers interpret the same sentence so differently?" is that the multiplicity of interpretations is grounded in actual social diversity, not just distributional uncertainty. LLMs can approximate the form of disagreement (varied outputs) but not the substance (stable group-grounded positions). For evaluation settings, "Why do LLM judges fail at predicting sparse user preferences?" identifies persona sparsity as the specific mechanism: run-to-run variance overwhelms persona variance because sparse persona profiles cannot constrain model predictions. The uncertainty documented here is the root cause of personalized judge failure.
Enrichment (2026-02-22, from Arxiv/Personas Personality): Instability is one of three persona failure modes. The "Open Models, Closed Minds" study identifies a complementary failure, resistance: most open LLMs retain their intrinsic ENFJ-like personality despite persona conditioning, failing to shift to the target personality at all. See "Can open language models adopt different personalities through prompting?". The third failure mode is cognitive distortion: when persona assignment does take hold, it induces motivated reasoning, with political personas up to 90% more likely to validate identity-congruent evidence. See "Do personas make language models reason like biased humans?". Together these form a three-way persona failure taxonomy: instability (this note), resistance (closed-minded), and distortion (motivated reasoning).
Inquiring lines that use this note as a source (91)
This note is a source for these synthesized inquiries.
- Do individual persona simulations work?
- How do LLM user simulators fail to represent authentic user behavior distributions?
- Can the same conversation coherently continue across different model versions?
- Can one model instance host multiple realized personas simultaneously?
- What makes sincerity impossible without a coherent first-person perspective?
- How does persona consistency affect coherence in simulated dialogue?
- Can LLM judges reliably estimate when they lack sufficient persona information?
- Why does model uncertainty dominate persona-specific knowledge in annotation tasks?
- How does non-human origin of personas affect team willingness to critique them?
- Why do language models successfully simulate political perspectives and social personas?
- Why do LLM regenerations produce meaningfully different personalities from the same prompt?
- What does the 20-questions test reveal about LLM character consistency?
- What measurement artifacts emerge when annotators interpret the same question differently?
- Do synthetic personas maintain consistency across multiple conversations?
- How do LLM personas compare to demographic targeting?
- Do LLM judges with diverse personas resist individual biases better than single evaluators?
- What makes personas in multi-agent systems actually contribute meaningful domain depth?
- How does sampling variation relate to prompt sensitivity as reliability concerns?
- What does McDonald's omega reveal about LLM judgment consistency?
- How does the dialogue prompt establish the character the model plays?
- Why does batching multiple conversations on one GPU create identity problems?
- What distinguishes character simulation from authentic voice in language model outputs?
- Why do multiple user personas need separate attention rather than one dense vector?
- How does Shanahan's simulator model explain first-person pronoun consistency in dialogue agents?
- What role does prompt context play in preventing genuine addressee modeling in generation?
- How does prompting language shift what LLMs express about political figures?
- Why do short interviews outperform demographic labels for persona simulation?
- Why do most open language models resist personality conditioning via prompts?
- Can persona-based approaches capture genuine disagreement in expert annotations?
- Can persona profiles be enriched to constrain LLM predictions and reduce run-to-run variance?
- Do open-source LLMs show different resistance patterns to persona prompting than closed models?
- How does persona instability in annotation compare to LLM overconfidence in low-resource domains?
- What distinguishes actual social disagreement from distributional uncertainty in LLM outputs?
- What distinguishes personality resistance from persona instability in LLMs?
- Can LLM-as-Judge metrics replace human annotation for detecting persona contradictions?
- Why do role-playing agents show belief-behavior inconsistency in their outputs?
- Does single model persona diversity match true multi-model diversity at scale?
- Why does dynamic persona identification outperform fixed personas in prompting?
- Can persona prompting overcome the default ENFJ personality in language models?
- Why do models resist personality change despite sophisticated prompting techniques?
- How does RLHF fine-tuning conflict with simulating diverse user personas?
- Can offline RL scale persona consistency across multi-turn conversations?
- How does RLHF-induced mode collapse limit diversity in LLM-generated personas?
- How does support coverage relate to systematic biases in persona simulation?
- How do structured clinical models solve persona calibration better than ad hoc generation?
- Why do individual persona simulations succeed when population-level representation fails?
- How do persona vectors compare to other methods for monitoring model behavior drift?
- Why do personas in language models resist correction through prompting alone?
- What makes persona-assigned language models unstable across different conversation runs?
- Can multi-turn conversations manipulate language model reasoning in similar ways to personas?
- Why do language models resist adopting different personalities when prompted?
- What causes different personality traits to trigger different emoji densities in generated text?
- Can persona consistency coexist with relevant dialogue in personalized conversation?
- How does distractor persona selection affect consistency enforcement in dialogue?
- Why is persona consistency a pragmatic property rather than semantic?
- Why do some prompts benefit from aggregation while others do not?
- Which prompt properties determine whether variance helps under majority voting?
- How does quasi-interpretivism differ from simply role-playing character analysis?
- Can quasi-interpretivism apply to entire persona states rather than single beliefs?
- Can users be modeled as multiple personas instead of single vectors?
- How do internal persona patterns drive emergent misalignment across domains?
- Why do language models prefer certain response styles regardless of what the prompt asks?
- Do chain-of-thought prompts help RLVR models predict annotation disagreement?
- Why does extending reasoning traces worsen persona consistency?
- Does villain roleplay failure reveal why LLMs cannot adopt genuine controversial positions?
- Why do models confabulate inconsistently across different samples?
- Can similar profiles amplify systematic biases in persona simulation at scale?
- What makes extended personal narratives more effective than attribute lists for personas?
- Why does static persona definition fail to capture natural variation?
- Why does regenerating LLM responses produce different but equally valid answers?
- Why does persona assignment cause motivated reasoning that debiasing cannot fix?
- Why do LLM persona annotations become unstable when run multiple times?
- How many distinct quasi-persons does a single language model actually support?
- How do persona and context multiply to improve synthetic dialogue diversity?
- Can persona-mixture calibration avoid the need for post-hoc diversity reranking?
- Can persona-based explanation coexist with item-aspect based explanation routes?
- Why does persona assignment make it harder for models to hold values in tension?
- Why do marginal effects fail to replicate in AI persona simulations?
- What systematic biases emerge when scaling persona simulation to population level?
- Can prompted or fine-tuned models generate genuine narrative ambiguity?
- How do annotation artifacts get mistaken for genuine human values?
- Why do LLM persona simulations replicate main effects but fail on marginal effects?
- Why does diversity in LLM outputs mask sampling from community priors?
- Does richer input to LLM personas improve their fidelity to human responses?
- Can persona prompts reliably transfer across different question domains?
- Why do low-knowledge personas reduce LLM accuracy on hard questions?
- How should persona prompts be used if not for accuracy?
- Why does fairness depend on context and who you ask?
- Why do prompt effects reverse between different model generations?
- What other pragmatic prompt features have unstable effects?
- How do persona consistency and contextual relevance trade off in personalized dialogue systems?
Related concepts in this collection (6)
- Why do language models fail confidently in specialized domains?
  LLMs perform poorly on clinical and biomedical inference tasks while remaining overconfident in their wrong answers. Do standard benchmarks hide this fragility, and can prompting techniques fix it?
  both findings show LLM outputs don't reliably track underlying epistemic state
- Why do readers interpret the same sentence so differently?
  How much of annotation disagreement in NLP reflects genuine interpretive multiplicity rather than error? This explores whether social position and moral framing systematically generate competing but equally valid readings.
  human disagreement is socially grounded; persona simulation cannot replicate that grounding
- Do classical knowledge definitions apply to AI systems?
  Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision?
  unstable persona outputs are another manifestation of LLMs lacking the social situatedness that grounds stable perspective-taking
- Can open language models adopt different personalities through prompting?
  Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
  complementary failure: resistance vs instability
- Do personas make language models reason like biased humans?
  When LLMs are assigned personas, do they develop the same identity-driven reasoning biases that humans exhibit? And can standard debiasing techniques counteract these effects?
  third failure mode: when personas take hold, they introduce cognitive biases
- Does conditioning LLMs on personal profiles improve prediction?
  Persona induction (feeding LLMs participant-specific information) is widely used to make models simulate individuals more accurately. But does it actually work at the individual level where it matters most?
  grounds: if model uncertainty swamps persona signal, conditioning cannot improve individual prediction
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
- Persona Generators: Generating Diverse Synthetic Personas at Scale
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
- Unleashing Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration
Original note title
llm persona-simulated annotations are unstable across runs indicating model uncertainty dominates persona-specific knowledge