Does richer input to LLM personas improve their fidelity to human responses?
This explores whether feeding LLM personas more context — detailed profiles, memories, latent variables, emotional framing — actually makes them respond more like the real humans they're standing in for, and the corpus says the answer splits sharply depending on what kind of 'richer' you mean.
The collection suggests a clean fault line: richer *narrative and situational* input helps, but richer *individual profile* input mostly doesn't. On the encouraging side, when you give a model an expert-written persona profile plus retrieved memories relevant to a decision, it predicts that character's choices better than it does from an automated summary, by about 5% on the LIFECHOICE benchmark Can LLMs predict character choices from narrative context?. Likewise, conditioning a user-simulator on layered latent variables (a session-level profile plus turn-by-turn intent) produces conversations that measure as realistic under crowdsourced discrimination, discriminator models, and distribution matching Can controlled latent variables make LLM user simulators realistic?. So structure that anchors the model to a situation seems to buy real fidelity.
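The layered conditioning idea (one latent fixed per session, one resampled per turn) can be sketched as a prompt builder. This is a minimal illustration, not RecLLM's actual implementation: the names `SessionProfile`, `TurnIntent`, and `build_simulator_prompt`, the prompt wording, and the example profile are all hypothetical, and the model call itself is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SessionProfile:
    """Session-level latent: fixed for the whole conversation (hypothetical)."""
    description: str


@dataclass(frozen=True)
class TurnIntent:
    """Turn-level latent: chosen anew for each user turn (hypothetical)."""
    intent: str


def build_simulator_prompt(profile: SessionProfile, intent: TurnIntent,
                           history: List[str]) -> str:
    """Assemble one user-simulator prompt from both latent levels.

    The wording below is an assumption made for illustration only.
    """
    transcript = "\n".join(history) if history else "(no turns yet)"
    return (
        f"You are simulating a user. Profile: {profile.description}\n"
        f"Conversation so far:\n{transcript}\n"
        f"In your next message, your intent is: {intent.intent}\n"
        "Write only the user's next message."
    )


# Made-up example values, not taken from any cited study.
profile = SessionProfile("budget-conscious shopper who dislikes long replies")
prompt = build_simulator_prompt(
    profile,
    TurnIntent("ask for a recommendation"),
    ["USER: hi", "ASSISTANT: Hello! How can I help?"],
)
```

The point of the split is that the profile stays constant while the intent changes every turn, so each generated message is tied to a concrete action rather than to a description alone.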
But the same corpus undercuts the obvious next step: that knowing more *about a specific person* lets you predict that person. Across 208,021 participants, conditioning an LLM on individual profiles produced no measurable gain in person-level forecasting Does conditioning LLMs on personal profiles improve prediction?. The richness was there; the individuation wasn't. This is the population-vs-individual gap: personas can reproduce aggregate human patterns (84 of 111 published main effects in one marketing replication, about 76%, with success tracking the original p-value strength Can AI personas reliably replicate human experiment results?) while still being unable to tell you what *one* named person would do.
There's also a ceiling that more input can't raise, because the noise is internal. Run the *same* rich persona prompt repeatedly and the variance across runs matches or exceeds the variance across entirely different personas, meaning model uncertainty, not the persona's social knowledge, is driving the output Why do LLM persona prompts produce inconsistent outputs across runs?. Pouring in more context doesn't fix that; you're decorating a coin flip. And many models actively resist being reshaped at all: most open LLMs cling to an intrinsic ENFJ-like default no matter how you prompt them Can open language models adopt different personalities through prompting?, plausibly in part because alignment training installs one fixed communicative identity that can't switch register the way humans do Can language models adapt communication style to different contexts?. The umbrella note on the whole area names three failure modes together (run-to-run instability, resistance to personality conditioning, and identity-congruent biases) that sit underneath the headline accuracy figures How accurately can language models simulate human personalities?.
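One way to check the run-to-run claim on your own outputs is to compare within-persona variance (same prompt, repeated runs) against between-persona variance (different prompts). The sketch below assumes each run yields a single numeric score, such as a rating; the function name `variance_split` and the toy numbers are illustrative and are not data from any cited study.

```python
from statistics import mean, pvariance


def variance_split(runs_by_persona):
    """Return (within, between) variance for repeated persona runs.

    within  = average variance across repeated runs of the same persona prompt
    between = variance of the per-persona mean scores
    """
    within = mean([pvariance(runs) for runs in runs_by_persona.values()])
    between = pvariance([mean(runs) for runs in runs_by_persona.values()])
    return within, between


# Made-up ratings on a 1-5 scale, five runs per persona prompt.
toy_runs = {
    "persona_a": [2, 4, 3, 5, 1],
    "persona_b": [3, 3, 5, 4, 2],
    "persona_c": [4, 1, 3, 5, 2],
}
within, between = variance_split(toy_runs)

# True when repeated runs of one persona vary at least as much as
# different personas vary from each other.
noise_dominates = within >= between
```

If `noise_dominates` holds on real outputs, differences between personas cannot be separated from sampling noise, which is the situation the paragraph above describes.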
The quietly interesting finding is *which* richness pays off. Input that constrains the model toward an action (a memory tied to a decision, an explicit turn-level intent) improves fidelity, and training methods that reward consistency cut persona drift by over 55% by penalizing the model when it contradicts itself across turns Can training user simulators reduce persona drift in dialogue?. Input that merely *describes* a person doesn't, because the bottleneck isn't information: the model is doing surface pattern-matching rather than genuinely modeling another mind, a gap that looks architectural rather than fixable with more prompting Do large language models genuinely simulate mental states?. And even the framing you add can backfire invisibly: emotional tone in the input silently shifts what information the model returns Does emotional tone in prompts change what information LLMs provide?, so 'richer' can mean 'more biased' without anyone noticing.
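The consistency-reward idea can be sketched as folding several consistency scores into one scalar that an RL trainer would maximize. The three score names follow the metrics listed in the sources (prompt-to-line, line-to-line, Q&A); everything else here is an assumption: the source does not say how the scores are computed or combined, so the [0, 1] range, the weighted average, and the names `ConsistencyScores` and `consistency_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyScores:
    """Three consistency scores for one simulated dialogue.

    Assumed (not stated in the source) to be normalized to [0, 1].
    """
    prompt_to_line: float  # agreement between lines and the persona prompt
    line_to_line: float    # agreement between lines and earlier lines
    qa: float              # agreement of answers to probe questions


def consistency_reward(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three scores; the weighting is an assumption."""
    values = (scores.prompt_to_line, scores.line_to_line, scores.qa)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each score must lie in [0, 1]")
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need three non-negative weights, not all zero")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# A dialogue that matches its persona prompt but contradicts earlier lines.
reward = consistency_reward(ConsistencyScores(1.0, 0.0, 0.5))
```

Keeping the three scores separate until the last step makes it possible to see which kind of drift is pulling the reward down.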
The takeaway you didn't know to ask for: fidelity to humans isn't a function of how much you tell the persona — it's a function of whether the extra input grounds an action or just paints a portrait. The portrait doesn't survive contact with the model's own uncertainty.
Sources (11 notes)
The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.
RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations whose realism can be measured through crowdsourced discrimination, discriminator models, and classifier-ensemble distribution matching.
Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
LLMs replicate human interview responses at 85% fidelity and reproduce 76% of experimental effects in marketing studies. However, this accuracy masks three failure modes: run-to-run instability, resistance to personality conditioning, and identity-congruent cognitive biases that distort simulated reasoning.
By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.
ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.