Why do LLM persona simulations replicate main effects but fail on marginal effects?
This explores why LLM-simulated personas reliably reproduce the strong, headline findings of human experiments (main effects) but stumble on the subtle, conditional ones (marginal effects) — and what that gap reveals about what these simulations are actually doing.
The clearest data point in the corpus is direct: when AI personas re-ran marketing experiments, they reproduced 76% of main effects (84 of 111), and replication success tracked the *strength of the original evidence*, as measured by the original p-value (see "Can AI personas reliably replicate human experiment results?"). Main effects are the loud, robust signals. Marginal effects are the quiet ones, and there the same study found both false positives and false negatives. So the first answer is almost mechanical: the model appears to reproduce effects in proportion to how strongly they are written into the patterns it learned from, and faint effects don't survive that filter.
But the corpus suggests the deeper reason is *where the variance comes from*. When you run the same persona prompt repeatedly, the output swings at least as much across reruns of one persona as it does across genuinely different personas (see "Why do LLM persona prompts produce inconsistent outputs across runs?"). That puts the noise floor of the model's own uncertainty at roughly the size of a marginal effect. A main effect is big enough to poke through that floor; a marginal effect is about the same magnitude as the noise, so it gets drowned. The model isn't simulating a stable person with subtle conditional preferences. It is sampling from its own uncertainty, and subtle structure is exactly what uncertainty erases.
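One way to make the noise-floor argument concrete is to split output variance into a between-persona part and a within-persona (rerun) part. The sketch below uses invented ratings, and the names `runs` and `variance_split` are illustrative assumptions, not anything taken from the cited notes.

```python
from statistics import mean, pvariance

# Hypothetical 1-7 ratings: each persona prompt is rerun four times.
# These numbers are invented for illustration only.
runs = {
    "persona_a": [3, 5, 4, 6],
    "persona_b": [4, 6, 5, 7],
    "persona_c": [2, 4, 3, 5],
}

def variance_split(runs_by_persona):
    """Return (between-persona variance, mean within-persona variance)."""
    persona_means = [mean(scores) for scores in runs_by_persona.values()]
    # Spread of the persona averages: how different the personas are.
    between = pvariance(persona_means)
    # Average spread across reruns of the same persona: the noise floor.
    within = mean([pvariance(scores) for scores in runs_by_persona.values()])
    return between, within

between, within = variance_split(runs)
print(f"between personas: {between:.2f}")
print(f"within persona (reruns): {within:.2f}")
```

With these made-up numbers the rerun variance (1.25) exceeds the between-persona variance (about 0.67), which is the pattern the note describes. Any effect smaller than that rerun spread would be hard to tell apart from noise.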
There's a statistical version of the same story that explains the false positives, not just the misses. Persona generation relies on heuristics that can't recover the *true joint distribution* from marginal data: it knows the population averages but invents the interactions (see "How do we generate realistic personas at population scale?"). Marginal effects often *are* interactions (this group responds differently under that condition), so a model that fakes the joint distribution will confidently produce conditional effects that aren't real. And conditioning on a specific profile doesn't rescue you: across 208,021 participants, feeding LLMs personal profiles gave no measurable gain in predicting individuals (see "Does conditioning LLMs on personal profiles improve prediction?"). The lever you'd reach for to capture fine-grained differences turns out to be disconnected.
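The point about joint distributions can be checked with a toy case: two populations can share identical marginals and still disagree on every conditional. The sketch below is a minimal illustration under assumed data; the attribute names, the probabilities, and the helpers `marginals` and `conditional` are all hypothetical.

```python
from collections import defaultdict

# Two hypothetical populations over (age_group, price_sensitivity).
# Probabilities are invented for illustration only.
no_interaction = {
    ("young", "sensitive"): 0.25, ("young", "insensitive"): 0.25,
    ("old", "sensitive"): 0.25, ("old", "insensitive"): 0.25,
}
with_interaction = {
    ("young", "sensitive"): 0.40, ("young", "insensitive"): 0.10,
    ("old", "sensitive"): 0.10, ("old", "insensitive"): 0.40,
}

def marginals(joint):
    """Sum a joint distribution over each attribute separately."""
    by_first, by_second = defaultdict(float), defaultdict(float)
    for (first, second), prob in joint.items():
        by_first[first] += prob
        by_second[second] += prob
    # Round to avoid spurious floating-point mismatches when comparing.
    rounded = lambda d: {k: round(v, 10) for k, v in d.items()}
    return rounded(by_first), rounded(by_second)

def conditional(joint, first, second):
    """P(second | first) under the given joint distribution."""
    total = sum(p for (f, _), p in joint.items() if f == first)
    return joint[(first, second)] / total

same_marginals = marginals(no_interaction) == marginals(with_interaction)
print("same marginals:", same_marginals)
print("P(sensitive | young), no interaction:",
      conditional(no_interaction, "young", "sensitive"))
print("P(sensitive | young), with interaction:",
      conditional(with_interaction, "young", "sensitive"))
```

Both populations are 50% young and 50% price-sensitive, yet the share of young people who are price-sensitive is 0.5 in one and 0.8 in the other. A generator that only sees the marginals has no way to tell which joint is true, so any conditional effect it outputs is a guess.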
What makes the failure *systematic* rather than random is that personas don't just blur; they bend in a direction. Assigning an identity induces motivated reasoning: models become about 90% more likely to accept evidence congruent with their assigned identity, and standard debiasing prompts don't fix it, which suggests the bias sits below the instruction layer (see "Do personas make language models reason like biased humans?"). Combine identity-congruent bias with the tendency of most open models to resist personality conditioning in the first place (see "Can open language models adopt different personalities through prompting?"), and you get a tilt that distorts precisely the small, conditional effects while leaving the big effects standing. The summary note ties these threads together as three named failure modes: run-to-run instability, conditioning resistance, and identity-congruent bias (see "How accurately can language models simulate human personalities?").
The thing worth taking away: the main-effect successes and the marginal-effect failures aren't two separate facts; they're the same fact seen from two sides. These systems are good at reproducing the average and bad at reproducing the *structure around* the average, because they capture marginal distributions but fabricate the joints, and their internal noise is large enough to wash out anything subtle. If you want to push on whether that's a fixable calibration problem or a deeper limit, the calibration-science argument (see "How do we generate realistic personas at population scale?") and the realizationist claim that personas are genuine installed dispositions (see "Are LLM personas realized or merely simulated through training?") make an interesting pair to read against each other.
Sources (8 notes)
"Can AI personas reliably replicate human experiment results?": Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.
"Why do LLM persona prompts produce inconsistent outputs across runs?": When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
"How do we generate realistic personas at population scale?": LLM persona generation produces systematic biases in downstream tasks like election forecasting because it relies on heuristic techniques that cannot recover true joint distributions from marginal data. Solving this requires benchmarks, training datasets, and structured frameworks analogous to ImageNet.
"Does conditioning LLMs on personal profiles improve prediction?": Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
"Do personas make language models reason like biased humans?": Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.
"Can open language models adopt different personalities through prompting?": Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
"How accurately can language models simulate human personalities?": LLMs replicate human responses at 85% fidelity in interviews and 76% of experimental effects in marketing studies. However, this accuracy masks three failure modes: run-to-run instability, resistance to personality conditioning, and identity-congruent cognitive biases that distort simulated reasoning.
"Are LLM personas realized or merely simulated through training?": Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.