How does support coverage relate to systematic biases in persona simulation?


This explores whether casting a wide net over persona types ("support coverage") actually addresses the systematic biases that show up when LLMs simulate people. The corpus suggests these are two different axes that often get conflated, and separating them is the useful insight. Support coverage is a breadth claim: "Should persona simulation prioritize coverage over statistical matching?" argues you should maximize the range of trait configurations you can produce, especially rare but consequential ones, rather than match the statistical density of a target population. That's a question of *which* personas you reach. Systematic bias is a question of *how each persona behaves once reached*: a directional tilt that persists no matter how many personas you stack up.
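To make the breadth-versus-density distinction concrete, here is a minimal Python sketch that scores a persona sample on both axes. The trait space (`TRAITS`), the function names, the target distribution, and the use of total variation distance as the density-matching metric are all illustrative assumptions, not the method from the source note (which uses evolutionary optimization of Persona Generator code).

```python
from collections import Counter
from itertools import product

# Hypothetical trait space: every persona is one tuple of discrete trait values.
TRAITS = {
    "age": ["young", "middle", "old"],
    "risk_tolerance": ["low", "high"],
    "tech_skill": ["novice", "expert"],
}

def all_configurations(traits):
    """Every combination of trait values, i.e. the full support (3 * 2 * 2 = 12 here)."""
    return set(product(*traits.values()))

def support_coverage(personas, traits):
    """Breadth: fraction of possible configurations generated at least once."""
    support = all_configurations(traits)
    return len(set(personas) & support) / len(support)

def total_variation(personas, target):
    """Density match: total variation distance between the sample's empirical
    distribution and a target distribution (0 = exact match, 1 = disjoint)."""
    counts = Counter(personas)
    n = len(personas)
    keys = set(counts) | set(target)
    return 0.5 * sum(abs(counts[k] / n - target.get(k, 0.0)) for k in keys)

# Hypothetical target population concentrated on two common configurations.
common_a = ("middle", "low", "novice")
common_b = ("young", "low", "expert")
target = {common_a: 0.5, common_b: 0.5}

density_matched = [common_a, common_b, common_a, common_b]
coverage_oriented = sorted(all_configurations(TRAITS))  # one of each configuration

print(support_coverage(density_matched, TRAITS), total_variation(density_matched, target))
print(support_coverage(coverage_oriented, TRAITS), total_variation(coverage_oriented, target))
```

The density-matched sample matches the target exactly but reaches only 2 of 12 configurations; the one-of-each sample reaches all 12 but sits far from the target density (a total variation distance of about 0.83). Neither number says anything about how each persona behaves, which is the bias axis.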

The sharpest reason these don't collapse into one problem comes from "Do personas make language models reason like biased humans?": assigning a persona induces identity-congruent reasoning, with models 90% more likely to accept evidence that matches their assigned identity, and standard debiasing prompts fail to remove it because the bias "operates below the level of instruction." So you could achieve perfect support coverage, with every demographic and every rare configuration represented, and still have each persona systematically skewed toward its own identity. Broader coverage doesn't dilute that; it arguably multiplies it, since every newly reached persona arrives with its own built-in tilt.
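A toy model makes the "coverage doesn't dilute it" point explicit. It assumes, as a simplification, that "90% more likely" means each persona accepts identity-congruent evidence at 1.9 times its base acceptance rate; the functions and base rates below are hypothetical. Widening the persona set leaves the average relative tilt unchanged, and only shrinking the per-persona boost (the bias axis itself) removes it.

```python
def acceptance_rates(base_rate, congruence_boost=1.9):
    """Toy persona model returning (congruent, incongruent) acceptance probabilities.
    Assumption: congruent evidence is accepted congruence_boost times as often as
    incongruent evidence, capped at probability 1."""
    return min(1.0, base_rate * congruence_boost), base_rate

def mean_relative_tilt(base_rates, congruence_boost=1.9):
    """Average congruent/incongruent acceptance ratio across a persona set.
    1.0 means no identity-congruent tilt."""
    ratios = [c / i for c, i in (acceptance_rates(b, congruence_boost) for b in base_rates)]
    return sum(ratios) / len(ratios)

narrow = [0.3, 0.4]                                 # two hypothetical personas
broad = [round(0.05 * k, 2) for k in range(1, 11)]  # ten personas, base rates 0.05..0.5

print(mean_relative_tilt(narrow))                       # about 1.9
print(mean_relative_tilt(broad))                        # still about 1.9
print(mean_relative_tilt(broad, congruence_boost=1.0))  # 1.0: tilt removed at the source
```

The cap at probability 1 would shrink the ratio for very high base rates; that is a property of this toy model, not a finding from the source.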

There's a second, subtler form of bias that coverage can actively obscure. "Why do LLM persona prompts produce inconsistent outputs across runs?" finds that running the *same* persona repeatedly produces variance that matches or exceeds the variance *between different* personas, meaning what looks like rich persona diversity may just be model uncertainty wearing costumes. A coverage metric counts distinct configurations; it can't tell whether those configurations carry genuinely distinct social knowledge or just noise. "Can AI personas reliably replicate human experiment results?" adds another directional bias to watch: replication success tracks the statistical strength of the original effect, strong on robust effects and unreliable on marginal ones, with both false positives and false negatives. So the simulation systematically over-confirms what was already robust and underperforms exactly where you'd most want a prediction.
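The same-persona versus different-persona comparison can be sketched as a simple variance decomposition. The data layout, the 1-5 rating scale, and the `between > within` rule in `personas_distinguishable` are illustrative assumptions; the source note reports the comparison, not this particular statistic.

```python
from statistics import mean, pvariance

def variance_decomposition(runs_by_persona):
    """runs_by_persona maps a persona id to numeric outputs from repeated runs of
    the same persona prompt. Returns (within, between):
    within  = average run-to-run variance for a single persona,
    between = variance of the per-persona means."""
    within = mean(pvariance(runs) for runs in runs_by_persona.values())
    between = pvariance([mean(runs) for runs in runs_by_persona.values()])
    return within, between

def personas_distinguishable(runs_by_persona):
    """Hypothetical sanity check: persona differences should exceed run-to-run noise."""
    within, between = variance_decomposition(runs_by_persona)
    return between > within

# Hypothetical 1-5 ratings from four runs of each persona prompt.
noisy = {"persona_a": [1, 5, 1, 5], "persona_b": [2, 4, 2, 4]}
stable = {"persona_c": [1, 2, 1, 2], "persona_d": [4, 5, 4, 5]}

print(variance_decomposition(noisy), personas_distinguishable(noisy))
print(variance_decomposition(stable), personas_distinguishable(stable))
```

In the `noisy` case both personas have the same mean, so all of the apparent spread is run-to-run noise; a coverage count would still report two distinct personas.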

This reframes the support-coverage argument as valuable but incomplete. Coverage of rare configurations matters precisely *because* that's where systematic bias does the most damage: a density-matched sample will under-sample the edge cases where a motivated-reasoning tilt or an instability blowup has the largest safety consequences. So coverage and bias control are complementary, not substitutes. Coverage gets you to the dangerous corners of the distribution, but only bias-aware methods tell you whether what you find there is real. The mitigation approaches in the corpus attack the bias axis directly rather than through coverage: "Can training user simulators reduce persona drift in dialogue?" uses RL with consistency rewards to cut persona drift by more than 55%, and "How stable is the trained Assistant personality in language models?" shows that capping activations along the dominant persona axis (the component measuring distance from the default Assistant) suppresses harmful drift without degrading capabilities.
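For intuition about what "capping activations along an axis" can mean, here is a minimal pure-Python sketch that clamps the component of an activation vector along a given direction and leaves the orthogonal part untouched. It assumes the persona axis is already known and uses a two-sided clamp with hypothetical bounds; which layers are capped and how the bounds are chosen are not specified in the source note.

```python
import math

def cap_along_axis(h, axis, lower, upper):
    """Clamp the projection of activation vector h onto `axis` to [lower, upper].

    h, axis      : lists of floats of equal length (axis need not be unit length)
    lower, upper : hypothetical bounds on the projection
    Only the component along the axis changes; the orthogonal part is preserved.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    u = [a / norm for a in axis]                   # unit direction of the axis
    proj = sum(x * ui for x, ui in zip(h, u))      # scalar position along the axis
    shift = min(max(proj, lower), upper) - proj    # zero when already in range
    return [x + shift * ui for x, ui in zip(h, u)]

drifted = [3.0, 1.0, 2.0]  # far along the hypothetical persona axis
typical = [0.5, 1.0, 2.0]  # within bounds
axis = [1.0, 0.0, 0.0]

print(cap_along_axis(drifted, axis, lower=-1.0, upper=1.0))  # [1.0, 1.0, 2.0]
print(cap_along_axis(typical, axis, lower=-1.0, upper=1.0))  # [0.5, 1.0, 2.0]
```

If only drift away from the Assistant end of the axis matters, a one-sided cap (an upper bound alone) would be the narrower choice.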

The deeper question underneath is whether persona bias is even a bug to be sampled around. "Are RLHF personas performed characters or realized dispositions?" and "Are LLM personas realized or merely simulated through training?" argue that post-training installs stable, "realized" dispositions that resist adversarial pressure, which would explain why the biases documented in "Do personas make language models reason like biased humans?" survive debiasing prompts. If personas are realized dispositions rather than surface masks, then systematic bias isn't a sampling artifact you can cover your way out of. It's a property of the substrate, and support coverage tells you how widely you've spread it, not how to correct it.

Sources 8 notes

Should persona simulation prioritize coverage over statistical matching?

Evolutionary optimization of Persona Generator code achieves broader trait coverage than density-matched baselines, including rare but consequential user configurations that naive LLM prompting misses.

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments, with replication success strongly correlated with the strength of the original p-value. Marginal effects showed unreliable performance, with both false positives and false negatives.

Can training user simulators reduce persona drift in dialogue?

Inverting standard RL setups to train user simulators for consistency, with three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, reduces persona drift by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

How stable is the trained Assistant personality in language models?

Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
