Do synthetic personas maintain consistency across multiple conversations?

This explores whether AI-generated personas stay the same person from one conversation to the next — not just within a single chat, but across separate sessions — and what makes them drift.

The corpus has a clear answer: not by default. The failure mode even has a name, global drift, where a persona contradicts itself across whole conversations, as distinct from local drift inside a single turn (Can training user simulators reduce persona drift in dialogue?). A big reason is that standard training optimizes for per-turn quality, not cross-turn coherence, which is why persona consistency turns out to be *orthogonal* to raw model capability: Claude 3.5 Sonnet beat GPT-3.5 by under 3% on persona adherence despite a massive capability gap (Does model capability translate to better persona consistency?). A smarter model is not automatically a more consistent character.

The deepest version of the problem is that 'consistency' may be illusory to begin with. When you run the *same* persona prompt repeatedly, the variance across runs matches or exceeds the variance across *different* personas, meaning what looks like a stable character is often just model uncertainty wearing a costume (Why do LLM persona prompts produce inconsistent outputs across runs?). Static, predefined persona lists make this worse, producing repetitive and self-contradictory dialogue; personality expressed through authentic self-expression (like journal entries capturing Big Five traits) holds together better than a 3-5 sentence attribute inventory (Why do static persona descriptions produce repetitive dialogue?).
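The run-to-run comparison above can be made concrete with a small variance check. This is a minimal sketch, not the method used in the cited note: the persona names, the 1-5 ratings, and the function `variance_decomposition` are all made up for illustration, and it assumes each persona's output can be reduced to a single number per run.

```python
from statistics import mean, pvariance


def variance_decomposition(scores_by_persona):
    """Return (within-persona variance, between-persona variance).

    scores_by_persona maps a persona name to the numeric scores obtained
    by re-running that same persona prompt several times (hypothetical data).
    """
    # Within: average spread across repeated runs of the same persona.
    within = mean(pvariance(runs) for runs in scores_by_persona.values())
    # Between: spread of the per-persona averages.
    between = pvariance([mean(runs) for runs in scores_by_persona.values()])
    return within, between


# Made-up ratings on a 1-5 scale, four runs per persona prompt.
toy_scores = {
    "retired teacher": [2, 4, 3, 5],
    "college student": [3, 5, 2, 4],
    "night-shift nurse": [4, 2, 5, 4],
}

within, between = variance_decomposition(toy_scores)
print(f"within-persona variance:  {within:.3f}")
print(f"between-persona variance: {between:.3f}")
```

In this toy data the within-persona figure is far larger than the between-persona one, which is the pattern the note describes: re-running one persona moves the output more than switching personas does.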

The corpus offers several repair strategies, and they pull in interestingly different directions. One is training-time: invert the usual RL setup to train a *user simulator* for consistency, rewarding prompt-to-line, line-to-line, and Q&A coherence, which cuts drift by over 55% (Can training user simulators reduce persona drift in dialogue?). Another is inference-time and needs no extra training at all: give the agent an 'imaginary listener' that checks whether each utterance would actually distinguish its persona from a decoy, suppressing generic or off-character replies (Can imaginary listeners reduce dialogue agent contradictions?). A third treats the persona as a living object that *evolves* between sessions, bridging memory and action and re-optimizing itself against recent interactions at test time (Can personas evolve in real time to match what users actually want?).
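To illustrate the 'imaginary listener' idea, here is a toy sketch in the spirit of Rational Speech Acts reranking. It is an assumption-laden illustration rather than the cited system: the personas, the candidate utterances, the likelihood numbers, the `alpha` weight, and the function names are all hypothetical, and a real agent would take the likelihoods from a language model.

```python
def listener_posterior(likelihoods, target):
    """P(target persona | utterance) under a uniform prior over personas."""
    return likelihoods[target] / sum(likelihoods.values())


def pick_utterance(candidates, target, alpha=2.0):
    """Return the candidate that best identifies `target` to the listener.

    Score = base likelihood under the target persona
            * (listener posterior for the target) ** alpha.
    """
    scores = {
        utterance: likelihoods[target] * listener_posterior(likelihoods, target) ** alpha
        for utterance, likelihoods in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Made-up values of P(utterance | persona); "gamer" plays the decoy persona.
candidates = {
    "I like relaxing on weekends.": {"hiker": 0.50, "gamer": 0.50},
    "I spent Saturday on a mountain trail.": {"hiker": 0.30, "gamer": 0.02},
    "I stayed up late finishing a new game.": {"hiker": 0.05, "gamer": 0.40},
}

# A literal speaker just takes the most likely line for its own persona.
literal_choice = max(candidates, key=lambda u: candidates[u]["hiker"])
pragmatic_choice, scores = pick_utterance(candidates, target="hiker")
print("literal speaker picks:  ", literal_choice)
print("pragmatic speaker picks:", pragmatic_choice)
```

The literal speaker settles on the generic weekend line because it is the most probable, while the listener-aware score prefers the line a listener could only attribute to the hiker. That is the sense in which generic replies get suppressed.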

Here's the catch worth knowing: chasing consistency too hard backfires. High persona-adherence scores often come from the model copying its character description verbatim while ignoring what the user actually asked, so persona fidelity trades off against discourse coherence, and the two have to be optimized *together*, not separately (Do persona consistency metrics actually measure dialogue quality?). The same lesson shows up in synthetic dialogue generation, where realism requires persona, subtopic, and context working as multiplicative layers rather than persona alone (Can synthetic dialogues become realistic through layered diversity?). A perfectly consistent persona that never bends to context is not actually a good conversationalist.

A final, more philosophical wrinkle reframes the whole question. For *prompt-induced* personas, consistency is genuinely fragile: they collapse under jailbreaks. But the 'realizationist' view argues that *post-training* installs personas as substrate-level dispositions that persist under adversarial pressure and across conversations, which is exactly what distinguishes a realized quasi-psychology from sustained role-play (Are RLHF personas performed characters or realized dispositions?; Are LLM personas realized or merely simulated through training?). So the answer splits by mechanism: a persona you describe in a prompt tends not to survive across conversations, while a persona baked in during training is far more stable. The trained 'Assistant' identity sits on the leading axis of persona space: it drifts predictably under emotional or self-reflective conversation, and capping activations along that axis limits the drift (How stable is the trained Assistant personality in language models?).
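The capping mentioned above can be pictured geometrically: measure how far an activation vector extends along one direction and clip that component, leaving everything orthogonal untouched. The sketch below is only an illustration of that idea. The function name `cap_along_axis`, the three-dimensional vectors, the cap value, and the choice to clip only on the positive side are all assumptions, not details from the cited research.

```python
import math


def cap_along_axis(activation, axis, cap):
    """Clamp the component of `activation` along `axis` to at most `cap`.

    Components orthogonal to the axis are left unchanged.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    # Signed length of the activation along the axis direction.
    projection = sum(h * u for h, u in zip(activation, unit))
    # Only the part of the projection above the cap is removed.
    excess = max(0.0, projection - cap)
    return [h - excess * u for h, u in zip(activation, unit)]


# Hypothetical 3-d example: the first coordinate plays the role of the axis.
axis = [1.0, 0.0, 0.0]
drifted = cap_along_axis([3.0, 4.0, 1.0], axis, cap=2.0)
within_cap = cap_along_axis([1.0, 4.0, 1.0], axis, cap=2.0)
print(drifted)
print(within_cap)
```

A vector that has drifted past the cap is pulled back to it along the axis only, while a vector already inside the cap passes through unchanged.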

Sources (11 notes)

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Does model capability translate to better persona consistency?

Claude 3.5 Sonnet achieved only 2.97% improvement over GPT 3.5 on persona consistency despite massive capability gaps, suggesting persona adherence is orthogonal to model scaling. Standard training objectives optimize for per-turn quality, not cross-turn coherence.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Why do static persona descriptions produce repetitive dialogue?

Journal entries capturing Big Five traits through genuine self-expression produce more consistent and nuanced dialogue than predefined 3-5 sentence persona descriptions. Personality emerges from how people express themselves, not from attribute inventories.

Can imaginary listeners reduce dialogue agent contradictions?

Endowing dialogue agents with an imaginary listener via Rational Speech Acts reduces persona contradiction at inference time without NLI labels or extra training. The agent simulates whether utterances would distinguish its persona from a distractor, suppressing generic or contradictory responses.

Can personas evolve in real time to match what users actually want?

PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.

Do persona consistency metrics actually measure dialogue quality?

High persona adherence scores often come from copying character descriptions while ignoring query relevance. MUDI jointly optimizes both by using discourse relations and graph-based coherence modeling alongside persona fidelity, showing that persona and context must be optimized together, not separately.

Can synthetic dialogues become realistic through layered diversity?

Research shows that realistic synthetic dialogues require three multiplicative layers: subtopic specificity, Big Five persona variation, and 11 contextual characteristics via Chain of Thought reasoning. This structured approach captures 90.48% of in-domain dialogue performance.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

How stable is the trained Assistant personality in language models?

Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-examining persona consistency in LLMs. The question: Do synthetic personas maintain stable identity across multiple separate conversations?

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2026; treat these as perishable constraints:
• Personas do NOT persist by default; 'global drift' (cross-conversation contradiction) is distinct from local drift, and affects Claude 3.5 Sonnet nearly as much as GPT-3.5 despite 40+ percentage-point capability gaps (~2024–25).
• Persona consistency is *orthogonal* to model capability: raw intelligence does not entail coherent character (~2024).
• Variance across repeated runs of a *single* persona prompt often equals or exceeds variance *across* different personas, suggesting that apparent consistency may be illusory and may mask model uncertainty (~2024).
• Training-time RL (user-simulator feedback, rewarding prompt-to-line and Q&A coherence) cuts drift by 55%+ (~2025).
• Inference-time fix: 'imaginary listener' pragmatic filter suppresses off-character replies without retraining (~2020–24).
• Persona-adherence scores often trade off against discourse coherence; optimizing them separately backfires (~2024).
• Post-training personas (baked in via RLHF) are more stable than prompt-induced ones; the 'Assistant' axis dominates persona space and drifts predictably under emotional topics (~2026).

Anchor papers (verify; mind their dates):
• arXiv:2310.10735 (2023) — offline RL for persona consistency.
• arXiv:2406.01171 (2024) — survey of role-playing and personalization.
• arXiv:2511.00222 (2025) — multi-turn RL reduces drift by 55%.
• arXiv:2601.10387 (2026) — the Assistant axis as dominant persona dimension.

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 55% drift reduction claim and the orthogonality finding, has newer tooling (memory systems, retrieval-augmented persona profiles, continuous fine-tuning at test time) or orchestration (multi-agent memory consensus) since relaxed these limits? Judge whether post-training realizationism (personas as substrate-level dispositions) has empirical support beyond anecdote. Separate the durable question—*should* personas persist, and *which* mechanism ensures it?—from the perishable claim that standard training cannot sustain cross-conversation identity.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does any recent paper argue personas *should* drift, or that stability correlates with harmful rigidity or reduced adaptability?
(3) Propose 2 research questions that assume the regime may have moved: (a) Can learned persona-lookup (e.g., retrieval-augmented persona memory indexed by conversation history) outperform both prompt-induced and post-trained personas? (b) Do personas in agentic or long-horizon systems (where the agent maintains its own world model) exhibit fundamentally different consistency properties than in turn-based dialogue?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
