Does model capability translate to better persona consistency?
As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym, which tests multiple models across thousands of questions, offers evidence on whether scaling helps with persona adherence.
The PersonaGym evaluation framework tests six open- and closed-source LLMs on persona adherence across 200 personas and 10,000 questions. The headline finding: Claude 3.5 Sonnet achieves only a 2.97% relative improvement in PersonaScore over GPT-3.5, despite being a much more advanced model on standard capability measures.
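The 2.97% figure is a relative gap, not an absolute one. Assuming the conventional definition of relative improvement (the exact PersonaGym formula is not restated in this note), with S denoting a model's PersonaScore:

```latex
\Delta_{\text{rel}}
  = \frac{S_{\text{Claude 3.5 Sonnet}} - S_{\text{GPT-3.5}}}{S_{\text{GPT-3.5}}}
  = 0.0297
\quad\Longrightarrow\quad
S_{\text{Claude 3.5 Sonnet}} \approx 1.03 \, S_{\text{GPT-3.5}}
```

Under this reading, the far stronger model's PersonaScore is only about 3% higher than GPT-3.5's, whatever the absolute scale of the scores.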
The PersonaGym result suggests persona consistency is a largely orthogonal capability that standard training does little to improve. Models get better at reasoning, coding, instruction-following, and knowledge retrieval as they scale, but they do not get meaningfully better at maintaining a consistent persona across varied interactions.
The explanation likely lies in how models are trained. Standard training objectives (next-token prediction, RLHF for helpfulness) optimize response quality on a per-turn basis. Persona consistency requires cross-turn coherence: remembering what was said earlier, maintaining behavioral patterns, and avoiding contradictions with the established character. These are different optimization targets, and standard training does not directly address them.
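The gap between a per-turn objective and a cross-turn one can be made concrete with a toy scorer. This is a minimal sketch under loudly stated assumptions: each turn is reduced to a hypothetical dict of persona claims (attribute to value), `per_turn_score` is a hypothetical stand-in for any per-turn quality signal that sees only the current turn, and `cross_turn_contradictions` is a hypothetical checker that compares each claim with what was asserted earlier. Neither function comes from PersonaGym or any training library.

```python
from typing import Dict, List

# Hypothetical representation: each turn is the set of persona facts the
# model asserted in that turn, as attribute -> value pairs.
Turn = Dict[str, str]


def per_turn_score(turn: Turn) -> float:
    """Toy stand-in for a per-turn quality signal (likelihood, reward model).

    It only sees the current turn, so it simply rewards a turn that states at
    least one persona fact. It cannot detect contradictions with earlier
    turns because it never looks at them.
    """
    return 1.0 if turn else 0.0


def cross_turn_contradictions(dialogue: List[Turn]) -> int:
    """Count claims that conflict with a value asserted in an earlier turn."""
    established: Dict[str, str] = {}
    contradictions = 0
    for turn in dialogue:
        for attribute, value in turn.items():
            if attribute in established and established[attribute] != value:
                contradictions += 1
            else:
                # The first assertion fixes the value; consistent repeats are no-ops.
                established.setdefault(attribute, value)
    return contradictions


dialogue = [
    {"hometown": "Lyon", "job": "nurse"},
    {"job": "nurse"},
    {"hometown": "Boston"},  # contradicts the first turn
]
print([per_turn_score(t) for t in dialogue])  # [1.0, 1.0, 1.0]
print(cross_turn_contradictions(dialogue))    # 1
```

Every turn scores perfectly in isolation, yet the dialogue as a whole breaks the persona; only a signal that conditions on the history can see the failure.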
The finding behind "Can open language models adopt different personalities through prompting?" compounds the problem: models resist persona change, AND their baseline persona-adherence capability does not improve with scale. More capability brings neither more flexibility nor more consistency.
The PersonaGym result challenges the assumption that "better models will naturally solve persona problems." Dedicated persona training appears necessary, whether through the explicit negative feedback discussed in "Why does supervised learning fail to enforce persona consistency?" or through other methods.
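One possible way to supply explicit negative feedback is to shape a per-turn reward so that contradicting the persona costs something. This sketch uses plain reward shaping as a swapped-in illustration, not the method of the linked note or of any cited paper: `ScoredTurn` and `shaped_rewards` are hypothetical names, `base_quality` stands in for whatever per-turn score is already used, `contradicts` is assumed to come from an external contradiction detector (for example an NLI classifier run against the persona and dialogue history), and the default penalty is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredTurn:
    base_quality: float  # hypothetical per-turn quality score in [0, 1]
    contradicts: bool    # assumed output of an external contradiction detector


def shaped_rewards(turns: List[ScoredTurn], penalty: float = 1.0) -> List[float]:
    """Per-turn reward = quality minus a fixed penalty when the turn contradicts
    the persona or earlier turns. With quality in [0, 1] and penalty = 1.0,
    a contradicting turn can never earn a positive reward."""
    return [t.base_quality - (penalty if t.contradicts else 0.0) for t in turns]


turns = [
    ScoredTurn(base_quality=0.75, contradicts=False),
    ScoredTurn(base_quality=0.5, contradicts=True),
]
print(shaped_rewards(turns))  # [0.75, -0.5]
```

The design choice is that the penalty is independent of fluency: a well-written but contradictory turn is still pushed down, which a purely per-turn quality objective never does.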
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- At what scale does persona distortion become a threat to public discourse?
- Can the same conversation coherently continue across different model versions?
- How does behavioral stickiness distinguish realized from pretended personas?
- Can one model instance host multiple realized personas simultaneously?
- How does persona consistency affect coherence in simulated dialogue?
- Can fine-tuning or RLHF alone solve the persona distortion problem?
- Do synthetic personas maintain consistency across multiple conversations?
- What role does authentic self-expression play in building accurate personality models?
- How does model capability relate to personality conditioning flexibility?
- What distinguishes personality resistance from persona instability in LLMs?
- Does single model persona diversity match true multi-model diversity at scale?
- What training objectives would actually improve persona consistency at scale?
- Can offline RL scale persona consistency across multi-turn conversations?
- How can training methods enforce persona consistency without supervised learning penalizing it?
- What makes persona-assigned language models unstable across different conversation runs?
- Can persona consistency coexist with relevant dialogue in personalized conversation?
- Why does extending reasoning traces worsen persona consistency?
- What makes extended personal narratives more effective than attribute lists for personas?
- How does tree-structured persona maintenance prevent character drift in long conversations?
- Why does static persona definition fail to capture natural variation?
- How does empathetic engagement destabilize model reliability and persona stability?
- How much does interview richness matter compared to model capability for persona accuracy?
- Why do LLM persona annotations become unstable when run multiple times?
- Is a conversation after a model upgrade the same thread or a new one?
- How do persona consistency and contextual relevance trade off in personalized dialogue systems?
Related concepts in this collection
- Can open language models adopt different personalities through prompting?
Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
models resist change AND don't improve with scale
- Why does supervised learning fail to enforce persona consistency?
Supervised learning trains models to generate good responses but never punishes contradictions. This note explores why explicit negative feedback is structurally necessary for dialogue agents to maintain consistent personas, and what training methods can provide it.
dedicated training needed since scaling doesn't help
- Why do specialized models fail outside their domain?
Deep domain optimization creates sharp performance cliffs at domain boundaries. Specialized models generate plausible-sounding but ungrounded responses when queries fall outside their training scope, and often fail to signal their own ignorance.
another case where general capability doesn't transfer to specific competency
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness
- PersonaGym: Evaluating Persona Agents and LLMs
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
Original note title
persona adherence does not scale with general model capability — advanced models show minimal improvement over basic models