Does model capability translate to better persona consistency?

As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym testing across multiple models and thousands of interactions explores whether scaling helps with persona adherence.

Synthesis note · 2026-02-22 · sourced from Personas Personality

The PersonaGym evaluation framework tests 6 open- and closed-source LLMs on persona adherence across 200 personas and 10,000 questions. The headline finding: Claude 3.5 Sonnet achieves only a 2.97% relative improvement in PersonaScore over GPT 3.5, despite being a much more advanced model on most other measures.
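To make the "relative improvement" figure concrete, here is a minimal sketch of the arithmetic. The function name and the two scores are hypothetical and chosen only for illustration; they are not PersonaScore values reported by PersonaGym.

```python
def relative_improvement(baseline: float, candidate: float) -> float:
    """Return the percentage change of `candidate` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# Hypothetical scores for illustration only; NOT values from PersonaGym.
baseline_score = 4.0
candidate_score = 4.125

print(relative_improvement(baseline_score, candidate_score))  # 3.125
```

A relative improvement is measured against the baseline's own score, so a figure near 3% means the stronger model's score is only about 1.03 times the weaker model's score on the same scale.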

This suggests that persona consistency is largely independent of general capability and is not improved by standard training. Models get better at reasoning, coding, instruction-following, and knowledge retrieval as they scale, but they do not get meaningfully better at maintaining a consistent persona across varied interactions.

The explanation likely connects to how models are trained. Standard training objectives (next-token prediction, RLHF for helpfulness) optimize for response quality on a per-turn basis. Persona consistency requires cross-turn coherence: remembering what was said earlier, maintaining behavioral patterns, and avoiding contradiction with the established character. These are different optimization targets, and standard training does not address them.
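The gap between per-turn quality and cross-turn coherence can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Turn` structure, the made-up quality scores, and the attribute-matching contradiction check are hypothetical and are not how PersonaGym computes PersonaScore.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    quality: float  # hypothetical per-turn quality score in [0, 1]
    claims: dict[str, str] = field(default_factory=dict)  # persona facts asserted in this turn


def mean_turn_quality(turns: list[Turn]) -> float:
    """Per-turn view: average quality, ignoring how turns relate to each other."""
    if not turns:
        raise ValueError("conversation has no turns")
    return sum(t.quality for t in turns) / len(turns)


def count_contradictions(turns: list[Turn]) -> int:
    """Cross-turn view: count claims that conflict with the first value stated."""
    first_seen: dict[str, str] = {}
    contradictions = 0
    for turn in turns:
        for attribute, value in turn.claims.items():
            if attribute in first_seen and first_seen[attribute] != value:
                contradictions += 1
            first_seen.setdefault(attribute, value)
    return contradictions


# Hypothetical conversation: every turn is rated perfect in isolation,
# yet the persona's hometown changes between turn 1 and turn 3.
conversation = [
    Turn(1.0, {"hometown": "Lisbon"}),
    Turn(1.0, {"occupation": "nurse"}),
    Turn(1.0, {"hometown": "Oslo"}),
]

print(mean_turn_quality(conversation))     # 1.0
print(count_contradictions(conversation))  # 1
```

In this toy setup, a signal built only from `mean_turn_quality` would not register the inconsistency, which is the sense in which per-turn objectives and persona consistency are different targets.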

Taken together with the related note "Can open language models adopt different personalities through prompting?", the problem compounds: models resist persona change AND their base persona-adherence capability doesn't improve with scale. More capability doesn't mean more flexibility or more consistency.

This finding challenges the assumption that "better models will naturally solve persona problems." Dedicated persona training, whether through the approaches discussed in "Why does supervised learning fail to enforce persona consistency?" or through other methods, appears necessary.

Related concepts in this collection 3

Can open language models adopt different personalities through prompting?
Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
Relation to this note: models resist change AND don't improve with scale

Why does supervised learning fail to enforce persona consistency?
Supervised learning trains models to generate good responses but never punishes contradictions. This note explores why explicit negative feedback is structurally necessary for dialogue agents to maintain consistent personas, and what training methods can provide it.
Relation to this note: dedicated training needed since scaling doesn't help

Why do specialized models fail outside their domain?
Deep domain optimization creates sharp performance cliffs at domain boundaries. Specialized models generate plausible-sounding but ungrounded responses when queries fall outside their training scope, and often fail to signal their own ignorance.
Relation to this note: another case where general capability doesn't transfer to specific competency

Original note title

persona adherence does not scale with general model capability — advanced models show minimal improvement over basic models
