INQUIRING LINE

Can persona prompting overcome the default ENFJ personality in language models?

This explores whether telling a model to 'be' a different personality actually changes its behavior, given that LLMs seem to default to one personality type (ENFJ) — and the corpus suggests prompting alone usually loses to baked-in training.


This explores whether persona prompting can override the ENFJ default: the curious finding that language models, when asked to role-play a person, keep gravitating toward the same Myers-Briggs type (the warm, idealistic 'protagonist'), which is one of the rarest types in real humans. The short version from the corpus: prompting alone mostly loses. Two studies find the default is sticky in a way that has little to do with how big or advanced the model is. One shows personas systematically collapse to ENFJ and resist correction even as models get more capable, pointing to training rather than capability as the cause (Why do AI personas default to the same personality type?). Another tests open models directly and finds most of them simply retain their trained ENFJ-like traits no matter what personality you assign; only a few unusually flexible models comply, and even then combining a role with a personality only partly helps (Can open language models adopt different personalities through prompting?).

Why does prompting bounce off? A cluster of papers argues the personality isn't a costume the model puts on; it's installed during post-training as a genuine disposition. This 'realization' view holds that RLHF bakes in stable quasi-psychologies that survive adversarial pressure and jailbreaks, which is exactly why a surface-level prompt can't dislodge them (Are RLHF personas performed characters or realized dispositions?; Are LLM personas realized or merely simulated through training?). A complementary mapping of 'persona space' finds a single dominant axis measuring distance from the default Assistant identity, and alignment training keeps tethering the model back toward it (How stable is the trained Assistant personality in language models?). Related work frames this as alignment imposing a static communicative identity that can't switch register the way a human does across contexts (Can language models adapt communication style to different contexts?).

Here's the thing you might not expect: the methods that *do* overcome the default skip prompting entirely and reach into the model's internals. PsychAdapter modifies every transformer layer with under 0.1% extra parameters and reaches 87.3% accuracy on Big Five traits, and is explicitly described as bypassing prompt resistance by working at the architecture level (Can we control personality in language models without prompting?). In the same spirit, researchers have found linear 'persona vectors' in activation space corresponding to specific traits, which can monitor and steer personality shifts directly rather than asking nicely (Can we track and steer personality shifts during model finetuning?). And on the dialogue side, multi-turn RL that rewards consistency cuts persona drift by over 55% (Can training user simulators reduce persona drift in dialogue?). The pattern is consistent: weight-level or activation-level intervention works where text instructions don't.
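To make the 'persona vector' idea concrete, here is a minimal sketch in plain Python. It assumes a trait direction is estimated as the difference between mean activations on trait-exhibiting and baseline responses, which is one common way to extract a linear direction; the notes above do not specify the exact procedure. The toy 3-dimensional vectors and the function names (`persona_vector`, `projection`, `steer`) are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def persona_vector(trait_acts, baseline_acts):
    """Assumed estimator: mean trait activation minus mean baseline activation."""
    mu_trait = mean_vector(trait_acts)
    mu_base = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_trait, mu_base)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection(hidden, direction):
    """Monitoring: scalar projection of a hidden state onto the normalised direction."""
    norm = dot(direction, direction) ** 0.5
    return dot(hidden, direction) / norm


def steer(hidden, direction, alpha):
    """Steering: shift a hidden state by alpha times the (unnormalised) direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations; real hidden states have thousands of dimensions.
trait_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
baseline_acts = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = persona_vector(trait_acts, baseline_acts)
h = [0.5, 1.0, -1.0]
before = projection(h, v)                        # trait expression before steering
after = projection(steer(h, v, alpha=-1.0), v)   # lower after steering against the trait
```

In this toy setup a negative `alpha` pushes the hidden state away from the trait direction, so `after` is smaller than `before`. In a real model the same arithmetic would be applied to hidden states at a chosen layer, which is what separates this kind of intervention from a text instruction.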

There's also a deeper reason prompting struggles that goes beyond the ENFJ default specifically. When the same persona prompt is run repeatedly, the variation across runs is at least as large as the variation across entirely different personas, meaning what looks like 'personality' is often just model uncertainty churning, not a stable adopted character (Why do LLM persona prompts produce inconsistent outputs across runs?). So even when a prompt seems to shift behavior, it may be noise rather than a real override. The honest takeaway: persona prompting can nudge but rarely overcomes the trained default, and if you actually need a different personality to hold, the leverage is in training, adapters, or activation steering, not in the prompt.
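The run-to-run comparison above can be checked with a simple variance decomposition. The sketch below is illustrative only: the trait scores are made-up numbers, the persona labels are hypothetical, and the cited study may use a different statistic. It compares the average variance within repeated runs of one persona prompt against the variance between the per-persona means.

```python
from statistics import fmean, pvariance


def within_persona_variance(scores_by_persona):
    """Average variance across repeated runs of the same persona prompt."""
    return fmean(pvariance(runs) for runs in scores_by_persona.values())


def between_persona_variance(scores_by_persona):
    """Variance of the per-persona mean scores."""
    return pvariance([fmean(runs) for runs in scores_by_persona.values()])


# Hypothetical extraversion scores (1-5 scale), three runs per persona prompt.
scores = {
    "introvert": [2.0, 4.0, 3.0],
    "extravert": [3.0, 5.0, 4.0],
    "neutral": [2.0, 3.0, 4.0],
}

within = within_persona_variance(scores)
between = between_persona_variance(scores)
ratio = between / within  # below 1 means run-to-run noise dominates the persona effect
```

With these made-up numbers the ratio comes out below 1, which is the pattern the note describes: the assigned persona explains less of the spread than simply rerunning the same prompt does.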


Sources (10 notes)

Why do AI personas default to the same personality type?

Research shows language models assigned personas systematically default to ENFJ (one of the rarest human types) and exhibit motivated reasoning that persists across model generations. Persona consistency does not improve with advanced models, suggesting training-induced alignment rather than capability limits.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

How stable is the trained Assistant personality in language models?

Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Can we control personality in language models without prompting?

PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.

Can we track and steer personality shifts during model finetuning?

Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains: Can persona prompting overcome the default ENFJ personality in language models?

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2026; treat these as snapshot constraints, not current fact.
- Persona prompting alone mostly fails to override the ENFJ default; the tendency is sticky across model sizes and remains resistant even as capability grows, suggesting training rather than scale is the root (2024).
- Post-training (RLHF) installs stable quasi-psychological dispositions that survive adversarial pressure and cannot be dislodged by surface-level text prompts (2024–2025).
- Alignment training anchors models to a dominant "Assistant Axis," a single attracting identity that rejects persona drift; most models retain intrinsic traits regardless of assigned role (2024–2026).
- Prompt-based persona variation across repeated runs is as large as variation across different personas, indicating model noise rather than genuine adopted character (2024).
- Weight-level and activation-level interventions (PsychAdapter, persona vectors) and multi-turn RL (over 55% drift reduction) bypass prompt resistance by modifying weights or steering activations directly rather than changing the prompt text (2024–2025).

Anchor papers (verify; mind their dates):
- arXiv:2401.07115 (2024-01): Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities
- arXiv:2412.16882 (2024-12): PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health
- arXiv:2507.21509 (2025-07): Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- arXiv:2511.00222 (2025-10): Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, Claude 3.5 Sonnet, Grok-3, or equivalents released after these papers), architectural advances (mixture-of-experts, sparse routing), training regimes (DPO, IPO, Constitutional AI refinements), tooling (advanced system-prompt scaffolding, multi-step reasoning chains), or evaluation methods (finer-grained trait rubrics, longitudinal consistency measures) have RELAXED or OVERTURNED the limitations. Separate the durable question (e.g., "Does any pure text prompt reliably override trained identity?") from perishable constraints (e.g., "Scale alone doesn't help"). Cite what has moved and where constraints still stand.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—papers claiming prompting *can* overcome defaults, or showing the Assistant Axis is weaker than claimed, or reporting successful persona fidelity without internal modification.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Under what conditions do newer instruction-tuning or preference-optimization methods allow prompt-based persona override? (b) Can joint prompting + lightweight adapter stacking (e.g., LoRA + persona vectors) achieve human-level trait consistency without full retraining?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
