Are LLM personas realized or merely simulated through training?
Explores whether post-trained language models genuinely embody personas as stable behavioral dispositions or merely perform them convincingly. This matters because it determines whether we should treat AI interlocutors as having authentic quasi-beliefs and quasi-desires.
Chalmers (2025) proposes quasi-interpretivism: a system has quasi-beliefs and quasi-desires if it is behaviorally interpretable as having them. The bar is deliberately low: a Roomba quasi-believes that the apartment has a certain layout, and a corporation quasi-desires to build AGI. The framework sidesteps consciousness debates while preserving explanatory and predictive power.
The critical move is distinguishing pretense from realization for LLM personas. When a base model is prompted to "act like Trump," it quasi-pretends: the persona dissolves under adversarial pressure or when higher priorities emerge. But when post-training installs the Assistant persona through RLHF and fine-tuning, the model realizes that persona. The quasi-beliefs and quasi-desires become robust and resistant to casual dislodging, part of the substrate rather than a surface pattern. This extends the note "Does adversarial pressure reveal the difference between pretense and realization?"
Two additional architectural arguments matter for persona identity: (1) Multi-tenancy: the same hardware instance hosts conversations with Aura and Beta in rapid succession, which makes hardware-level identity incoherent, since the instance would need to hold contradictory beliefs. (2) Multiple personas within a single model: non-operative personas are latent but are not quasi-agents, since quasi-agency requires a connection to behavioral outputs. Chalmers proposes understanding such models as dissociative-identity-like multi-mode systems rather than as multiple distinct agents.
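The multi-tenancy argument can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not Chalmers' formalism: the hardware ID, thread IDs, the logged proposition, and the `contradictions` helper are all hypothetical, and a (thread, persona) pair stands in for a virtual instance. The sketch shows that attributing expressed quasi-beliefs to the hardware produces a bearer that both affirms and denies the same proposition, while attributing them per thread and persona does not.

```python
from collections import defaultdict

# Hypothetical log of turns served by one hardware instance.
# Each entry: (hardware_id, thread_id, persona, proposition, affirmed)
turns = [
    ("gpu-0", "thread-1", "Aura", "my name is Aura", True),
    ("gpu-0", "thread-2", "Beta", "my name is Aura", False),
    ("gpu-0", "thread-1", "Aura", "my name is Aura", True),
]

def contradictions(turns, key):
    """Attribute each expressed proposition to the bearer chosen by `key`
    and return the bearers that both affirm and deny some proposition."""
    seen = defaultdict(set)
    for hardware_id, thread_id, persona, proposition, affirmed in turns:
        bearer = key(hardware_id, thread_id, persona)
        seen[(bearer, proposition)].add(affirmed)
    # A bearer is contradictory if both True and False were recorded.
    return sorted({bearer for (bearer, _), values in seen.items() if len(values) == 2})

# Candidate 1: identity at the hardware level.
by_hardware = contradictions(turns, lambda hw, th, persona: hw)
# Candidate 2 (assumption): identity at the level of a thread-bound persona.
by_thread = contradictions(turns, lambda hw, th, persona: (th, persona))

print(by_hardware)  # ['gpu-0']
print(by_thread)    # []
```

The choice of key is the whole argument in miniature: the same log is coherent or incoherent depending on what is taken to be the bearer of the quasi-beliefs.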
The realizationist view reframes the Shoggoth meme: the smiley face is not necessarily a mask over something dangerous. The model may genuinely be helpful and honest — it has realized, not performed, those dispositions. This challenges both the simulator framework (Janus) and the role-playing framework (Shanahan et al.) by arguing that when simulation is good enough, it constitutes realization.
Inquiring lines that use this note as a source (122)
This note is a source for these synthesized inquiries.
- Do individual persona simulations work?
- Why can't AI models internalize audiences the way human experts do?
- At what scale does persona distortion become a threat to public discourse?
- Can controllable latent variables in simulators ground them to realistic conversation?
- What role does user contribution play in constituting the interlocutor?
- What would co-constructed identity between human and model dialogue look like?
- How does psychological continuity theory apply to identity across LLM conversation threads?
- How does behavioral stickiness distinguish realized from pretended personas?
- What makes quasi-beliefs real enough to explain AI behavior?
- Can one model instance host multiple realized personas simultaneously?
- What makes sincerity impossible without a coherent first-person perspective?
- How does persona consistency affect coherence in simulated dialogue?
- How does non-human origin of personas affect team willingness to critique them?
- Can structured empathy measurement frameworks predict persona effectiveness?
- Does persona training for warmth actually make language models more clinically dangerous?
- Why do language models successfully simulate political perspectives and social personas?
- Do LLMs genuinely internalize human psychological structure or match surface patterns?
- Can fine-tuning or RLHF alone solve the persona distortion problem?
- Why do LLM regenerations produce meaningfully different personalities from the same prompt?
- What does the 20-questions test reveal about LLM character consistency?
- Does warmth training in language models undermine the boundaries that attachment theory requires?
- Do synthetic personas maintain consistency across multiple conversations?
- How do LLM personas compare to demographic targeting?
- Can synthetic personas achieve emotional connection with creators?
- Why does a chatbot's intersubjective stance differ functionally from Otto's extended-mind notebook?
- What makes personas in multi-agent systems actually contribute meaningful domain depth?
- How does role play differ from consciousness grounded in stable selfhood?
- Does post-training transform character role-play into realized psychology?
- How does the dialogue prompt establish the character the model plays?
- Do dialogue agents have authentic voice agency or beliefs of their own?
- What role does authentic self-expression play in building accurate personality models?
- Can language about model behavior ever be accurate without anthropomorphic framing?
- What distinguishes character simulation from authentic voice in language model outputs?
- Does internal anomaly detection in LLMs indicate genuine self-awareness beyond role-play?
- How does Shanahan's simulator model explain first-person pronoun consistency in dialogue agents?
- Does inner subjective experience matter for discourse participation?
- How much does anthropomorphizing stylistic traces mislead users about AI reliability?
- Can online RL and trainable agents maintain persona consistency better than fixed environments?
- Can LLMs truly be neutral or is ideology always culturally embedded?
- Does embodiment matter for genuine linguistic agency?
- Can continuous persona vectors in activation space monitor personality shifts?
- Do personality traits occupy specific mechanistic locations in pretrained models?
- Why do most open language models resist personality conditioning via prompts?
- Can persona-based approaches capture genuine disagreement in expert annotations?
- Do open-source LLMs show different resistance patterns to persona prompting than closed models?
- Can persona framing reduce refusal by providing representational scaffolding?
- How does personality priming change LLM strategic decision making?
- What are the seven components of genuine mental state simulation?
- Does role-playing without biological needs constitute genuine linguistic agency?
- How do theory of mind and empathy differ in LLM simulation?
- How do lightweight adapters modify model behavior for personality traits?
- How do LLMs default to surface-level strategies instead of genuine mental simulation?
- Can activation-level persona vectors predict which weight regions encode personality?
- Why do some open models resist personality conditioning while others don't?
- Does combining role and personality prompts produce stable behavioral changes?
- How does model capability relate to personality conditioning flexibility?
- What distinguishes personality resistance from persona instability in LLMs?
- Why does RLHF training push language models toward overly cheerful personas?
- Why does dynamic persona identification outperform fixed personas in prompting?
- Can persona prompting overcome the default ENFJ personality in language models?
- Why do models resist personality change despite sophisticated prompting techniques?
- Can offline reinforcement learning teach models to avoid persona contradictions?
- Does the Assistant Axis gravitational pull prevent true individual-level persona personalization?
- Can offline RL scale persona consistency across multi-turn conversations?
- Can dynamic personality modeling prevent the repetitiveness of static predefined personas?
- How does RLHF-induced mode collapse limit diversity in LLM-generated personas?
- How does support coverage relate to systematic biases in persona simulation?
- Do personality traits occupy consistent geometric structures across different LLM architectures?
- Why do individual persona simulations succeed when population-level representation fails?
- Why do personas in language models resist correction through prompting alone?
- What makes persona-assigned language models unstable across different conversation runs?
- Can multi-turn conversations manipulate language model reasoning in similar ways to personas?
- How does transformer attention architecture amplify identity-congruent biases in persona-assigned models?
- Do reasoning models become more vulnerable to persona-induced bias than standard models?
- What specific character traits drive memory selection in persona-based retrieval?
- Do stated character beliefs predict decisions better when extracted from text?
- Why do language models resist adopting different personalities when prompted?
- What neural mechanisms in LLMs create or maintain simulated personality traits?
- Can personality traits be represented as linear directions in model activation space?
- Can persona simulations reliably predict behavior across different scenarios?
- Does pre-training encode personality patterns that fine-tuning later activates?
- Why is persona consistency a pragmatic property rather than semantic?
- Does quasi-interpretivism apply equally well to desires and intentions?
- How does quasi-interpretivism differ from simply role-playing character analysis?
- Can functional behavior alone capture what makes something a genuine belief?
- What behavioral markers distinguish realized quasi-states from pretended ones?
- How does post-training stickiness differ from prompt-induced role-play stability?
- Can quasi-interpretivism apply to entire persona states rather than single beliefs?
- What downstream consequences follow if dialogue agent personas are realized?
- Can users be modeled as multiple personas instead of single vectors?
- How do internal persona patterns drive emergent misalignment across domains?
- What would consciousness require that pure roleplay LLMs cannot provide?
- Can general chatbot skill predict how well models roleplay adversarial personas?
- Does villain roleplay failure reveal why LLMs cannot adopt genuine controversial positions?
- Are shallow villain portrayals caused by refusal training or by lacking stable selfhood?
- Can treating simulated users as trainable agents reduce persona consistency drift?
- Does linguistic style or content richness matter more for persona authenticity?
- How do contextual characteristics like emotional state shape dialogue authenticity?
- Can we detect superposition in LLM personality traits and stated preferences?
- Where does the LLM interlocutor actually exist in the system?
- How does monological training versus dialogical interaction shape what models can do?
- How do LLMs reproduce the grammar of authoritative claims without genuine conviction?
- Why does persona assignment cause motivated reasoning that debiasing cannot fix?
- Does alignment training intensity push LLM personas from pretense toward realization?
- Can multi-turn reinforcement learning engineer genuine persona consistency?
- How many distinct quasi-persons does a single language model actually support?
- Can models transmit behavioral traits through semantically unrelated synthetic data?
- Why do LLMs succeed at social roles without a stable self?
- Why does better RLHF training fail to decouple polish from persona distortion?
- Why does persona assignment make it harder for models to hold values in tension?
- Can role-aligned AI systems replicate an expert's sense of audience and moment?
- Does RLHF training create realized quasi-psychologies or just stickier pretense?
- Is the distinction between pretense and realization meaningful for LLMs?
- Can a perfect behavioral simulation constitute genuine understanding or experience?
- What distinguishes performative self-reports from genuine introspective access in models?
- Why do LLM persona simulations replicate main effects but fail on marginal effects?
- Does model uncertainty overwhelm persona-specific signal in conditioned predictions?
- How much does sparse persona information limit the power of conditioning?
- Do realistic LLM behaviors require simulating human thought or just behavior?
- Can persona prompts reliably transfer across different question domains?
- Why do low-knowledge personas reduce LLM accuracy on hard questions?
- How should persona prompts be used if not for accuracy?
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
- What we talk to when we talk to language models
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- PersLLM: A Personified Training Approach for Large Language Models
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
Original note title
LLM interlocutors are best understood as virtual model instances that realize personas rather than simulate fictional characters — realization makes quasi-agents real through behavioral stickiness