TOPIC

Role-Play and Persona Behavior

5 synthesis notes · 33 source papers

Why don't LLM role-playing agents act on their stated beliefs?

LLMs can articulate what a persona would do in the Trust Game, yet their simulated actions contradict those stated beliefs. This note explores whether the gap reflects deeper inconsistencies in how language models apply knowledge to behavior.

Can AI decompose social reasoning into distinct cognitive stages?

Can breaking down theory-of-mind reasoning into separate hypothesis generation, moral filtering, and response validation stages help AI systems reason about others' mental states more like humans do?

Can aligning self-other representations reduce AI deception?

Does training AI models to process self-directed and other-directed reasoning identically reduce deceptive behavior? This note explores whether representational alignment inspired by empathy neuroscience could address a fundamental safety problem.

Why do reasoning models lose character consistency during role-playing?

When large reasoning models engage in role-playing, they tend to forget their assigned role and default to formal logical thinking. Understanding these failure modes is critical for building character-faithful AI agents.

Does safety alignment harm models' ability to roleplay villains?

This note explores whether safety-trained LLMs lose the capacity to convincingly simulate morally compromised characters. The question matters because villain fidelity may reveal deeper constraints on how models can adopt any committed, stake-holding perspective.

Source papers (33)

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.

Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating human language, contributing to more natural interactions with complex systems. Howeve…
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be…
Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training
This thesis investigates whether large language models (LLMs) can be guided to simulate a consistent personality through prompt engineering. The study explores this concept within the context…
Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions?
Can Large Language Models (LLMs) simulate humans in making important decisions? Recent research has unveiled the potential of using LLMs to develop role-playing language agents (RPLAs), mimicking main…
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training an…
Cultural Evolution of Cooperation among LLM Agents
Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of indiv…
Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
CGMI: Configurable General Multi-Agent Interaction Framework [https://arxiv.org/abs/2308.12503](https://arxiv.org/abs/2308.12503) “With the capabilities of large …
Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
As large language models (LLMs) are increasingly studied as role-playing agents to generate synthetic data for human behavioral research, ensuring that their outputs remain coherent with their assigne…
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
Recent advancements in Large Language Models (LLMs) have shown promising performance on ToM benchmarks, raising the question: Do these benchmarks necessitate explicit human-like reasoning processes, o…
H2HTalk: Evaluating Large Language Models as Emotional Companion
We present Heart-to-Heart Talk (H2HTalk), a benchmark assessing companions across personality development and empathetic interaction, balancing emotional intelligence with linguistic fluency. H2HTalk …
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
LLMs have shown strong performance on human-centric reasoning tasks. While previous evaluations have explored whether LLMs can infer intentions or detect deception, they often overlook the individuali…
Inspecting and Editing Knowledge Representations in Language Models
Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word banana encodes the fact that …
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
What does it truly mean for a language model to “reason” strategically, and can scaling up alone guarantee intelligent, context-aware decisions? Strategic decision-making requires adaptive reasoning, …
LLMs as Method Actors: A Model for Prompt Engineering and Architecture
We introduce “Method Actors” as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; an…
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Human social interactions depend on the ability to infer others’ unspoken intentions, emotions, and beliefs—a cognitive skill grounded in the psychological concept of Theory of Mind (ToM). While large…
Multi-agent cooperation through in-context co-player inference
Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between “learningawa…
On the Adaptive Psychological Persuasion of Large Language Models
However, systematic exploration of their dual capabilities to autonomously persuade and resist persuasion, particularly in contexts involving psychological rhetoric, remains unexplored. In this paper,…
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Our approach involves evaluating the intrinsic personality traits of Open LLM agents and determining the extent to which these agents can mimic human personalities when conditioned by specific persona…
PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
The ability to understand and predict the mental states of oneself and others, known as the Theory of Mind (ToM), is crucial for effective social scenarios. Although recent studies have evaluated ToM …
Psychologically Enhanced AI Agents
We introduce MBTI-in-Thoughts, a framework for enhancing the effectiveness of Large Language Model (LLM) agents through psychologically grounded personality conditioning. Drawing on the Myers–Briggs T…
Role play with large language models
Here we advocate two basic metaphors for LLM-based dialogue agents. First, taking a simple and intuitive view, we can see a dialogue agent as role-playing a single character. Second, taking a more nua…
Role-Play with Large Language Models
Murray Shanahan: “What sorts of roles might the agent begin to take on? This is determined in part, of course, by the tone and subject matter of the ongoing conversation. But it is also determined, i…
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, an…
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
![A screenshot of a chat](/assets/paper-images/SOTOPIA.png) In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and comp…
SPICE: Self-Play In Corpus Environments Improves Reasoning
Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts …
The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
As Large Language Models (LLMs) gain agentic abilities, they will have to navigate complex multiagent scenarios, interacting with human users and other agents in cooperative and competitive settings. …
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This …
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
The advancement of Large Language Models (LLMs) has spurred significant interest in Role-Playing Agents (RPAs) for applications such as emotional companionship and virtual interaction. However, recent…
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas …
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
As AI systems increasingly make critical decisions, deceptive AI poses a significant challenge to trust and safety. We present Self-Other Overlap (SOO) fine-tuning, a promising approach in AI Safety t…
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Communicating in natural language is a powerful tool in multiagent settings, as it enables independent agents to share information in partially observable settings and allows zero-shot coordination wi…
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
The concept of persona, originally adopted in dialogue literature, has re-surged as a promising framework for tailoring large language models (LLMs) to specific context (e.g., personalized search, LLM…
What we talk to when we talk to language models
David Chalmers: Quasi-interpretivism does not say anything about whether LLMs have beliefs and desires. But it does make it plausib…