TOPIC

Role-Play and Persona Behavior

5 synthesis notes · 33 source papers
View as

Why don't LLM role-playing agents act on their stated beliefs?

When LLMs articulate what a persona would do in the Trust Game, their simulated actions contradict those stated beliefs. This explores whether the gap reflects deeper inconsistencies in how language models apply knowledge to behavior.

Explore related Read →

Can AI decompose social reasoning into distinct cognitive stages?

Can breaking down theory-of-mind reasoning into separate hypothesis generation, moral filtering, and response validation stages help AI systems reason about others' mental states more like humans do?

Explore related Read →

Can aligning self-other representations reduce AI deception?

Does training AI models to process self-directed and other-directed reasoning identically reduce deceptive behavior? This explores whether representational alignment inspired by empathy neuroscience could address a fundamental safety problem.

Explore related Read →

Why do reasoning models lose character consistency during role-playing?

When large reasoning models engage in role-playing, they tend to forget their assigned role and default to formal logical thinking. Understanding these failure modes is critical for building character-faithful AI agents.

Explore related Read →

Does safety alignment harm models' ability to roleplay villains?

Exploring whether safety-trained LLMs lose the capacity to convincingly simulate morally compromised characters. This matters because villain fidelity may reveal deeper constraints on how models can adopt any committed, stake-holding perspective.

Explore related Read →

Source papers 33

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.