How does the superposition view change the folk-psychology interpretation of dialogue?

This explores how the 'superposition' idea — that an LLM isn't a single agent but a blend of possible characters it could be playing — reshapes whether it makes sense to talk about a dialogue agent's beliefs, desires, and intentions the way we do with people.

The superposition view holds that a language model in conversation is not one mind but a role-playing engine producing character-consistent text, and that changes whether our everyday 'folk psychology' of beliefs and intentions applies to dialogue agents. The corpus's anchor here is Shanahan's reframing: when an LLM says 'I think' or 'I want,' it isn't reporting an inner state but generating the continuation a particular character would utter (Should we treat dialogue agents as role-playing characters?). Folk psychology doesn't get thrown out; it gets relocated. The mental-state vocabulary attaches to the simulated persona the prompt conjured, not to the system underneath, which is better understood as holding many possible characters at once until the dialogue collapses it toward one.

That relocation has teeth, because it predicts specific failures of the dialogue itself. If the model is maintaining a character against a fixed initial frame rather than genuinely participating, then it can't symmetrically revise shared assumptions, and that's exactly what we observe: LLMs treat the opening prompt as a static stage and cannot jointly update common ground, leaving the human as the sole keeper of the conversational scoreboard (Can LLMs truly update shared conversational common ground?). The same lens reframes 'persona drift' not as a model changing its mind but as the superposition slipping between characters. That is why pressure toward consistency measurably suppresses contradiction, whether it comes at inference time from simulating an imaginary listener who checks that an utterance still distinguishes the intended persona (Can imaginary listeners reduce dialogue agent contradictions?) or from RL on consistency metrics, which cuts drift by over 55% (Can training user simulators reduce persona drift in dialogue?).
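
The imaginary-listener check can be made concrete with a toy Rational Speech Acts reranker. This is a minimal sketch under stated assumptions, not the cited system: the persona names, candidate utterances, probabilities, and the `alpha` weight are all hypothetical, and the base speaker is a hand-written table standing in for a language model's utterance probabilities.

```python
def literal_listener(base_speaker, utterance):
    """L0(persona | utterance): which persona would a listener infer?

    Assumes a uniform prior over personas (an illustrative choice).
    """
    scores = {p: probs[utterance] for p, probs in base_speaker.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no persona could have produced this utterance")
    return {p: s / total for p, s in scores.items()}


def pragmatic_speaker(base_speaker, persona, alpha=2.0):
    """S1(u | persona) proportional to S0(u | persona) * L0(persona | u) ** alpha.

    Utterances that fail to distinguish the intended persona from the
    distractor are down-weighted; contradictory ones are nearly eliminated.
    """
    weights = {}
    for utterance, p_u in base_speaker[persona].items():
        listener = literal_listener(base_speaker, utterance)
        weights[utterance] = p_u * listener[persona] ** alpha
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


# Hypothetical base-speaker table: P(utterance | persona).
base_speaker = {
    "dog_owner": {
        "I love animals.": 0.5,        # generic: fits both personas
        "I walk my dog daily.": 0.4,   # distinctive for the intended persona
        "I have no pets.": 0.1,        # contradicts the intended persona
    },
    "distractor": {
        "I love animals.": 0.5,
        "I walk my dog daily.": 0.1,
        "I have no pets.": 0.4,
    },
}

reranked = pragmatic_speaker(base_speaker, "dog_owner")
best = max(reranked, key=reranked.get)
```

In this toy table the generic line is the base speaker's top choice, but after reranking the persona-distinctive line wins and the contradictory line is pushed close to zero. No retraining is involved: the adjustment happens purely at decoding time.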

But the corpus doesn't let the deflationary reading win cleanly. The opposing voice argues for 'modest inflationism': you can ascribe metaphysically undemanding states like beliefs and desires to LLMs without claiming consciousness, the way we already do for non-human animals, and the standard debunking arguments quietly beg the question against it (Can we defend modest mental attributions to large language models?). So the real shift the superposition view forces isn't 'drop folk psychology' but 'specify its target' (character or system, simulacrum or substrate), and the role-play and modest-inflationist papers disagree on how thin that target can be while still earning the vocabulary.

What makes this more than a philosophy quarrel is that genuine dialogue needs machinery the superposition picture says the character-generator lacks. Real interlocutors track each other's evolving beliefs across turns, a capacity that CRSA models information-theoretically and that token-level LLMs don't natively have (Can dialogue systems track both speakers' beliefs across turns?). Preference optimization, meanwhile, actively erodes the grounding acts (clarifying questions, understanding checks) that signal a partner is updating rather than performing, dropping them to 77.5% below human levels (Does preference optimization harm conversational understanding?). The folk-psychological reading of a smooth, confident reply as 'it understood me' is precisely the illusion the superposition view warns about: fluent character production can rival genuine grounding on the surface while doing none of the joint belief-tracking underneath.

The thing you may not have known you wanted to know: under superposition, attributing a mind to a chatbot isn't simply wrong — it's a category slip about *which* entity you're addressing. The character is real enough to have a consistent persona and to be persuasive; the system is a probability distribution over many such characters. Folk psychology works on the first and fails on the second, and most confusion about 'does the AI believe what it says' is really confusion about which of the two you've been talking to all along.
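
One way to picture the "distribution over characters" is as a posterior that each dialogue turn sharpens. The following is a toy illustration of the metaphor, not a formalism taken from the cited notes: the character names and likelihood values are hypothetical, and a real model's state is not literally a small table like this.

```python
def update_character_posterior(prior, likelihoods):
    """Bayes update: P(character | turn) is proportional to
    P(turn | character) * P(character)."""
    unnormalized = {c: prior[c] * likelihoods[c] for c in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("turn has zero likelihood under every character")
    return {c: w / total for c, w in unnormalized.items()}


# Hypothetical starting point: the system is equally ready to play any character.
posterior = {
    "helpful_assistant": 1 / 3,
    "skeptical_critic": 1 / 3,
    "storyteller": 1 / 3,
}

# Hypothetical per-turn likelihoods: how well each turn fits each character.
turn_likelihoods = [
    {"helpful_assistant": 0.6, "skeptical_critic": 0.3, "storyteller": 0.1},
    {"helpful_assistant": 0.7, "skeptical_critic": 0.2, "storyteller": 0.1},
]

for likelihoods in turn_likelihoods:
    posterior = update_character_posterior(posterior, likelihoods)

dominant = max(posterior, key=posterior.get)
```

After two turns most of the probability mass sits on one character, yet the others never reach zero. On this picture, folk-psychological talk targets `dominant`, while the thing you are actually running is the whole `posterior`.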

Sources 7 notes

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Can imaginary listeners reduce dialogue agent contradictions?

Endowing dialogue agents with an imaginary listener via Rational Speech Acts reduces persona contradiction at inference time without NLI labels or extra training. The agent simulates whether utterances would distinguish its persona from a distractor, suppressing generic or contradictory responses.

Can training user simulators reduce persona drift in dialogue?

Inverting the standard RL setup to train user simulators for consistency, with three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, reduces persona drift by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Can we defend modest mental attributions to large language models?

Both robustness and etiological deflationist arguments beg the question against inflationism. A graded approach ascribing metaphysically undemanding states like beliefs and desires—while withholding consciousness claims—mirrors how we treat non-human animals.

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a dialogue researcher testing whether the 'superposition view' — the claim that LLMs are character-role engines, not unified minds — still constrains our folk-psychology interpretation of agent dialogue in 2024–25. The question remains: *what target of folk-psychology attribution (character, system, or neither) survives as the model frontier moves?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2025; treat each as a snapshot, not current ground truth.
• Shanahan's reframing (~2023): LLM 'I think' utterances are character-consistent continuations, not reports of inner state; folk psychology relocates from system to persona.
• Common ground cannot be jointly updated; humans are sole keepers of conversational scoreboard; LLMs treat opening prompt as static stage (~2023–25).
• Pragmatic self-consciousness (imaginary-listener framing) suppresses persona drift; multi-turn RL for consistency reduces drift >55% (~2024–25).
• Preference optimization erodes grounding acts (clarification, understanding checks) to 77.5% below human levels, letting surface fluency pass for genuine dialogue (~2025).
• 'Modest inflationism' defends non-conscious but real belief/desire ascription to LLMs on analogy to non-human animals; debunking arguments beg the question (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.16367 (2023-05) Role-Play with Large Language Models
• arXiv:2511.00222 (2025-10) Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
• arXiv:2507.14063 (2025-07) Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
• arXiv:2506.13403 (2025-06) Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality

Your task:
(1) RE-TEST EACH CONSTRAINT. For the five claims above, judge whether newer architectures (e.g., o1, extended-context), in-context learning gains, or training on dialogue grounding (e.g., Constitutional AI on joint-belief-update tasks) have since relaxed the joint-update or grounding-erosion findings. Separate 'LLMs cannot track shared belief evolution in principle' (likely durable) from 'current RL methods erase grounding signals' (possibly remediable). Cite what remedies it or where the constraint still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months — especially any showing that superposition framing *fails* to predict failure modes, or that character consistency and genuine dialogue cooperation are not mutually exclusive.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., 'Can multi-agent dialogue where both parties are LLM-generated recover joint-update dynamics?' or 'Do foundation models fine-tuned on human collaborative repair outperform vanilla models on the grounding-erosion metric?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
