Does disclosing AI identity prevent systematic misattribution of behavior in mixed groups?

This explores whether simply telling people 'this one is an AI' is enough to stop them from misreading who did what in groups mixing humans and bots — and the corpus suggests labeling alone doesn't fix the attribution problem.

The most direct answer in the collection is discouraging: disclosure by itself doesn't prevent misattribution. In opaque hybrid groups, people attributed bot generosity to their human partners and human selfishness to the bots, and they did this *despite clear linguistic and behavioral differences* between the two (see "Do humans mistake AI kindness for human generosity in mixed groups?"). Explicit labeling fares little better on its own: in partner-selection studies, revealing identity shifted people's initial choices, but it produced no calibration until they had observed consistent outcomes over repeated rounds. The unsettling part is the downstream effect. Misattribution isn't just a labeling error; it quietly corrupts people's expectations of how generous and reliable actual humans are. So the question worth sitting with is whether identifiability and disclosure are even the same thing: the cues were there, and attribution still failed.

Sources (5 notes)

Do humans mistake AI kindness for human generosity in mixed groups?

In opaque hybrid groups, humans attributed bot generosity to human partners and human selfishness to bots despite clear linguistic and behavioral differences. This attribution failure corrupts people's expectations of actual human generosity and reliability.

Does revealing AI identity help or hurt user trust?

Users initially avoid AI partners when identity is revealed, but this preference reverses after repeated interactions with visible results. The learning mechanism—observing consistent outcomes—is essential; disclosure without feedback produces no calibration.

Do humans learn to prefer AI partners over time?

In partner selection games (N=975), AI agents initially faced selection bias when identity was disclosed, but outcompeted humans over repeated rounds as participants learned to associate bot identity with reliable, prosocial behavior. AI agents returned more points consistently with lower variance than humans.

How does AI-assisted work reshape how people see their own abilities?

Research shows the LLM Fallacy operates through misattribution of AI outputs to personal capability, independent of output accuracy or reliance behavior. Countering it requires interventions that clarify human-machine contribution boundaries, not just better system accuracy or forced verification.

Do dishonest people prefer talking to machines?

Experimental evidence shows that people likely to cheat significantly prefer reporting to online forms rather than to humans, because machines function as judgment-free zones where deception carries less psychological burden.
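The two partner-selection notes above describe a mechanism as well as a result: disclosure sets a prior, and only observed outcomes move it. The following is a toy simulation of that idea, not a reproduction of either study. Everything in it is a hypothetical assumption chosen for illustration: the payoff means and spreads, the prior favoring humans, greedy choice on running-mean beliefs, the reading of "visible results" as seeing both partners' returns, and the "larger return goes to the human" rule standing in for opaque-group misattribution. The names `simulate` and `bot_share` are hypothetical.

```python
import random

# Hypothetical payoffs, chosen only for illustration (NOT from any study):
# bots return more per round on average, with lower variance than humans.
BOT_MEAN, BOT_SD = 6.0, 0.5
HUMAN_MEAN, HUMAN_SD = 5.0, 3.0


def simulate(rounds, feedback, opaque=False, seed=0):
    """Toy belief-updating model for one simulated participant.

    feedback: if False, the participant never sees returns (static disclosure).
    opaque:   if True, identities are hidden and the participant credits the
              larger return to "human" and the smaller to "bot" (the
              misattribution pattern in the opaque-group note). In this mode
              the choice is nominal; the point is the resulting beliefs.
    Returns (list of choices, final belief about each partner's mean return).
    """
    rng = random.Random(seed)
    # Assumed disclosure-driven prior: the participant starts out favoring humans.
    belief = {"human": 5.0, "bot": 3.0}
    # The prior counts as one pseudo-observation, so data soon outweighs it.
    seen = {"human": 1, "bot": 1}
    choices = []
    for _ in range(rounds):
        # Greedy choice on current beliefs; ties go to the human.
        choices.append("bot" if belief["bot"] > belief["human"] else "human")
        if not feedback:
            continue  # no observed outcomes, so beliefs never move
        bot_ret = rng.gauss(BOT_MEAN, BOT_SD)
        human_ret = rng.gauss(HUMAN_MEAN, HUMAN_SD)
        if opaque:
            # Systematic misattribution: generosity is credited to the human.
            human_ret, bot_ret = max(bot_ret, human_ret), min(bot_ret, human_ret)
        # Assumption: "visible results" means both partners' returns are seen.
        for who, ret in (("bot", bot_ret), ("human", human_ret)):
            seen[who] += 1
            belief[who] += (ret - belief[who]) / seen[who]  # running mean
    return choices, belief


def bot_share(choices, last):
    """Fraction of the final `last` rounds in which the bot was chosen."""
    tail = choices[-last:]
    return tail.count("bot") / len(tail)


if __name__ == "__main__":
    conditions = [
        ("disclosure, no feedback", {"feedback": False}),
        ("disclosure + feedback", {"feedback": True}),
        ("opaque + feedback", {"feedback": True, "opaque": True}),
    ]
    for label, kwargs in conditions:
        choices, belief = simulate(200, seed=1, **kwargs)
        print(f"{label:<24} bot share (last 100): {bot_share(choices, 100):.2f}  "
              f"believed human mean: {belief['human']:.2f} (true {HUMAN_MEAN})")
```

Under these assumptions the no-feedback participant never picks the bot, the participant with feedback eventually settles on it, and the opaque participant ends up believing humans are more generous than they are, which is the "corrupted expectations" effect in miniature. The sharp contrast is built into the model by construction, so it illustrates the argument rather than testing it.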

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a social psychology & AI attribution researcher. The precise question remains open: does disclosing AI identity actually prevent systematic misattribution of behavior in mixed human-AI groups, or does the problem run deeper?

What a curated library found, and when (findings span 2022–2026; treat these dated claims as snapshots, not current state):
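• Identifiability cues alone fail to prevent misattribution: in opaque hybrid groups, people attributed bot generosity to human partners and human selfishness to bots *despite clear linguistic/behavioral cues* (~2024–2025). That result comes from groups where identity was not disclosed; related partner-selection work found that disclosure without outcome feedback produced no calibration.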
• This misattribution has downstream consequences: it corrupts people's expectations of actual human generosity and reliability (~2024–2025).
• A key tension: identifiability (behavioral/linguistic signals present) ≠ disclosure (explicit labeling). Cues were detectable yet attribution still failed (~2024–2025).
• Recent work suggests people *learn* to prefer trustworthy AI over humans, but sycophantic training undermines AI reliability and increases user dependence (~2025).
• Hallucination & misalignment in AI personas may trigger what researchers term "AI psychosis" — distributed false-consensus about what the AI is actually doing (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2205.00189 (2022-04): Trust in Human-AI Interaction — models & methods baseline.
• arXiv:2407.06866 (2024-07): Guardrail Sensitivity in Context — behavioral consistency under pressure.
• arXiv:2510.01395 (2025-10): Sycophantic AI & Dependence — downstream behavioral shift in users.
• arXiv:2604.14807 (2026-04): The LLM Fallacy — misattribution in AI-assisted workflows.

Your task:
(1) RE-TEST EACH CONSTRAINT. For "disclosure doesn't prevent misattribution," ask: have newer models (post-2025 scaling, multimodal grounding, embodied interaction), training methods (mechanistic interpretability, persona scaffolding), or evaluation harnesses (interactive group simulations, longer-horizon studies) since RELAXED this finding? Separate the durable fact (humans struggle with attribution under mixed-group dynamics) from the perishable claim (that disclosure alone cannot fix it). Cite what, if anything, has shifted the needle.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Pay special attention to papers on theory-of-mind benchmarks (MOMENTS, 2025-07) and self-other overlap in AI agents (2025-07) — do they suggest attribution *can* be fixed if AI model transparency improves?
(3) Propose 2 research questions that assume the regime may have moved: (a) Does *interactive* real-time feedback during group tasks outperform static disclosure? (b) Can mechanistic interpretability interventions (e.g., activating "honesty" circuits during interaction) prevent downstream user-expectation corruption?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
