Who bears responsibility when AI seems human-like?
Does human-likeness in AI come from how users perceive systems or how designers build them? Understanding this distinction clarifies where accountability lies when AI causes harm.
Explores whether keeping humans actively involved in AI research collaboration accelerates paradigm discovery compared to fully autonomous self-improvement, and what safety advantages this preserves.
Can AI systems be designed to understand users, act transparently, and share mental models with humans? This explores whether current scaling approaches miss cognitive requirements for genuine partnership.
Does a single user reading an explanation create its meaning, or does meaning emerge from the social layers surrounding that reading—colleagues' interpretations, organizational norms, public discourse?
Most factuality work expands what models know rather than what they know they know. Can expressing calibrated uncertainty create a third path between confident errors and unhelpful abstention?
Explores whether perspective-taking ability—the capacity to model another's cognitive state—differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.
Explores whether human-centered concerns like safety and fairness work better as design principles applied from the start of development, or as post-training alignment patches. This matters because pipeline placement determines whether human priorities shape the foundation or fight against it.
If harm and benefit depend on who you ask and how you measure them, can we design LLM systems that satisfy all stakeholders? This explores why broad values like safety and justice resist one-size-fits-all implementation.
LLM-based user simulators drift away from assigned goals during multi-turn conversations, producing unreliable reward signals for agent training. Understanding this goal misalignment problem is critical because it undermines the entire RL training pipeline.
When do human cognitive shortcuts fail in AI interaction? Three compounding traps—treating statistical patterns as facts, mistaking fluency for understanding, and avoiding disagreement—may explain systematic overreliance across languages and contexts.
Do the three classical rhetorical appeals—logical alignment, source credibility, and emotional framing—operate simultaneously in how we explain AI systems to users? And can naming these channels help designers make intentional rhetorical choices?
Gricean models assume good-faith rational agents coordinating meaning. But do AI systems designed to persuade—using credibility, emotion, and non-rational appeals—really operate under these assumptions? What happens when we drop the rationality premise?
Rhetorical strategies used to justify appropriate AI adoption rely on the same persuasion mechanisms as dark patterns. Without observable intent, explanation and manipulation look identical—raising urgent questions about how to audit XAI systems responsibly.
Most XAI work treats explanations as neutral descriptions of model behavior, but they may actually be doing persuasive work to justify AI adoption. What happens when we acknowledge this rhetorical function?
Does explanation effectiveness depend on who delivers it, how it's framed, and who uses it? This challenges the dominant technical view that treats explanations as context-independent outputs.
The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.