Can XAI evaluation include the social layers it currently abstracts away?
This explores whether explainable-AI (XAI) evaluation — which usually scores an explanation as if its quality lived inside the text itself — can be redesigned to measure the social context it currently ignores: who's explaining, to whom, and in what relationship.
The corpus's sharpest answer is that the abstraction is the bug: explanation quality isn't intrinsic to the explanation but emerges from a source–framing–recipient triad (who presents it, how it's framed, and what role the recipient plays) What if XAI is fundamentally a communication problem?. By that logic, an evaluation that strips away the social layer isn't measuring a clean subset of effectiveness; it's measuring the wrong thing and calling it rigor.
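To make the triad concrete, here is a minimal Python sketch. It assumes ratings are collected per recipient together with who presented the explanation and how it was framed. The `Situation` and `Observation` records, the 0 to 1 rating scale, and the role labels are all hypothetical. What it illustrates: averaging by explanation alone blends distinct situations into one number that describes none of them, while keying on the full triad keeps them apart.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Situation:
    """Hypothetical record of the rhetorical situation around one explanation."""
    source: str     # who presents the explanation
    framing: str    # how it is framed
    recipient: str  # role of the person receiving it


@dataclass(frozen=True)
class Observation:
    explanation_id: str
    situation: Situation
    rating: float   # recipient-reported usefulness on an assumed 0..1 scale


def intrinsic_scores(observations):
    """Conventional view: one average per explanation, situation ignored."""
    groups = defaultdict(list)
    for o in observations:
        groups[o.explanation_id].append(o.rating)
    return {key: mean(vals) for key, vals in groups.items()}


def situated_scores(observations):
    """Triad view: one average per (explanation, source/framing/recipient)."""
    groups = defaultdict(list)
    for o in observations:
        groups[(o.explanation_id, o.situation)].append(o.rating)
    return {key: mean(vals) for key, vals in groups.items()}


# Toy data (hypothetical): the same explanation text in two situations.
obs = [
    Observation("e1", Situation("clinician", "hedged", "auditor"), 0.9),
    Observation("e1", Situation("chatbot", "confident", "patient"), 0.3),
]
print(intrinsic_scores(obs))  # one blended number for "e1"
print(situated_scores(obs))   # a separate number for each situation
```

On this toy data the intrinsic view reports a single 0.6 for `e1`, a rating that neither recipient actually gave.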
The encouraging news is that other corners of the collection have already built evaluation machinery that puts those social layers back in. SOTOPIA operationalizes social intelligence across seven simultaneous dimensions (Goal, Believability, Knowledge, Secret, Relationship, Social Rules, and Financial) rather than collapsing everything into a single accuracy number Can social intelligence be measured across seven dimensions?. MAJ-EVAL goes further on the recipient side of the triad: it extracts real stakeholder personas from domain documents via semantic clustering and runs them through a structured three-phase debate, so an output is judged from the situated perspectives of the people it actually affects rather than from a generic rubric Can personas extracted from documents generalize across evaluation tasks?. These are existence proofs that the social context can be made measurable and reproducible, not just hand-waved at.
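The difference between keeping a score vector and collapsing it to one number can be sketched as follows. This is a minimal Python illustration, not SOTOPIA's or MAJ-EVAL's actual implementation. The dimension names come from the SOTOPIA list above. The persona names, the helper functions, and the choice to put every dimension on one shared numeric scale are assumptions made for brevity; the published framework may score dimensions on different ranges.

```python
from statistics import mean

# Dimension names from the SOTOPIA list; the shared scale is an assumption.
DIMENSIONS = ("goal", "believability", "knowledge", "secret",
              "relationship", "social_rules", "financial")


def profile(scores_by_persona):
    """Per-dimension mean across stakeholder personas, kept as a vector."""
    return {d: mean(p[d] for p in scores_by_persona.values()) for d in DIMENSIONS}


def worst_served(scores_by_persona, dim):
    """Hypothetical helper: the persona giving the lowest score on `dim`."""
    return min(scores_by_persona, key=lambda name: scores_by_persona[name][dim])


def collapse(scores_by_persona):
    """The single-number view: average of the profile."""
    return mean(profile(scores_by_persona).values())


# Toy scores from two hypothetical personas for two candidate outputs.
base = dict.fromkeys(DIMENSIONS, 5.0)
a = {"patient": dict(base), "clinician": dict(base)}
b = {"patient":   {**base, "goal": 9.0, "relationship": 0.0},
     "clinician": {**base, "goal": 9.0, "relationship": 2.0}}

print(collapse(a), collapse(b))        # identical single numbers
print(profile(a) == profile(b))        # but different profiles
print(worst_served(b, "relationship"))
```

Both outputs collapse to the same 5.0, yet `b` trades relationship quality for goal attainment, and `worst_served` shows that the patient persona rates that trade lowest. A single accuracy-style number hides exactly this.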
What the framing side teaches is that more social signal isn't automatically better signal. Work on social presence finds that a single primary cue (a voice, an appearance) is enough to evoke a social response, while piling on secondary cues is not: quality of cue beats quantity Do more social cues always make AI feel more present?. For XAI evaluation that's a design constraint. Instrumenting 'the social layer' doesn't mean adding twenty new variables; it means identifying the few framing and source cues that actually change how a recipient receives an explanation. The recipient's response also shifts over time: when people learn a partner is an AI, they initially avoid it, and that preference reverses once repeated interactions let them see consistent outcomes Does revealing AI identity help or hurt user trust?. A one-shot evaluation cannot see that arc, which makes it one of the social dynamics being abstracted away.
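A minimal Python sketch of why the time dimension matters. It assumes participants choose between a disclosed-AI partner and an alternative in each of several rounds. The data layout, the 0.5 threshold, and the toy numbers are hypothetical; the sketch only shows that a round-one snapshot and a repeated-measures view can tell opposite stories about the same participants.

```python
from statistics import mean


def round_means(choices_by_round):
    """Share of participants choosing the AI partner in each round.

    choices_by_round: list of rounds, each a list of 0/1 choices
    (1 = chose the disclosed-AI partner). Hypothetical data layout.
    """
    if not choices_by_round:
        raise ValueError("need at least one round")
    return [mean(r) for r in choices_by_round]


def one_shot(choices_by_round):
    """What a single-session evaluation would report: round one only."""
    return round_means(choices_by_round)[0]


def reversal_round(choices_by_round, threshold=0.5):
    """Index of the first round where AI preference reaches `threshold`
    after starting below it, or None if it starts above or never gets there."""
    means = round_means(choices_by_round)
    if means[0] >= threshold:
        return None
    for i, m in enumerate(means):
        if m >= threshold:
            return i
    return None


# Toy data (hypothetical): four participants over four rounds.
rounds = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
print(one_shot(rounds))        # the snapshot a one-shot study sees
print(reversal_round(rounds))  # when the preference actually flips
```

On this toy data a one-shot evaluation reports 0.25 and would conclude that people reject the AI partner; the repeated-measures view shows preference crossing the threshold in the third round (index 2).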
There's a deeper limit worth knowing, though. A cluster of findings shows AI can predict social norms with superhuman accuracy (GPT-4.5 outperforms every individual human at judging social appropriateness) yet structurally cannot participate in the community processes that create and validate those norms Can AI predict social norms better than humans? Why do AI systems fail at social and cultural interpretation?. The same gap haunts evaluation: a metric can statistically model a stakeholder's reaction without taking part in the social meaning-making that legitimizes an explanation. So 'including the social layers' hits a ceiling: you can measure situated reception (and the tooling above shows how), but you can't fully simulate participation in it.
The practical doorway, then, is to make the evaluator itself situated. Agent-as-judge systems that gather evidence dynamically cut judge shift on complex tasks from 31% under flat LLM-as-judge scoring to 0.27%, more than a hundredfold improvement. But their memory module cascaded errors, a reminder that richer, more social evaluators also introduce new failure surfaces and need error isolation Can agents evaluate AI outputs more reliably than language models?. The takeaway across the collection: yes, XAI evaluation can absorb the social layers it abstracts away, and the components already exist, but doing so trades a clean, brittle number for a messier, truer one.
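The headline ratio is easy to check. In the Python sketch below, `judge_shift` uses an assumed definition (the percentage of items where a judge's verdict differs from the human-consensus verdict), which may not match how the cited evaluation computes it. `improvement_factor` simply divides the two percentages reported in the source note, 31% and 0.27%.

```python
def judge_shift(judge_verdicts, human_verdicts):
    """Percent of items where the judge disagrees with human consensus.

    Assumed definition for illustration only.
    """
    if len(judge_verdicts) != len(human_verdicts) or not human_verdicts:
        raise ValueError("need equal-length, non-empty verdict lists")
    disagreements = sum(j != h for j, h in zip(judge_verdicts, human_verdicts))
    return 100.0 * disagreements / len(human_verdicts)


def improvement_factor(baseline_shift, new_shift):
    """How many times smaller the new judge shift is than the baseline."""
    if new_shift <= 0:
        raise ValueError("new_shift must be positive")
    return baseline_shift / new_shift


# Toy verdicts (hypothetical): one disagreement out of four items.
print(judge_shift([True, True, False, True], [True, False, False, True]))  # 25.0
# Reported figures: 31% (LLM-as-a-Judge) vs 0.27% (agentic evaluator).
print(round(improvement_factor(31.0, 0.27), 1))  # 114.8
```

A factor of about 115 is consistent with describing the gain as roughly a hundredfold.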
Sources (8 notes)
Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.
The SOTOPIA framework operationalizes social intelligence across Goal, Believability, Knowledge, Secret, Relationship, Social Rules, and Financial dimensions. Humans produce 16.8 words per turn versus GPT-4's 45.5, revealing efficiency as a measurable capability in social interaction.
MAJ-EVAL automatically extracts stakeholder personas from domain documents via semantic clustering and orchestrates a structured three-phase debate, achieving reproducible evaluation that transfers across tasks like summarization and dialogue without manual redesign. The approach grounds personas in real stakeholder perspectives rather than arbitrary roles.
Research shows individual primary cues like voice or appearance are sufficient to evoke social-actor presence, while multiple secondary cues are not. Quality of cues matters more than quantity in driving social responses.
Users initially avoid AI partners when identity is revealed, but this preference reverses after repeated interactions with visible results. The learning mechanism—observing consistent outcomes—is essential; disclosure without feedback produces no calibration.
GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.
LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally resonant interpretations. The pattern shows that statistical competence coexists with an absence of actual social understanding and participation.
An eight-module agentic evaluator achieved a 0.27% judge shift versus 31% for LLM-as-a-Judge on complex tasks. However, the memory module cascaded errors, revealing that agentic systems need error isolation mechanisms to maintain their gains.