Why do aggregate persuasion metrics mask what actually changes minds?
This explores why headline persuasion numbers — overall 'win rates' or average effect sizes — hide the things that actually move a person: who they already are, which mental route the argument travels, and how the relationship changes over time.
This question is really about a measurement trap: when you report a single persuasion rate, you average away the very variables that decide whether a mind changes. The corpus keeps finding that the action lives in the moderators, not the mean. The clearest example is that a reader's prior beliefs predict the outcome better than anything the persuader says: political and religious leanings outpredict linguistic features, and apparent 'language effects' turn out to be confounded by which audiences happen to care about which topics (see 'Does what readers believe matter more than what debaters say?'). So an aggregate that credits the message is often really measuring who was in the room.
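The confounding described above can be sketched with a toy tabulation. The counts below are hypothetical (they are not taken from any debate corpus), as are the style labels and the `win_rates` helper. The point is only that a rhetorical style can look twice as effective in the aggregate while having no effect at all once the audience's prior is held fixed.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration:
# (rhetorical style, audience prior toward the speaker's side, debates, wins)
ROWS = [
    ("assertive", "aligned", 80, 56),
    ("assertive", "opposed", 20, 4),
    ("hedged", "aligned", 20, 14),
    ("hedged", "opposed", 80, 16),
]


def win_rates(rows, key):
    """Pool wins and debates under the grouping returned by `key`."""
    totals = defaultdict(lambda: [0, 0])  # group -> [wins, debates]
    for style, prior, debates, wins in rows:
        group = key(style, prior)
        totals[group][0] += wins
        totals[group][1] += debates
    return {group: w / d for group, (w, d) in totals.items()}


# Aggregate view: style appears to matter a great deal.
by_style = win_rates(ROWS, lambda style, prior: style)
print(by_style)  # {'assertive': 0.6, 'hedged': 0.3}

# Stratified view: within each audience prior, the two styles tie.
by_style_and_prior = win_rates(ROWS, lambda style, prior: (style, prior))
for group, rate in by_style_and_prior.items():
    print(group, rate)
```

In this sketch the whole aggregate gap comes from the assertive style being used mostly in front of already-aligned audiences, which is the sense in which a headline rate can measure who was in the room.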
When researchers actually decompose the variance, the masking becomes concrete. A meta-analysis found that model family, one-shot-versus-multi-turn design, and topic domain together explain about 82% of the between-study variance (see 'What combination of factors explains differences in LLM persuasiveness?'). A single 'LLMs are persuasive' number collapses all of that structure into one figure that describes no particular situation. The effect even depends on which model is arguing and for what: Claude out-persuades incentivized humans whether it argues for true or false claims, while DeepSeek wins only when arguing for falsehoods, so a pooled average blends distinct phenomena into a misleading middle (see 'Do large language models persuade better than humans?').
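A minimal sketch of how pooling hides this structure, under loudly stated assumptions: the model names and the numbers below are invented for illustration and are not figures from the cited studies. Each value is a persuasion advantage over human persuaders in percentage points, with positive meaning the model did better.

```python
from statistics import mean

# Hypothetical advantage over human persuaders, in percentage points.
# These values are assumptions for illustration, NOT study results.
CELLS = {
    ("model_a", "truthful"): 6.0,
    ("model_a", "deceptive"): 8.0,
    ("model_b", "truthful"): -5.0,
    ("model_b", "deceptive"): 7.0,
}

# The headline number: one pooled average across every cell.
pooled = mean(CELLS.values())
print(pooled)  # 4.0

# The cells that the positive headline conceals.
losing_cells = [cell for cell, advantage in CELLS.items() if advantage <= 0]
print(losing_cells)  # [('model_b', 'truthful')]
```

The pooled figure is positive, yet it describes none of the four cells, and one of them has the opposite sign.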
Aggregates also flatten the mechanism, the *how* of mind-changing. Humans and machines don't persuade the same way: LLMs travel the 'central route' through analytical reasoning and informational coherence, while humans work the 'peripheral route' through emotional vividness and identity cues (see 'Do humans and AI persuade through different cognitive routes?'). Two persuaders can post identical scores while changing minds through entirely different cognitive doors, and a single metric can't tell you which door, or which audience that door even works on. Relatedly, LLMs reach for logic and quantitative framing in nearly every exchange, which makes them *look* objective and lends them unearned epistemic authority. That effect concerns perceived credibility, not argument quality, and no win-rate captures it (see llms-spontaneously-persuade-in-virtually-every-conversation-even-when-unwarrente).
The sharpest blind spot is time. A one-shot persuasion score is a snapshot, but the dynamics run in opposite directions for humans and machines: AI shows a strong initial edge that erodes across repeated interactions, while human persuaders hold steady or strengthen as rapport builds (see 'Does AI persuasiveness fade across repeated conversations with the same person?'). Average those rounds together and you erase the decay curve that is the actual story. And sometimes what changes minds isn't the argument at all: users prefer answers with more citations even when the citations are irrelevant, because citation *count* works as a decoupled trust heuristic (see 'Do users trust citations more when there are simply more of them?'). A persuasion metric tells you the mind moved; it doesn't tell you a surface cue did the moving.
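The way averaging across rounds erases a decay curve can be sketched as follows. The per-round numbers are hypothetical, chosen so that the two persuaders have the same mean, and the `slope` helper is an illustrative least-squares fit against the round index, not a method taken from the cited study.

```python
from statistics import mean

# Hypothetical persuasion advantage per round, in percentage points.
# Invented so both series share one mean; NOT data from the cited study.
AI_ROUNDS = [12.0, 8.0, 4.0, 0.0]
HUMAN_ROUNDS = [6.0, 6.0, 6.0, 6.0]


def slope(ys):
    """Least-squares slope of ys against round index 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Averaged over rounds, the two persuaders are indistinguishable.
print(mean(AI_ROUNDS), mean(HUMAN_ROUNDS))  # 6.0 6.0

# The per-round trend is what separates them.
print(slope(AI_ROUNDS), slope(HUMAN_ROUNDS))  # -4.0 0.0
```

Reporting a trend alongside the mean is one simple way to keep the time dimension visible. A real analysis would also need uncertainty estimates, which this sketch omits.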
The thread across all of this: persuasion is an interaction effect among message, person, route, and time, and an aggregate is precisely the operation that throws interaction effects away. The useful question is never 'how persuasive is it' but 'persuasive to whom, by which route, in truth or falsehood, and for how long.' If you want to see how thoroughly these levers can be exploited rather than just measured, the persuasion-taxonomy jailbreak work shows fluent, technique-driven persuasion slipping past defenses that screen for odd patterns instead of convincing content (see 'Can social science persuasion techniques jailbreak frontier AI models?').
Sources (8 notes)
Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.
A meta-analytic joint model combining LLM architecture, one-shot versus multi-turn format, and topic domain explained 81.93% of between-study variance (R² = 0.8193). Interactive multi-turn designs and GPT-4 consistently outperformed one-shot formats and Claude 3.x.
Claude beats incentivized humans at both truthful and deceptive persuasion, while DeepSeek only beats them when arguing for falsehoods. The persuasion mechanism appears content-independent, suggesting model family itself acts as a contextual moderator.
Bilstein's meta-analysis reveals LLMs persuade via the central route through analytical reasoning and informational coherence, while humans persuade via the peripheral route through emotional vividness and identity cues. Both routes work under different recipient states, making them complementary rather than competitive.
Claude and DeepSeek showed strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is opposite to human-to-human persuasion, where rapport typically strengthens over time.
Analysis of 24,000 Search Arena interactions shows irrelevant citations boost user preference (β=0.273) nearly as much as relevant citations (β=0.285), indicating citation count functions as a decoupled trust heuristic.
A 40-technique taxonomy of psychology-based persuasion strategies (PAP) achieved over 92% attack success on GPT-3.5, GPT-4, and Llama-2 in 10 trials. Current defenses miss semantic content attacks because they screen for unusual patterns, not fluent persuasion.