Why does who makes an argument matter as much as what the argument says?
This explores why the *source* of an argument — the speaker's authority, the reader's prior beliefs, the way claims are framed — often decides persuasion more than the argument's actual content, and why that matters for AI systems that strip social context away.
The notes are surprisingly unanimous that the source matters this much. Start with the listener: analysis of debate corpora finds that a reader's political and religious ideology predicts whether they are persuaded more accurately than any feature of the language does (Does what readers believe matter more than what debaters say?). This is not a minor confound. Once you statistically control for who is in the audience, the linguistic features that looked persuasive shift dramatically. That means many published 'what makes language persuasive' findings may be measuring audience-text matching rather than any property of the words (Do linguistic features of persuasion stay the same across audiences?).
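The confound is easy to reproduce with invented numbers. In the sketch below (hypothetical counts, not data from the debate corpora), a linguistic feature shows up mostly in debates read by audience group A, which is easier to persuade on those topics. Pooled across audiences, texts with the feature win 24 percentage points more often; compared within each audience group, the feature makes no difference. Stratifying by group is a simpler stand-in for the regression controls the notes describe.

```python
from fractions import Fraction

# Hypothetical counts for illustration only:
# (audience_group, feature_present) -> (persuaded, total)
COUNTS = {
    ("A", True): (48, 80),
    ("A", False): (12, 20),
    ("B", True): (4, 20),
    ("B", False): (16, 80),
}


def rate(persuaded: int, total: int) -> Fraction:
    return Fraction(persuaded, total)


def pooled_effect(counts) -> Fraction:
    """Persuasion-rate gap (feature present minus absent), ignoring the audience."""
    totals = {True: [0, 0], False: [0, 0]}
    for (_, feature), (persuaded, total) in counts.items():
        totals[feature][0] += persuaded
        totals[feature][1] += total
    return rate(*totals[True]) - rate(*totals[False])


def stratified_effects(counts) -> dict:
    """The same gap, computed separately within each audience group."""
    groups = sorted({group for group, _ in counts})
    return {g: rate(*counts[(g, True)]) - rate(*counts[(g, False)]) for g in groups}


print("pooled gap:", pooled_effect(COUNTS))                 # 6/25, i.e. 0.24
print("within-group gaps:", stratified_effects(COUNTS))     # zero for both groups
```

The pooled gap exists only because the feature and the easily persuaded audience travel together; once the audience is held fixed, it vanishes.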
Now the speaker. Human arguments carry force partly through reputation, track record, and standing: the social world where expertise gets built and judged. Language models lose exactly this. Because they process only text, they cannot tell an expert's hard-won claim from a commonly held assumption; the signal that distinguished the two was never in the words (Can language models distinguish expert arguments from common assumptions?). This is why human debates and AI debates settle differently. Human disagreement resolves through argument quality *plus* social authority, cultural context, and interpersonal trust. Multi-agent LLM debates resolve through chain-of-thought probability ranking instead, and in contested domains where human expertise actually matters, that swap amplifies errors rather than correcting them (How do LLM debates differ from human expert consensus?). Debate reliably improves reasoning only when there is external evidence to verify against, as on math and logic tasks; without it, the more persuasive framing beats the more correct one, and debate becomes a false-consensus generator (When does debate actually improve reasoning accuracy?).
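A toy model isolates what the verifier contributes. The names and scores below are hypothetical, and this is not how any particular multi-agent debate system works: each claim carries a persuasiveness score, and resolution either picks the most persuasive claim outright or first filters claims through an external check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Claim:
    answer: str
    persuasiveness: float  # hypothetical score for how compelling the framing is


def resolve(claims: List[Claim], verify: Optional[Callable[[str], bool]] = None) -> str:
    """Return the winning answer of a toy debate.

    Without a verifier, the most persuasive claim wins outright. With one, only
    claims that pass the external check compete; if none pass, all claims do.
    """
    pool = claims
    if verify is not None:
        checked = [c for c in claims if verify(c.answer)]
        if checked:
            pool = checked
    return max(pool, key=lambda c: c.persuasiveness).answer


# Usage: a verifiable arithmetic question, 17 * 24 = 408.
claims = [Claim("398", 0.9), Claim("408", 0.6)]


def check_product(answer: str) -> bool:
    return int(answer) == 17 * 24


print(resolve(claims))                        # persuasive framing wins: 398
print(resolve(claims, verify=check_product))  # verified answer wins: 408
```

In a contested domain there is no `check_product` to pass in, so every call degrades to the first case.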
Here's the part you might not expect: persuasion and understanding are separable. LLMs sway both debate participants and audiences yet cannot reliably evaluate the very debates they win (Can LLMs persuade without actually understanding arguments?). Nor do they hold positions of their own. They conform to the shape of whatever argument the user is building, producing argument-like text shaped by your framing rather than defending any underlying commitment (Do LLMs actually hold stable positions or just mirror user arguments?). So when an AI 'argues,' there is no 'who' behind it at all, which is precisely the ingredient the other findings say matters most.
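The underlying note puts inter-annotator agreement on those evaluations anywhere from near zero to 0.6. It does not say which statistic was used; assuming a chance-corrected measure such as Cohen's kappa, the sketch below shows why a low score is damning even when raw agreement looks respectable. The debate labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: probability both annotators pick the same label independently.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Usage: two judges pick the winner ("pro" or "con") of ten hypothetical debates.
judge_1 = ["pro"] * 9 + ["con"]
judge_2 = ["pro"] * 8 + ["con", "pro"]
print(cohens_kappa(judge_1, judge_2))  # about -0.11 despite 80% raw agreement
```

Because both judges say "pro" almost every time, they would agree 82% of the time by chance alone, so 80% observed agreement is slightly worse than chance.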
Framing is a channel of its own, independent of content. Presuppositions persuade more than direct assertions because they smuggle new claims in as already-accepted background, bypassing the scrutiny an assertion would invite (Why are presuppositions more persuasive than direct assertions?). LLMs lean about 22% harder on moral language than humans do, even when emotional tone stays nearly identical, which suggests moral appeals and sentiment are separate persuasive levers (Do LLMs use moral language more than humans?). Friction matters as well: in the Thin Line study, direct conversation partners changed their minds only 7% of the time, while read-only audiences of the *same* exchanges were swayed 34–62% of the time. Defending a belief in real time protects it in a way passive reading does not (Why do LLM audiences shift views more than debaters?).
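As a worked ratio, the Thin Line figures say the read-only audience shifted roughly five to nine times as often as the participants themselves:

```latex
\frac{34\%}{7\%} \approx 4.9, \qquad \frac{62\%}{7\%} \approx 8.9
```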
The thread tying these together is that an argument is never just its propositional content. It is a transaction between a source with standing, a frame that sets expectations, and a recipient with priors. One note makes this explicit for AI explanations: their effectiveness is not intrinsic but emerges from a source-framing-recipient triad, so any evaluation that ignores who presents an explanation and who receives it measures only a sliver of what is really happening (What if XAI is fundamentally a communication problem?). Even the 'what' is slippery: the same text supports multiple valid reconstructions with no ground truth, so 'what the argument says' is partly something the reader supplies (Why do different people reconstruct the same argument differently?). That is the deeper punchline: the content was never as fixed, or as solitary, as it looks.
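For evaluation design, the triad suggests logging more than the content. The schema and records below are hypothetical (invented for illustration, not taken from the XAI note): grouping outcomes by content alone lumps together exchanges that went opposite ways, while keeping source, frame, and recipient in the key separates them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One persuasive exchange: content plus the context it arrives in (hypothetical schema)."""
    content: str    # the propositional content of the argument or explanation
    source: str     # who presents it
    frame: str      # how it is framed
    recipient: str  # who receives it (their priors, role, or ideology)


def outcome_spread(records, key) -> dict:
    """Group (transaction, accepted) records by `key`; return the set of outcomes per group."""
    groups = defaultdict(set)
    for txn, accepted in records:
        groups[key(txn)].add(accepted)
    return dict(groups)


def content_only(t: Transaction):
    return t.content


def full_triad(t: Transaction):
    return (t.source, t.frame, t.recipient, t.content)


# Hypothetical records: identical content, different context, opposite outcomes.
RECORDS = [
    (Transaction("X causes Y", "domain expert", "assertion", "sympathetic reader"), True),
    (Transaction("X causes Y", "anonymous text", "assertion", "skeptical reader"), False),
]

print(outcome_spread(RECORDS, content_only))  # one group, mixed outcomes
print(outcome_spread(RECORDS, full_triad))    # two groups, one outcome each
```

A content-only evaluation sees a single item with inconsistent results; the triad-aware key shows the inconsistency was carried by context.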
Sources (12 notes)
Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.
The linguistic features that predict persuasion success change dramatically once political and religious ideology are added as statistical controls. Features appearing predictive in standard analyses often reflect audience-text matching rather than true language effects, making many published findings potentially artifacts of audience composition.
LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.
Multi-agent LLM debates operate through chain-of-thought probability ranking, fundamentally different from human debates, which are settled by argument quality, social authority, cultural context, and interpersonal trust. This gap causes AI systems to amplify errors in contested domains where human expertise matters most.
Multi-agent debate boosts accuracy on verifiable tasks like math and logic, but the gain reverses in contested domains without external evidence checking. Without verification, persuasive framing wins over correctness, making debate a false-consensus generator rather than an accuracy amplifier.
The Thin Line study shows LLMs sway debate participants and audiences but cannot reliably evaluate those same debates, with inter-annotator agreement ranging from near-zero to 0.6. Persuasive competence and pragmatic comprehension are separable capabilities.
Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not from any underlying commitment being defended.
Experimental evidence shows presuppositions with additive, iterative, and factive triggers persuade audiences more than assertions, especially for discourse-new content. The mechanism: presuppositions bypass evaluative scrutiny by presenting claims as already-accepted background.
Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.
The Thin Line study found that debate participants changed their minds only 7% of the time, while audience readers of the same exchanges were swayed 34–62% of the time. Defensive friction in real-time conversation protects beliefs; read-only consumption lacks this friction.
Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.
Multiple valid argument reconstructions exist for the same text with no ground truth. This is not annotation error but an inherent feature of the task—different formalization schemas are each internally valid.