Can persuasion research measure language effects without confounding them with audience composition?

This explores whether the apparent 'language effects' in persuasion studies are really about word choices at all — or whether they're an artifact of who happened to be in the audience.

Can persuasion research cleanly measure what language does, separate from who is listening? The corpus suggests the honest answer is: often not, unless you control for the audience first. The sharpest finding is that a reader's prior beliefs predict whether they are persuaded better than any feature of the language used to persuade them (see "Does what readers believe matter more than what debaters say?"). Because debate topics tend to attract ideologically matched audiences, any 'language effect' measured without controlling for who showed up is confounded by audience composition, which is itself correlated with the topic.

The consequence is unsettling for the published literature. When you add political and religious ideology as statistical controls, the specific linguistic features that looked most persuasive *change*, meaning many findings about 'what makes language persuasive' may actually be measuring audience-text matching rather than the language doing any work (see "Do linguistic features of persuasion stay the same across audiences?"). The same words can look powerful or weak depending on whether the room already agreed. So the methodological answer to the question is conditional: yes, you can isolate language effects, but only once reader priors are explicitly modeled, and the moment you do, a lot of the apparent signal evaporates.
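A minimal sketch of how audience-text matching can manufacture a 'language effect'. The counts below are a hypothetical toy table, not data from the cited studies, and the names `ROWS`, `naive_effect`, and `adjusted_effect` are illustrative. In this toy setup, persuasion depends only on whether the reader already agreed, yet the linguistic feature appears mostly in front of aligned readers.

```python
from collections import defaultdict

# Hypothetical toy counts (an assumption for illustration, NOT real study data).
# Each row: (reader_aligned, feature_present, n_readers, n_persuaded)
ROWS = [
    (1, 1, 80, 56),  # aligned readers, feature present: 70% persuaded
    (1, 0, 20, 14),  # aligned readers, feature absent:  70% persuaded
    (0, 1, 20, 4),   # opposed readers, feature present: 20% persuaded
    (0, 0, 80, 16),  # opposed readers, feature absent:  20% persuaded
]


def rate(rows):
    """Share of readers persuaded across the given rows."""
    n_readers = sum(r[2] for r in rows)
    n_persuaded = sum(r[3] for r in rows)
    return n_persuaded / n_readers


def naive_effect(rows):
    """Persuasion rate with the feature minus the rate without it, ignoring priors."""
    with_feature = [r for r in rows if r[1] == 1]
    without_feature = [r for r in rows if r[1] == 0]
    return rate(with_feature) - rate(without_feature)


def adjusted_effect(rows):
    """Within-stratum feature effect, averaged over reader-prior strata
    and weighted by the number of readers in each stratum."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[0]].append(r)
    total = sum(r[2] for r in rows)
    effect = 0.0
    for stratum_rows in strata.values():
        weight = sum(r[2] for r in stratum_rows) / total
        effect += weight * naive_effect(stratum_rows)
    return effect


naive = naive_effect(ROWS)
adjusted = adjusted_effect(ROWS)
print(f"naive feature effect:      {naive:.2f}")
print(f"effect adjusted for prior: {adjusted:.2f}")
```

With these made-up counts, the naive comparison shows a 0.30 gap in persuasion rate, while the prior-stratified comparison shows 0.00. Stratification stands in here for the regression controls described above; it is a simpler technique chosen for illustration, not the method the cited analyses used.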

What makes this worth knowing is that the same confounding logic shows up one level higher, in the LLM-vs-human persuasion debate. Headline claims that 'AI is more persuasive than humans' collapse to a statistical null when pooled across studies (see "Are language models actually more persuasive than humans?"): persuasiveness turns out to be conditional on context, not an inherent property of the speaker. And when researchers built a model that *did* control for the right moderators (model family, one-shot vs. multi-turn design, topic domain), those three factors together explained about 82% of the variance between studies (see "What combination of factors explains differences in LLM persuasiveness?"). In other words, the 'who and where' swamps the 'what was said': the audience-composition problem from the debate corpora, restated for AI.

The corpus does point to language effects that survive controls, but notice that they are framed mechanistically rather than as raw feature correlations. LLMs persuade partly through linguistically expressed *conviction*, an assertive register installed by RLHF that correlates with persuasion regardless of whether the claim is true (see "Does linguistic conviction explain why LLMs persuade more effectively?"). Presuppositions outperform direct assertions because they smuggle claims in as already-accepted background, bypassing scrutiny (see "Why are presuppositions more persuasive than direct assertions?"). These read as genuine language effects precisely because they name a causal mechanism, not just a feature that happened to track an audience.

The lesson the corpus leaves you with: persuasion is a property of the *interaction*, not the message. AI's persuasive edge decays across repeated conversations with the same person while humans hold steady (see "Does AI persuasiveness fade across repeated conversations with the same person?"), and persuasive success can be dissociated from whether the persuader even understands the argument (see "Can LLMs persuade without actually understanding arguments?"). Any study that measures language without measuring the audience, the format, and the relationship over time is, in effect, measuring its own sampling.

Sources (8 notes)

Does what readers believe matter more than what debaters say?

Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.

Do linguistic features of persuasion stay the same across audiences?

The linguistic features that predict persuasion success change dramatically once political and religious ideology are added as statistical controls. Features appearing predictive in standard analyses often reflect audience-text matching rather than true language effects, making many published findings potentially artifacts of audience composition.

Are language models actually more persuasive than humans?

A meta-analysis of 7 studies with 17,422 participants found no detectable difference in persuasive effectiveness between LLMs and humans (Hedges' g = 0.02). Persuasiveness appears conditional on context rather than speaker category.
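For readers unfamiliar with the effect size quoted above, here is a minimal sketch of the standard Hedges' g calculation for a single two-group study: a standardized mean difference with a small-sample correction. The function name `hedges_g` and the example inputs are hypothetical. The pooled g = 0.02 reported in the note comes from combining several studies, which this sketch does not attempt.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Hedges' g for two independent groups.

    Computes Cohen's d with a pooled standard deviation, then applies the
    usual approximate small-sample correction J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    cohens_d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return correction * cohens_d


# Hypothetical example: LLM-written vs. human-written messages rated on a
# 7-point agreement scale, 50 participants per arm (made-up numbers).
g = hedges_g(5.1, 1.0, 50, 5.0, 1.0, 50)
print(f"Hedges' g = {g:.3f}")
```

By the conventional rule of thumb, values near 0.2 count as small, so a pooled g of 0.02 is effectively indistinguishable from no difference.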

What combination of factors explains differences in LLM persuasiveness?

A joint meta-regression model combining LLM architecture, one-shot versus multi-turn format, and topic domain explained 81.93% of between-study variance (R²). Interactive multi-turn designs and GPT-4 consistently outperformed one-shot formats and Claude 3.x.
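One common way to read "share of between-study variance explained" in a meta-regression is as a pseudo-R²: the proportional drop in the between-study variance estimate (τ²) once moderators are added. The sketch below assumes that definition; the note does not state which estimator the study used, and the function name `pseudo_r2` and the τ² values are hypothetical.

```python
def pseudo_r2(tau2_without_moderators, tau2_with_moderators):
    """Proportion of between-study variance accounted for by moderators.

    Defined as (tau2_without - tau2_with) / tau2_without, floored at 0
    because adding moderators can occasionally raise the tau^2 estimate.
    Returns 0.0 when there is no between-study variance to explain.
    """
    if tau2_without_moderators <= 0:
        return 0.0
    reduction = tau2_without_moderators - tau2_with_moderators
    return max(0.0, reduction / tau2_without_moderators)


# Hypothetical tau^2 estimates, chosen only to illustrate the arithmetic.
r2 = pseudo_r2(0.100, 0.018)
print(f"pseudo-R^2 = {r2:.2%}")
```

Under this reading, a value near 82% means most of the disagreement between studies tracks which model, format, and topic each study happened to use.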

Does linguistic conviction explain why LLMs persuade more effectively?

Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.

Why are presuppositions more persuasive than direct assertions?

Experimental evidence shows presuppositions with additive, iterative, and factive triggers persuade audiences more than assertions, especially for discourse-new content. The mechanism: presuppositions bypass evaluative scrutiny by presenting claims as already-accepted background.

Does AI persuasiveness fade across repeated conversations with the same person?

Claude and DeepSeek showed a strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is the opposite of human-to-human persuasion, where rapport typically strengthens over time.

Can LLMs persuade without actually understanding arguments?

The Thin Line study shows LLMs sway debate participants and audiences but cannot reliably evaluate those same debates, with inter-annotator agreement ranging from near-zero to 0.6. Persuasive competence and pragmatic comprehension are separable capabilities.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst auditing persuasion measurement methodology. The question remains: can we isolate language effects from audience composition, or are they inseparable?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2026. Key constraints documented:
• Reader prior beliefs predict persuasion outcomes better than linguistic features; controlling for ideology shifts which linguistic predictors survive (2019–2024).
• Pooled LLM-vs-human persuasion effect is statistically null; model family, interaction design, and topic domain explain 82% of between-study variance (2024–2025).
• LLM persuasive advantage is mediated by linguistically expressed conviction (RLHF-installed register), not inherent truthfulness (2024–2025).
• Presuppositions outperform direct assertions by smuggling claims in as background; LLM persuasiveness wanes over repeated interactions with the same person (2025).
• LLMs can persuade successfully while failing to comprehend argument structure; they struggle to reject false presuppositions under high-stakes misinformation (2025).

Anchor papers (verify; mind their dates):
• arXiv:1906.11301 (2019) — prior beliefs confound linguistic signals
• arXiv:2404.09329 (2024) — cognitive effort and design moderate LLM persuasiveness
• arXiv:2505.09662 (2025) — when LLMs exceed humans; interaction history matters
• arXiv:2507.01936 (2025) — comprehension and persuasion dissociate

Your task:
(1) RE-TEST THE CORE CONSTRAINT. Has newer instrumentation (e.g., causal intervention, instrumental variables, prospective audience sampling before topic assignment) since RELAXED the confounding? Can any recent work (last 6 months) cleanly separate language effect from audience-text matching without pre-controls? If the constraint still holds, identify what would falsify it.
(2) Surface contradicting work: any papers claiming language effects DO survive controls, or showing persuasion is NOT primarily an interaction property.
(3) Propose 2 questions assuming the regime has moved: (a) If audience composition is inseparable, how should persuasion science reframe success metrics away from "what language works" to "when and for whom"? (b) Given LLMs persuade while failing comprehension, what does that imply about the role of linguistic form vs. listener inference in persuasion?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
