Are language models actually more persuasive than humans?
Does the research evidence support claims that LLMs persuade more effectively than humans, or have we been cherry-picking studies to fit a narrative?
The Bilstein 2025 meta-analysis is the corrective to a literature that had been read selectively in both directions. Pooling 7 studies covering 17,422 participants, the random-effects estimate is Hedges' g = 0.02 (p = .53, 95% CI [-0.048, 0.093]). There is no detectable average difference between LLM and human persuasiveness. Egger's test flagged potential small-study effects but trim-and-fill imputed no missing studies, so publication bias is unlikely to be hiding a real effect.
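As a quick consistency check on those figures, the standard error and p-value can be back-computed from the reported confidence interval. This is a minimal sketch, assuming the interval is a symmetric normal-theory (Wald) interval, which the note does not state. The function name `check_reported_ci` is hypothetical, and the sketch uses the interval midpoint rather than the rounded g = 0.02.

```python
from statistics import NormalDist


def check_reported_ci(lower, upper, level=0.95):
    """Back out midpoint, standard error, z and two-sided p from a CI.

    Assumes a symmetric normal-theory (Wald) interval. That is an
    assumption made for illustration, not something the source states.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    midpoint = (lower + upper) / 2                    # implied point estimate
    std_err = (upper - lower) / (2 * z_crit)          # half-width / critical value
    z = midpoint / std_err
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return midpoint, std_err, z, p_two_sided


# Interval bounds as reported in the note.
midpoint, std_err, z, p = check_reported_ci(-0.048, 0.093)
print(f"midpoint={midpoint:.4f} se={std_err:.4f} z={z:.2f} p={p:.2f}")
```

Under that assumption the recovered p-value rounds to .53, which agrees with the reported value and suggests the point estimate, interval, and p-value are internally consistent.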
Both popular framings lose their grip here. The AI-superpersuader alarm — that LLMs are systematically more persuasive than humans and therefore an emerging civic risk on that basis — is not supported by the pooled evidence. The dismissive counter — that LLMs are "just text" and therefore not particularly persuasive — is also not supported. Both stories pick studies. The pooled signal is parity.
The interesting number, though, is the heterogeneity: I² = 75.97%. Roughly three-quarters of the observed variation in effect sizes across studies reflects real between-study differences rather than sampling noise. Persuasive effectiveness is conditional, not categorical. The right question is not whether LLMs are more persuasive on average, but under which conditions a particular LLM, in a particular conversational design, in a particular domain, outperforms or underperforms human comparators.
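To make the I² figure concrete, the sketch below uses the Higgins and Thompson definition, I² = max(0, (Q - df) / Q) with df = k - 1, to back out the Cochran's Q that the reported values would imply. The note does not report Q, so the implied value is an inference under that definition, and the function names are hypothetical.

```python
def i_squared(q, k):
    """I^2 as a fraction: share of observed variation beyond sampling error."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


def implied_q(i2, k):
    """Invert I^2 = (Q - df) / Q to get Q = df / (1 - I^2), for 0 <= I^2 < 1."""
    if not 0 <= i2 < 1:
        raise ValueError("i2 must be in [0, 1)")
    return (k - 1) / (1 - i2)


k_studies = 7          # number of pooled studies, from the note
reported_i2 = 0.7597   # 75.97%, from the note
q = implied_q(reported_i2, k_studies)  # not reported in the note; back-computed
print(f"implied Q = {q:.2f} on {k_studies - 1} degrees of freedom")
```

Under homogeneity, Q is expected to be about equal to its degrees of freedom, so an implied Q of roughly 25 on 6 degrees of freedom is about four times what sampling error alone would produce.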
This reframes the related note "Where does AI's persuasive power actually come from?": the Levers paper documents which knobs modulate persuasiveness; Bilstein clarifies that those knobs operate against a baseline that is on average parity, not superiority. The post-training intervention is not "amplify a pre-existing advantage"; it is "create or destroy advantage on a study-by-study basis."
It also reframes the related note "Does RLHF training make models more convincing or more correct?": the sophistry effect is real but does not produce a uniform persuasion uplift across deployment contexts. It is local, conditional, and design-dependent.
For writing about AI persuasion, the headline shift: persuasion lives in the embedding context — model × design × domain — not in the speaker's category.
Inquiring lines that use this note as a source (34)
This note is a source for these synthesized inquiries.
- Does conversational format make AI arguments more persuasive than static text?
- Does persuasiveness increase when LLMs argue for claims that are actually true?
- Why do different model families show opposite persuasion strengths?
- Can persuasion effectiveness depend on the personality of who you are trying to convince?
- What training methods make models more persuasive but less factually accurate?
- How do fallacy susceptibilities relate to LLM persuasiveness in debates?
- Does cognitive complexity strengthen or weaken persuasive impact on audiences?
- Why does loyalty foundation not differ between LLM and human arguments?
- Why does LLM persuasive advantage fade across multiple interactions with users?
- Should AI persuasiveness claims be tied to specific model architectures?
- How do LLMs differ from humans in their grounding mechanisms?
- Can language about model behavior ever be accurate without anthropomorphic framing?
- Why does AI persuasiveness increase while factual accuracy systematically decreases?
- Which knowledge types do LLMs handle better than humans in reasoning tasks?
- Do LLMs address the prompter but persuade the public differently?
- What design choices actually make language models more persuasive?
- Does training for persuasiveness harm a model's factual accuracy?
- Why do study results on AI persuasion vary so widely?
- Can post-training techniques create persuasive advantage where none existed?
- What rhetorical mechanisms drive equivalent persuasion across human and LLM arguments?
- Does argument quality in textbooks differ from persuasive effectiveness in practice?
- Do LLMs achieve similar persuasive outcomes through different rhetorical mechanisms than humans?
- Why do LLMs mirror opponents stylistically while humans resist mirroring them?
- Does unconditional stylistic mirroring harm or help LLM persuasiveness?
- What role does stylistic convergence play in LLM persuasion effectiveness?
- How do moral language patterns differ between LLM and human arguments?
- Why does personal authenticity matter more for human persuasion than LLM?
- Can persuasion research measure language effects without confounding them with audience composition?
- Which linguistic features predict persuasion once reader ideology is statistically controlled?
- How much do LLM persuasiveness claims hide heterogeneous effects across different reader ideologies?
- Why do LLMs persuade through logical appeals but humans through emotion?
- When does analytical persuasion work better than emotional persuasion?
- Can LLMs ever activate the peripheral route of persuasion?
- Can LLM persuasion be fairly evaluated without stratifying by reader background?
Related concepts in this collection (2)
- Where does AI's persuasive power actually come from?
  Explores which techniques make AI most persuasive, and whether the usual suspects like personalization and model size are actually the main drivers. Matters because it reshapes where to focus AI safety concerns.
  Link to this note: post-training levers operate against a parity baseline
- Does RLHF training make models more convincing or more correct?
  Explores whether RLHF improves actual task performance or merely trains models to sound more persuasive to human evaluators. This matters because alignment techniques could be creating the illusion of safety.
  Link to this note: sophistry effect is real but conditional, not uniform
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- A meta-analysis of the persuasive power of large language models
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
- When Large Language Models are More Persuasive Than Incentivized Humans, and Why
- Exploring the Role of Prior Beliefs for Argument Persuasion
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations
- Can Language Models Recognize Convincing Arguments?
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- The Thin Line Between Comprehension and Persuasion in LLMs
Original note title
the pooled effect of LLM vs human persuasion is statistically null; the headline "AI is more persuasive" is an artifact of cherry-picked studies