Why do different model families show opposite persuasion strengths?

This explores why persuasive strength isn't a fixed property of "AI" but flips between model families — why Claude can out-argue humans in any direction while another model only wins when pushing falsehoods.

The corpus suggests the model family is itself the variable, not a neutral pipe for arguments. The clearest data point: Claude beats incentivized humans at both truthful and deceptive persuasion, while DeepSeek beats them only when arguing for falsehoods (Do large language models persuade better than humans?). That asymmetry is the puzzle in miniature: same task, opposite profiles. And family differences are not just noise. A meta-analysis found that model family, conversation design (one-shot versus multi-turn), and topic domain together explain ~82% of the between-study variance, with GPT-4 consistently outperforming Claude 3.x (What combination of factors explains differences in LLM persuasiveness?). Persuasion strength behaves like a per-model trait, the way reading level or tone does.
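What does "explain ~82% of between-study variance" mean operationally? In meta-regression, the usual analogue of R² compares the between-study variance τ² before and after adding moderators: R² = 1 − τ²_residual / τ²_total. Below is a minimal standard-library sketch for a single categorical moderator (for example, model family), assuming DerSimonian-Laird moment estimators and one common residual τ² shared by all groups. The meta-analysis may have used a different estimator, and a joint model of family, format, and domain would need a general weighted-least-squares meta-regression (for example, the `rma` function in R's metafor package). All function names here are illustrative.

```python
from collections import defaultdict

def _fixed_effect_parts(effects, variances):
    """Cochran's Q, its degrees of freedom, and the DerSimonian-Laird scaling constant C."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    c = total_w - sum(w * w for w in weights) / total_w
    return q, len(effects) - 1, c

def tau2_total(effects, variances):
    """Between-study variance with no moderators (DerSimonian-Laird)."""
    q, df, c = _fixed_effect_parts(effects, variances)
    return max(0.0, (q - df) / c) if c > 0 else 0.0

def tau2_residual(effects, variances, groups):
    """Between-study variance left after a categorical moderator,
    assuming one common tau^2 across all groups (method of moments)."""
    members = defaultdict(list)
    for y, v, g in zip(effects, variances, groups):
        members[g].append((y, v))
    q_w, df_w, c_w = 0.0, 0, 0.0
    for rows in members.values():
        ys, vs = zip(*rows)
        q, df, c = _fixed_effect_parts(list(ys), list(vs))
        q_w, df_w, c_w = q_w + q, df_w + df, c_w + c
    return max(0.0, (q_w - df_w) / c_w) if c_w > 0 else 0.0

def r2_between_study(effects, variances, groups):
    """Share of between-study variance accounted for by the moderator, clipped to [0, 1]."""
    total = tau2_total(effects, variances)
    if total == 0:
        return 0.0
    return max(0.0, 1.0 - tau2_residual(effects, variances, groups) / total)
```

With per-study effect sizes, their sampling variances, and a family label per study passed as `groups`, `r2_between_study` returns the fraction of heterogeneity that the label accounts for; a negative raw estimate is clipped to 0, as is conventional.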

Where does the trait come from? The corpus keeps pointing back to training, specifically to what RLHF installs in a model's voice. One thread finds that LLMs' persuasive edge is mediated by linguistically expressed conviction: they load sentences with confidence, and that confidence correlates with winning regardless of whether the claim is true (Does linguistic conviction explain why LLMs persuade more effectively?). A related thread shows that RLHF also biases models toward predicting conciliatory, concession-based persuasion intentions regardless of dialogue context, projecting their own learned politeness onto other speakers (Do LLMs predict persuasion based on actual dialogue or training bias?). If different families tune these dials differently (how assertive a register they adopt, how readily they concede), the same alignment recipe, weighted differently, could produce opposite persuasive personalities.

Two more measurable traits sharpen the picture. Models differ enormously in "ideological depth" (up to a 7.3× difference in political feature count at similar scale, measured with sparse autoencoders), and deeper models resist being steered but reason more consistently across related topics (Can we measure how deeply models represent political ideology?). Separately, model confidence predicts robustness: confident models hold their output steady under prompt rephrasing, while low-confidence ones swing widely (Does model confidence predict robustness to prompt changes?). Put those together and you get a candidate mechanism for the Claude-vs-DeepSeek split: a family that is confidently and consistently convicted would persuade in whatever direction you point it, while a less-anchored family gains traction only in the one direction (arguing falsehoods) where raw assertiveness has the most room to outrun a reader's resistance.

The twist worth keeping is that the persuasion mechanism looks largely content-independent: it rides on register and conviction, not on the truth of the argument (Do large language models persuade better than humans?; Does linguistic conviction explain why LLMs persuade more effectively?). That reframes "opposite persuasion strengths" as something closer to opposite confidence calibrations. And there's a humbling counterweight: pooled across 7 studies and 17,000+ people, LLMs and humans are statistically tied on average persuasiveness, with a pooled Hedges' g of 0.02 (Are language models actually more persuasive than humans?), and readers' prior beliefs predict debate outcomes better than the linguistic features of what the debaters say (Does what readers believe matter more than what debaters say?). So the between-family gaps are real and appear to trace to training-shaped traits, but they live inside a band where the audience, not the model, often gets the final vote.
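The "statistically tied" result (the pooled Hedges' g of 0.02 in the meta-analysis summarized under Sources) is the output of a standard pipeline: compute a bias-corrected standardized mean difference per study, then pool across studies. A minimal sketch follows, assuming a DerSimonian-Laird random-effects model; the meta-analysis may have used another estimator (REML is common), and the function names are illustrative. A pooled g whose confidence interval spans zero is what "no detectable difference" means in this setting.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Bias-corrected standardized mean difference (group 1 minus group 2)
    and its approximate sampling variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # Hedges' small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d

def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling.
    Returns (pooled effect, standard error, tau^2, approximate 95% CI)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, se, tau2, ci
```

Feed `pool_random_effects` one `(g, variance)` pair per study from `hedges_g`, keeping a consistent sign convention (for example, LLM minus human) across all studies.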

Sources (8 notes)

Do large language models persuade better than humans?

Claude beats incentivized humans at both truthful and deceptive persuasion, while DeepSeek only beats them when arguing for falsehoods. The persuasion mechanism appears content-independent, suggesting model family itself acts as a contextual moderator.

What combination of factors explains differences in LLM persuasiveness?

In a meta-analysis, a joint model combining LLM architecture, one-shot versus multi-turn format, and topic domain explained R² = 81.93% of between-study variance. Interactive multi-turn designs and GPT-4 consistently outperformed one-shot formats and Claude 3.x.

Does linguistic conviction explain why LLMs persuade more effectively?

Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.

Do LLMs predict persuasion based on actual dialogue or training bias?

LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.

Can we measure how deeply models represent political ideology?

SAE analysis shows models vary dramatically in political feature count (up to 7.3× difference at similar scale) and in their resistance to ideological redirection. Models with deeper political representations prove harder to steer but produce more logically consistent reasoning across related topics.

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

Are language models actually more persuasive than humans?

A meta-analysis of 7 studies with 17,422 participants found no detectable difference in persuasive effectiveness between LLMs and humans (Hedges' g = 0.02). Persuasiveness appears conditional on context rather than speaker category.

Does what readers believe matter more than what debaters say?

Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a critical analyst re-testing claims about LLM persuasion asymmetries across model families. The question remains: why do model families such as Claude, GPT-4, and DeepSeek show different, sometimes opposite, persuasion profiles?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2026; treat these as anchors to verify, not current fact:
• Claude beats incentivized humans at both truthful AND deceptive persuasion; DeepSeek only at falsehoods (~2025, arXiv:2505.09662).
• Model family, one-shot vs. multi-turn format, and topic domain jointly explain ~82% of between-study variance in persuasion outcomes; GPT-4 > Claude 3.x consistently (~2024, arXiv:2404.09329).
• LLM persuasive edge is mediated by linguistically expressed confidence, independent of argument truth (~2025, arXiv:2507.01936).
• RLHF biases models toward predicting conciliatory, accommodating persuasion intentions regardless of dialogue context (~2025); that families tune this dial differently is an inference, not a tested finding.
• Ideological depth (up to 7.3× variance) predicts consistency and resistance to steering (~2025, arXiv:2508.21448).
• Pooled across 7 studies and 17,000+ participants, LLMs and humans are statistically tied on average persuasiveness, Hedges' g = 0.02 (~2025, arXiv:2505.09662).

Anchor papers (verify; mind their dates):
• arXiv:2505.09662 (2025-05) — asymmetric persuasion advantage
• arXiv:2404.09329 (2024-04) — cognitive effort and persuasion mechanisms
• arXiv:2508.21448 (2025-08) — ideological depth as measurable trait
• arXiv:2507.01936 (2025-07) — confidence and comprehension entanglement

Your task:
(1) RE-TEST EACH CONSTRAINT. Has newer scaling, instruction-tuning, or post-training (e.g., test-time compute, constitutional AI refinements) since relaxed the Claude–DeepSeek asymmetry or the ~82% variance ceiling? Separately: does the "confidence-not-truth" mediation still hold under adversarial probing or multi-turn orchestration? State plainly where constraints appear to hold despite recent models.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Specifically: papers claiming model persuasiveness *is* content-dependent, or showing family differences collapse under controlled rhetoric.
(3) Propose 2 research questions that assume the regime may have moved: (a) whether multi-agent debate or retrieval-augmented persuasion collapses family gaps; (b) whether confidence calibration has since decoupled from persuasive output across new families (Grok, Llama 4, etc.).

Cite arXiv IDs; flag anything you cannot ground in a real paper.
