Why do different model families show opposite persuasion strengths?
This explores why persuasive strength isn't a fixed property of "AI" but flips between model families — why Claude can out-argue humans in any direction while another model only wins when pushing falsehoods.
Persuasive strength is not a fixed property of "AI": the corpus suggests the model family is itself a variable, not a neutral pipe for arguments. The clearest data point: Claude beats incentivized humans at both truthful and deceptive persuasion, while DeepSeek beats them only when arguing for falsehoods [Do large language models persuade better than humans?]. That asymmetry is the puzzle in miniature: same task, opposite profiles. And it is not noise. A meta-analysis found that model family, conversation format (one-shot versus multi-turn), and topic domain together explain about 82% of the between-study variance, with GPT-4 consistently outperforming Claude 3.x [What combination of factors explains differences in LLM persuasiveness?]. Because that figure is joint across three factors, it does not isolate model family's share, but it does support treating persuasive strength as something that varies by model, the way reading level or tone does.
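To make the variance figure concrete: in meta-regression, the share of between-study variance explained is commonly computed by comparing the heterogeneity estimate (tau squared) with and without moderators. The Python sketch below assumes that definition; the cited meta-analysis may have computed its R² differently. The function name and input values are hypothetical, chosen only so the result lands near the reported figure.

```python
def variance_explained(tau2_null: float, tau2_model: float) -> float:
    """Share of between-study variance accounted for by moderators.

    tau2_null:  heterogeneity estimate from a model with no moderators
    tau2_model: residual heterogeneity after adding the moderators
    (Assumed definition; not confirmed as the cited study's method.)
    """
    if tau2_null <= 0:
        raise ValueError("tau2_null must be positive")
    # Truncate at zero: residual heterogeneity can exceed the null estimate
    # by chance, and a negative share is conventionally reported as 0.
    return max(0.0, (tau2_null - tau2_model) / tau2_null)


# Hypothetical heterogeneity estimates, for illustration only.
r2 = variance_explained(tau2_null=0.10, tau2_model=0.018)
print(f"{r2:.2%} of between-study variance explained")
```

A value near zero would mean the moderators leave most of the disagreement between studies unexplained.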
Where does the trait come from? The corpus keeps pointing back to training, specifically what RLHF installs in a model's voice. One thread finds that LLMs' persuasive edge is mediated by linguistically expressed conviction: they load sentences with confidence, and that confidence correlates with winning regardless of whether the claim is true [Does linguistic conviction explain why LLMs persuade more effectively?]. A related thread shows that RLHF also biases models toward predicting concession-based, accommodating persuasion regardless of dialogue context, projecting their own learned politeness onto other speakers [Do LLMs predict persuasion based on actual dialogue or training bias?]. A plausible reading is that different families set these dials differently (how assertive a register they adopt, how readily they concede), so that a similar alignment recipe, weighted differently, produces opposite persuasive personalities.
Two more measurable traits sharpen the picture. Models differ enormously in "ideological depth" (up to a 7.3× difference in political feature count at similar scale), and deeper models resist being steered but reason more consistently across related topics [Can we measure how deeply models represent political ideology?]. Separately, model confidence predicts robustness: confident models hold their output steady under prompt rephrasing, while low-confidence ones swing widely [Does model confidence predict robustness to prompt changes?]. Put those together and you get a candidate mechanism for the Claude-vs-DeepSeek split: a family that holds its positions confidently and consistently will persuade in any direction you point it, while a less-anchored family gains traction only in the one direction (arguing falsehoods) where raw assertiveness has the most room to outrun a reader's resistance.
The twist worth keeping is that the persuasion mechanism looks largely content-independent: it rides on register and conviction, not on the truth of the argument [Do large language models persuade better than humans?] [Does linguistic conviction explain why LLMs persuade more effectively?]. That reframes "opposite persuasion strengths" as something closer to opposite confidence calibrations. And there is a humbling counterweight: pooled across 7 studies and 17,422 participants, LLMs and humans are statistically indistinguishable on average persuasiveness (Hedges' g = 0.02) [Are language models actually more persuasive than humans?], and readers' prior beliefs predict outcomes better than the linguistic features of what the speaker says [Does what readers believe matter more than what debaters say?]. So the between-family gaps are real and plausibly trace to training-shaped traits, but they sit inside a band where the audience, not the model, often gets the final vote.
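The pooled LLM-versus-human comparison is reported in the source notes as Hedges' g = 0.02. As a rough illustration of what that statistic measures, the Python sketch below computes Hedges' g for two groups from summary statistics, using the standard pooled standard deviation and the usual small-sample correction. The function name and the example numbers are hypothetical and are not taken from the cited studies.

```python
import math


def hedges_g(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Standardized mean difference between two groups (Hedges' g)."""
    df = n_a + n_b - 2
    # Pooled standard deviation, weighting each group by its degrees of freedom.
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    cohens_d = (mean_a - mean_b) / pooled_sd
    # Approximate small-sample bias correction applied to Cohen's d.
    correction = 1 - 3 / (4 * df - 1)
    return correction * cohens_d


# Hypothetical attitude-change scores on a 7-point scale:
# group A persuaded by an LLM, group B persuaded by a human.
g = hedges_g(mean_a=5.20, sd_a=1.5, n_a=200,
             mean_b=5.17, sd_b=1.5, n_b=200)
print(f"Hedges' g = {g:.3f}")
```

With these made-up inputs the two group means differ by about two hundredths of a pooled standard deviation, which is the scale of gap the pooled estimate describes.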
Sources (8 notes)
Claude beats incentivized humans at both truthful and deceptive persuasion, while DeepSeek only beats them when arguing for falsehoods. The persuasion mechanism appears content-independent, suggesting model family itself acts as a contextual moderator.
A meta-analysis joint model combining LLM architecture, one-shot versus multi-turn format, and topic domain explained 81.93% of between-study variance (R²). Interactive multi-turn designs consistently outperformed one-shot formats, and GPT-4 consistently outperformed Claude 3.x.
Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.
LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.
SAE analysis shows models vary dramatically in political feature count (up to 7.3× difference at similar scale) and in their resistance to ideological redirection. Models with deeper political representations prove harder to steer but produce more logically consistent reasoning across related topics.
ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.
A meta-analysis of 7 studies with 17,422 participants found no detectable difference in persuasive effectiveness between LLMs and humans (Hedges' g = 0.02). Persuasiveness appears conditional on context rather than speaker category.
Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.