Do LLM arguments actually argue better than humans?
LLM counter-arguments score higher on textbook quality markers like logical soundness and respectful tone, while human arguments show more creativity and emotional intensity. What does this gap reveal about how we measure argumentative quality?
LLM-generated counter-arguments score higher than human counter-arguments on the markers a rhetoric textbook would teach: they are more cogent, more explicitly justified, more respectful toward the interlocutor, and more positive in emotional tone. Human counter-arguments, by contrast, stand out on three other dimensions: greater lexical and syntactic creativity, more negative emotion, and heavier use of interactive discourse markers (turn-taking signals, addressivity, conversational repair).
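To make the three human-side features concrete, one could start with a crude, interpretable feature extractor. The sketch below is only an illustration under loose assumptions and is not the measurement pipeline behind these findings. Type-token ratio stands in for lexical creativity, and `NEGATIVE_WORDS` and `INTERACTIVE_MARKERS` are hypothetical toy lexicons invented for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons for illustration only; a real analysis would use
# validated resources rather than these hand-picked word lists.
NEGATIVE_WORDS = {"wrong", "absurd", "ridiculous", "hate", "awful", "stupid", "terrible"}
INTERACTIVE_MARKERS = {"you", "your", "well", "look", "honestly", "okay", "seriously"}

# Lowercase word tokens; apostrophes are kept so "you're" is one token.
TOKEN_RE = re.compile(r"[a-z']+")


@dataclass
class ArgumentFeatures:
    type_token_ratio: float   # crude proxy for lexical creativity
    negative_rate: float      # share of tokens found in NEGATIVE_WORDS
    interactive_rate: float   # interactive markers plus question marks, per token


def extract_features(text: str) -> ArgumentFeatures:
    tokens = TOKEN_RE.findall(text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    return ArgumentFeatures(
        type_token_ratio=len(set(tokens)) / n,
        negative_rate=sum(t in NEGATIVE_WORDS for t in tokens) / n,
        interactive_rate=(sum(t in INTERACTIVE_MARKERS for t in tokens) + text.count("?")) / n,
    )


human = extract_features("Look, you're just wrong here. Seriously? That policy is absurd.")
llm = extract_features(
    "While this perspective has merit, the evidence suggests a more nuanced conclusion is warranted."
)
print(human)
print(llm)
```

Type-token ratio falls as texts get longer, so this proxy only compares fairly across texts of similar length. A serious analysis would also need validated emotion lexicons and syntactic measures, which this sketch omits.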
The pattern is more specific than "LLMs argue better." LLMs argue the way an instructor wants students to argue, while humans argue the way actual people in actual disputes argue. The textbook-quality profile looks like an artifact of training: RLHF-style objectives reward politeness, justification, and emotional restraint, and in doing so they penalize the very features that make human argumentation distinctive, namely disagreement intensity, creative phrasing, and the conversational micro-moves that signal a real exchange between people.
The implication for detection is uncomfortable. The features that separate LLMs from humans are precisely the features of prescribed argument quality: by being good students of argumentation, LLMs become identifiable. This creates a perverse incentive in the other direction. If detection carried a serious cost, the cheapest evasion would be to add lexical noise, negative emotion, and conversational disfluency, that is, to make outputs worse by textbook standards in order to look more human. The textbook–human gap is the detection surface.
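The detection-surface argument can be illustrated with a toy linear detector. Everything in this sketch is hypothetical: the feature names, the assumption that each feature is already normalised to [0, 1], and the weights, which are invented for illustration rather than estimated from any data. The `evade` helper mimics the cheap evasion described above by raising only the human-marked features.

```python
import math

# Hypothetical weights for a toy logistic detector. Positive weights push the
# score toward "LLM-written", negative weights toward "human-written".
# These numbers are invented for illustration, not taken from any study.
WEIGHTS = {
    "justification": 1.5,
    "respect": 1.0,
    "positive_tone": 1.0,
    "negative_emotion": -1.5,
    "lexical_creativity": -1.0,
    "interactive_markers": -1.2,
}
BIAS = 0.0
HUMAN_MARKED = ("negative_emotion", "lexical_creativity", "interactive_markers")


def llm_probability(features: dict[str, float]) -> float:
    """Logistic score over features assumed to be normalised to [0, 1]."""
    z = BIAS + sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def evade(features: dict[str, float], amount: float) -> dict[str, float]:
    """Raise only the human-marked features by `amount`, capped at 1.0.

    This makes the text 'worse' by textbook standards while leaving the
    textbook features (justification, respect, tone) untouched."""
    out = dict(features)
    for name in HUMAN_MARKED:
        out[name] = min(1.0, out.get(name, 0.0) + amount)
    return out


textbook_profile = {
    "justification": 0.9, "respect": 0.9, "positive_tone": 0.8,
    "negative_emotion": 0.1, "lexical_creativity": 0.2, "interactive_markers": 0.1,
}
print(llm_probability(textbook_profile))
print(llm_probability(evade(textbook_profile, 0.6)))
```

In this toy setup, `evade` lowers the score without changing any textbook feature, which mirrors the point above: a detector built on the textbook–human gap and the cheapest evasion against it operate along the same axes.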
The deeper finding is that argument quality and argumentative authenticity are different things. A model trained to produce good arguments will reliably fail to produce human arguments. The two targets diverge.
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Why do LLMs use more moral language than humans in argumentation?
- Do LLMs match top human creative writers in literary quality?
- Why does loyalty foundation not differ between LLM and human arguments?
- Why do LLM judges assign high argument strength scores yet pick LLM winners anyway?
- Does argument quality in textbooks differ from persuasive effectiveness in practice?
- What linguistic features most strongly signal LLM authorship in counter-arguments?
- Can forensic features reliably distinguish LLM arguments from human arguments?
- How do moral language patterns differ between LLM and human arguments?
Related concepts in this collection
- Do LLM counter-arguments mirror writing style more than humans?
When language models generate arguments against social media posts, do they unconsciously adopt the stylistic features of what they're arguing against? This matters because it could reveal a detectable pattern that distinguishes LLM-written rebuttals from human-written ones.
the second axis of the production-mechanism gap: humans diverge stylistically while LLMs mirror
- Do LLMs and humans persuade through the same mechanisms?
If LLM and human arguments achieve equal persuasive force, does that mean they work the same way? This explores whether equivalent outcomes hide fundamentally different rhetorical strategies.
generalizes this to a "different ingredients, equivalent outcomes" pattern
- Can simple linguistic features detect AI-written arguments?
Can interpretable linguistic patterns reliably distinguish LLM-generated counter-arguments from human-written ones in persuasive contexts? This matters because simple, auditable detection might outperform expensive neural approaches.
the detectability is built on this textbook–human gap
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
- AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- The Thin Line Between Comprehension and Persuasion in LLMs
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- Argumentative Large Language Models for Explainable and Contestable Decision-Making
- Argunauts: Open LLMs that Master Argument Analysis with Argdown
- Can Language Models Recognize Convincing Arguments?
Original note title
LLM arguments resemble textbook-quality more than human arguments — cogent justified positive while humans bring negative emotion creativity and interactive discourse