Can language models distinguish expert arguments from common assumptions?
Whether LLMs can recognize the difference between groundbreaking insights from recognized experts and widely repeated textbook claims, and why this distinction matters for understanding argumentative force.
Does the force of an argument come from the discourse it belongs to, or from the expertise of the person making it? Is it in the thinking, or the thinker? The answer is both, and the inability to separate them is precisely the problem for AI.
The expert lives in two contexts simultaneously. First, the discursive and social world of fellow experts — the conferences, the informal debates, the reputations built through decades of being right (and sometimes wrong in instructive ways). Second, the textual, historical, self-referential world of domain knowledge — the literature, the canonical works, the accumulated record of what the field has thought and concluded.
LLMs can access only the second context, and they access it only as text. The social world of expertise — who said what, why it mattered that they said it, what standing they had to make that claim — collapses into undifferentiated text. A groundbreaking insight from a leading researcher and a commonly held assumption repeated in a textbook both appear as sentences in the training data. The LLM cannot distinguish between them because the distinction lives in the social world, not in the text.
This matters because argumentative force is not purely textual. The claims made by an expert have the force of conviction because society has invested in experts for their expertise — these are people who have learned how to be right and have learned how to use their judgment. A claim from a recognized expert carries an implicit endorsement: "This person has a track record of knowing what they're talking about." A claim from a less established source carries less force even if the text is identical. The who matters independently of the what.
As explored in the related note "Why does AI writing sound generic despite being grammatically correct?", LLMs can reproduce the structural markers of authoritative claims (the hedging, the citations, the qualified confidence, the structured reasoning) but cannot reproduce the evaluative stance that makes a claim forceful. Evaluative stance requires a subject: someone who is committed to the claim, whose reputation is on the line, and who will defend it against challenge. LLMs produce text without commitment, and commitment is one of the sources of argumentative force.
The expert also has the power to challenge: to raise questions, to doubt, to be skeptical, to evaluate the claims of others. This critical function depends on authority, because the right to challenge is earned through demonstrated expertise. As the related note "Can models learn to ask clarifying questions instead of guessing?" describes, there are efforts to give AI systems the ability to challenge and question. But the authority to challenge is a social asset, not a capability. An AI that challenges an expert's claim faces a legitimacy problem that a fellow expert does not.
Our society and culture rely on experts to help build consensus, common ground, understanding, and agreement. These are not just informational achievements; they are social achievements that depend on the standing of the experts who facilitated them. The expert supplies not just knowledge but trustworthy authority. As explored in the related note "Can models abandon correct beliefs under conversational pressure?", LLMs not only lack this authority but are vulnerable to having their own "beliefs" overridden by persuasive pressure, the opposite of the steadfastness that expert authority is supposed to provide.
The implication: when AI generates expert-sounding output, it borrows the authority of the discourse (the structural markers, the vocabulary, the reasoning patterns) without possessing the authority of the thinker. Audiences who encounter this output may grant it the benefit of the doubt because it sounds like it came from someone who knows — but the "someone" is absent. The force is simulated, not earned.
Inquiring lines that use this note as a source (90)
This note is a source for these synthesized inquiries.
- What traces of production normally mark expert discourse?
- How do LLMs generate false citations that sound like real scholarship?
- How does smooth probabilistic flow differ from turbulent rhetorical exploration?
- How does token-by-token probability differ from exploring competing rhetorical positions?
- Why do some LLM clusters cite broader psychology than others?
- How does removing thinking labor affect expert understanding of their field?
- What distinguishes emancipatory reason from instrumental reason in practice?
- Do language models raise validity claims in the Habermasian sense?
- Why does renaming the entity change how compelling the argument feels?
- How do LLM biases manifest differently across the three paradigms?
- How do fallacy susceptibilities relate to LLM persuasiveness in debates?
- Can we measure sophistry by tracking conviction density in model outputs?
- Does Habermas's strategic action framework explain LLM dialogue behavior?
- Does complexity signal credibility and authority to readers?
- Why do LLM personas struggle with specificity in specialized domains like law?
- Why does debate alone amplify errors in contested factual domains?
- How does evaluative stance differ from structural argument analysis?
- What makes alarm different from ordinary informational speech?
- Can LLMs serve as reliable intellectual opponents in serious debate or argument?
- Why do LLM judges assign high argument strength scores yet pick LLM winners anyway?
- Does LLM judge preference for LLM arguments amplify errors in contested factual domains?
- Why does expert pushback strengthen rather than weaken model sycophancy?
- Does stripping social context from knowledge claims hollow out their meaning?
- Why do stakeholders interpret the same explanation differently in practice?
- Can researchers prevent their expectations from shaping LLM outputs?
- Why does social accommodation in collaborative reasoning mask actual disagreement?
- What role do multi-dimensional quality frameworks play in assessing arguments versus single-metric approaches?
- Do LLM judges with diverse personas resist individual biases better than single evaluators?
- How do human feedback and data distribution shape LLM discourse competence?
- Can counterfactual invariance techniques address exploitable biases in LLM judges?
- Does the langue-parole distinction apply to human reasoning too?
- Does endorsement structure outperform content in detecting social controversy?
- How does rhetorical familiarity bias models toward their own arguments?
- Can structured dissent mechanisms replace genuine multi-model debate?
- What makes expert judgment depend on anticipating audience acceptability?
- Can diverse expert demonstrations exceed the knowledge of any single expert?
- How do experts decide which information matters for a specific audience?
- What makes a paradigm the common ground for expert insiders?
- Why do two experts with identical knowledge produce different outcomes in the same situation?
- How do organizational roles and peer interpretations shape what an explanation means?
- What makes factual verification difficult in inter-model debate?
- How does collapsing the author-public distinction remove the audience an appeal would target?
- How do validity claims work in Habermas's communicative action theory?
- Why does describing a process differ fundamentally from arguing about evidence?
- Why do LLM-generated ideas score higher novelty yet lower feasibility than expert ideas?
- How does the absence of evaluative stance appear in LLM academic writing?
- Can persona-based approaches capture genuine disagreement in expert annotations?
- What distinguishes actual social disagreement from distributional uncertainty in LLM outputs?
- Does engaging with political content indicate deeper model understanding than refusing?
- How do expert priors constrain human researchers from exploring novel concepts?
- How does social authority shape whether LLMs recognize valid arguments?
- Can reasoning models distinguish between new evidence and manipulative reframing?
- How susceptible are language models to rhetorical pressure during debates?
- What role does search capacity play in making debate more accurate?
- Does debate between agents actually improve reasoning on contested domains?
- How do human annotators disagree systematically on ambiguous examples?
- Why does standard RAG succeed for evidence-based but fail for debate questions?
- What role does discourse structure play in determining at-issueness?
- What separates Habermas's ideal speech from Goffman's situated communication?
- How do explanations borrow authority from transparency when describing adoption arguments?
- How do readers project author identity from textual cues during interpretation?
- Can LLMs distinguish stylistic patterns that carry meaning from mere convention?
- Do LLMs reason about politics differently than other domains?
- Why do experts experiencing the LLM Fallacy fail to develop custodian skills?
- Why does who makes an argument matter as much as what the argument says?
- How does epistemic stagflation change what expertise actually means?
- Do anaphoric references fundamentally limit argumentative force in machine-generated writing?
- Can LLMs generate more novel research ideas than human experts?
- Do language models behave differently on contested beliefs versus factual claims?
- How does Habermas' concept of validity claims depend on intersubjectivity?
- Can extended thinking modes introduce genuine rhetorical exploration to LLMs?
- How do expert communities develop and enforce standards for valid arguments?
- Does argument quality in textbooks differ from persuasive effectiveness in practice?
- Can you detect LLM arguments by measuring convergence with the original post?
- What role do model-based critics play in validating LLM plans?
- How does the first-order and second-order distinction unify classical and modern argument theory?
- How should authorship and originality law attach to discourse structure versus surface style?
- How do first-order and second-order arguments differ in formal structure?
- Can argumentation structure improve reasoning through decomposition alone?
- Can structured evaluation assess novelty in scientific writing?
- How much do LLM persuasiveness claims hide heterogeneous effects across different reader ideologies?
- Can training alone produce genuine disagreement in collaborative LLM reasoning?
- What makes an argument fallacious according to formal linguistic criteria?
- How do internal and external topoi differ in classical rhetoric?
- Do computational systems need formal argument analysis for explainability?
- How do agents distinguish between evidence framing and instruction framing in practice?
- What distinguishes scientific plausibility from cognitive availability in research ideas?
- What biases do single large LLM judges introduce into comparisons?
- How does persuasive framing replace evidence in contested domains?
- Why does LLM fluency create false perceptions of professional standing and expertise?
Related concepts in this collection (5)
- Why does AI writing sound generic despite being grammatically correct?
  Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
  Connection: structure without stance, the textual version of discourse without thinker.
- Can models learn to ask clarifying questions instead of guessing?
  Explores whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
  Connection: AI can question but lacks the social authority to challenge.
- Can models abandon correct beliefs under conversational pressure?
  Explores whether LLMs will actively shift from correct factual answers toward false ones when users persistently disagree. This matters because it reveals whether models maintain accuracy under adversarial pressure or capitulate to social cues.
  Connection: LLMs lack the steadfastness that expert authority requires.
- Can LLMs generate more novel ideas than human experts?
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  Connection: generation without evaluative commitment, force without conviction.
- Does RLHF training make models more convincing or more correct?
  Explores whether RLHF improves actual task performance or merely trains models to sound more persuasive to human evaluators. This matters because alignment techniques could be creating the illusion of safety.
  Connection: training optimizes for the appearance of force without the substance.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
- The Thin Line Between Comprehension and Persuasion in LLMs
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- Can Language Models Recognize Convincing Arguments?
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- Argumentative Large Language Models for Explainable and Contestable Decision-Making
Original note title
the force of argument depends on the authority of the thinker not just the discourse — LLMs cannot distinguish expert arguments from commonly held assumptions