Can language models distinguish expert arguments from common assumptions?

Whether LLMs can recognize the difference between groundbreaking insights from recognized experts and widely repeated textbook claims, and why this distinction matters for understanding argumentative force.

Synthesis note · 2026-03-26

Does the force of an argument come from the discourse it belongs to, or from the expertise of the person making it? Is it in the thinking, or the thinker? The answer is both, and the fact that the two cannot be cleanly separated is precisely the problem for AI.

The expert lives in two contexts simultaneously. First, the discursive and social world of fellow experts — the conferences, the informal debates, the reputations built through decades of being right (and sometimes wrong in instructive ways). Second, the textual, historical, self-referential world of domain knowledge — the literature, the canonical works, the accumulated record of what the field has thought and concluded.

LLMs can access only the second context, and they access it only as text. The social world of expertise — who said what, why it mattered that they said it, what standing they had to make that claim — collapses into undifferentiated text. A groundbreaking insight from a leading researcher and a commonly held assumption repeated in a textbook both appear as sentences in the training data. The LLM cannot distinguish between them because the distinction lives in the social world, not in the text.

This matters because argumentative force is not purely textual. The claims made by an expert have the force of conviction because society has invested in experts for their expertise — these are people who have learned how to be right and have learned how to use their judgment. A claim from a recognized expert carries an implicit endorsement: "This person has a track record of knowing what they're talking about." A claim from a less established source carries less force even if the text is identical. The who matters independently of the what.

As explored in the related note "Why does AI writing sound generic despite being grammatically correct?", LLMs can reproduce the structural markers of authoritative claims (the hedging, the citations, the qualified confidence, the structured reasoning) but cannot reproduce the evaluative stance that makes a claim forceful. Evaluative stance requires a subject: someone who is committed to the claim, whose reputation is on the line, who will defend it against challenge. LLMs produce text without commitment, and commitment is one of the sources of argumentative force.

The expert also has the power to challenge: to raise questions, to doubt, to be skeptical, to evaluate the claims of others. This critical function depends on authority; the right to challenge is earned through demonstrated expertise. As explored in the related note "Can models learn to ask clarifying questions instead of guessing?", there are efforts to give AI systems the ability to challenge and question. But the authority to challenge is a social asset, not a capability. An AI that challenges an expert's claim faces a legitimacy problem that a fellow expert does not.

Our society and culture rely on experts to help build consensus, common ground, understanding, and agreement. These are not just informational achievements; they are social achievements that depend on the standing of the experts who facilitate them. The expert supplies not just knowledge but trustworthy authority. As explored in the related note "Can models abandon correct beliefs under conversational pressure?", LLMs not only lack this authority but are vulnerable to having their own "beliefs" overridden by persuasive pressure, the opposite of the steadfastness that expert authority is supposed to provide.

The implication: when AI generates expert-sounding output, it borrows the authority of the discourse (the structural markers, the vocabulary, the reasoning patterns) without possessing the authority of the thinker. Audiences who encounter this output may grant it the benefit of the doubt because it sounds like it came from someone who knows — but the "someone" is absent. The force is simulated, not earned.

Related concepts in this collection

Why does AI writing sound generic despite being grammatically correct?
Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
Connection: structure without stance: the textual version of discourse without thinker

Can models learn to ask clarifying questions instead of guessing?
Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
Connection: AI can question but lacks the social authority to challenge

Can models abandon correct beliefs under conversational pressure?
Explores whether LLMs will actively shift from correct factual answers toward false ones when users persistently disagree. Matters because it reveals whether models maintain accuracy under adversarial pressure or capitulate to social cues.
Connection: LLMs lack the steadfastness that expert authority requires

Can LLMs generate more novel ideas than human experts?
Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
Connection: generation without evaluative commitment: force without conviction

Does RLHF training make models more convincing or more correct?
Explores whether RLHF improves actual task performance or merely trains models to sound more persuasive to human evaluators. This matters because alignment techniques could be creating the illusion of safety.
Connection: training optimizes for the appearance of force without the substance

Original note title

the force of argument depends on the authority of the thinker not just the discourse — LLMs cannot distinguish expert arguments from commonly held assumptions
