Can LLMs identify the hidden assumptions that make arguments work?
LLMs recognize what arguments claim and what evidence they offer, but struggle to identify implicit warrants—the unstated principles that connect evidence to conclusion. This matters because valid reasoning requires understanding these hidden logical bridges.
Toulmin's argument model distinguishes claim, data, and warrant (alongside backing, qualifier, and rebuttal, which are not at issue here). The claim is what is being argued. The data is the evidence. The warrant is the often-unstated principle connecting data to claim: the implicit assumption that makes the inference valid.
In natural language, warrants are almost never stated. When someone argues "This policy failed in Europe, so it will fail here," the unstated warrant is something like "contexts similar to Europe will produce similar outcomes." Evaluating the argument requires identifying this warrant and assessing its validity in context.
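The three-part structure can be made concrete with a small data type. The following is a minimal sketch, not an established schema: the class name `ToulminArgument` and its methods are hypothetical, chosen only to show that the warrant is the field that natural-language arguments usually leave empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToulminArgument:
    """Claim, data, and warrant of one argument (illustrative names only)."""

    claim: str                     # what is being argued
    data: str                      # the evidence offered
    warrant: Optional[str] = None  # principle linking data to claim; often unstated

    def has_explicit_warrant(self) -> bool:
        # True only when a non-empty warrant has been written down.
        return self.warrant is not None and self.warrant.strip() != ""

    def with_warrant(self, warrant: str) -> "ToulminArgument":
        # Return a copy in which the hidden assumption is made explicit.
        return ToulminArgument(self.claim, self.data, warrant)


# The example from the text as it appears in natural language: no warrant stated.
surface = ToulminArgument(
    claim="This policy will fail here.",
    data="This policy failed in Europe.",
)

# The same argument once the unstated warrant has been reconstructed.
reconstructed = surface.with_warrant(
    "Contexts similar to Europe will produce similar outcomes."
)
```

Only `reconstructed` can be evaluated as an inference: whether "here" really resembles Europe is a question about the warrant, not about the claim or the data.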
The Argument Reasoning Comprehension task tests this capability directly. LLMs perform well at identifying the explicit claim-data structure, that is, recognizing what is being argued and what evidence is offered. They fail far more often at supplying or evaluating the implicit warrant. The gap between structural recognition and warrant identification is large.
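One way to measure the warrant side of that gap is to score a model on a forced choice between candidate warrants. The sketch below assumes a two-candidate item layout; the field names, the `Chooser` callback, and the opposing warrant text are all hypothetical and are not taken from the task's actual data files. No model is called: `always_first` is a trivial position-based baseline standing in for a real model-backed chooser.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WarrantItem:
    """One forced-choice item (assumed layout, for illustration only)."""

    claim: str
    data: str
    candidates: Tuple[str, str]  # two candidate warrants
    correct_index: int           # 0 or 1: which candidate supports the claim


# A chooser maps an item to the index of the warrant it selects.
Chooser = Callable[[WarrantItem], int]


def warrant_accuracy(items: Sequence[WarrantItem], choose: Chooser) -> float:
    """Fraction of items on which the chooser picks the correct warrant."""
    if not items:
        raise ValueError("need at least one item")
    hits = sum(1 for item in items if choose(item) == item.correct_index)
    return hits / len(items)


def always_first(item: WarrantItem) -> int:
    # Position-only baseline: ignores the argument entirely.
    return 0


CLAIM = "This policy will fail here."
DATA = "This policy failed in Europe."
SUPPORTING = "Contexts similar to Europe will produce similar outcomes."
OPPOSING = "Outcomes in Europe say little about outcomes elsewhere."  # invented

# The same argument twice with candidate order swapped, so a chooser that
# relies on position alone cannot do better than chance.
ITEMS = [
    WarrantItem(CLAIM, DATA, (SUPPORTING, OPPOSING), correct_index=0),
    WarrantItem(CLAIM, DATA, (OPPOSING, SUPPORTING), correct_index=1),
]

baseline = warrant_accuracy(ITEMS, always_first)  # 0.5
```

Replacing `always_first` with a function that queries a model, and comparing the resulting score with the model's accuracy at identifying claim and data on the same items, would give one estimate of the gap described above.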
This is a different failure from the one described in "Why does ChatGPT fail at implicit discourse relations?". That finding concerns discourse relations (because, therefore, although). This finding concerns argumentative inference: the background knowledge required to evaluate whether data actually supports a claim. Both are implicit-structure failures, but at different levels.
The failure is not simply a matter of missing world knowledge. "Do language models actually use their encoded knowledge?" suggests that relevant knowledge may be encoded but not accessed when needed. Warrant identification requires activating world knowledge in response to argumentative context, which is a different retrieval trigger from direct factual recall.
In practice, LLMs can generate the surface form of argumentation (claim, evidence, conclusion) without doing the inferential work that makes the argumentation valid. They can look like they are reasoning about arguments without engaging with the warrants that determine whether the arguments hold.
Related concepts in this collection
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  Relation: same implicit-structure failure at discourse level; this is the argumentative-inference level
- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  Relation: knowledge encoded but not causally active in warrant retrieval
- Can large language models translate natural language to logic faithfully?
  This explores whether LLMs can convert natural language statements into formal logical representations without losing meaning. It matters because faithful translation is essential for any AI system that reasons formally or verifies specifications.
  Relation: related failure: surface form of logic without semantic content
- Can structured argument prompts make LLM reasoning more rigorous?
  Does requiring language models to explicitly check warrants, backing, and rebuttals, rather than reasoning freely, improve reasoning quality and catch failures that standard step-by-step prompting misses?
  Relation: the intervention that targets this gap
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
- Explicit Inductive Inference using Large Language Models
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- LLMs can implicitly learn from mistakes in-context
- Neutralizing Bias in LLM Reasoning using Entailment Graphs
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
Original note title
implicit warrants in argumentation require world knowledge that llms cannot reliably supply even when surface argument structure is correctly identified