Can models learn argument quality from labeled examples alone?

Explores whether fine-tuning on quality-labeled examples teaches models the underlying criteria for evaluating arguments, or merely surface patterns. Matters because high-stakes assessment tasks depend on reliable, transferable quality judgment.

Synthesis note · 2026-02-21 · sourced from Argumentation

Argument Quality Assessment research trains models to evaluate the quality of arguments — are they logically valid? Well-supported? Relevant? Clear? The standard approach is supervised fine-tuning: label examples as high/low quality, train on them, evaluate transfer.

The finding: fine-tuning on quality-labeled examples does not reliably teach models what makes an argument good. Models learn to pattern-match against the labeled examples but do not acquire the underlying criteria that would generalize to new argument types. When explicit theoretical frameworks (RATIO: Relevance, Acceptability, Sufficiency; QOAM: Quality of Argumentation Model) are provided as structured instruction, performance improves significantly.

Theory injection works where pattern learning fails.

This is a specific instance of the pattern described in "Can models pass tests while missing the actual grammar?": models that score highly on quality assessments within the training distribution fail to transfer the criteria to out-of-distribution argument types. What the model learns is "this looks like the high-quality arguments in the training data" rather than "this argument satisfies the following criteria for quality."
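One way to make the in-distribution versus out-of-distribution contrast measurable is to score predictions per argument type and compare types seen in training with held-out types. The sketch below assumes each prediction is recorded as a tuple of argument type, gold label, and predicted label; the function names `accuracy_by_type` and `transfer_gap` and the toy records are hypothetical and invented for illustration, not taken from the research summarized here.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return accuracy per argument type.

    records: iterable of (argument_type, gold_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for arg_type, gold, pred in records:
        total[arg_type] += 1
        correct[arg_type] += int(gold == pred)
    return {t: correct[t] / total[t] for t in total}


def transfer_gap(records, train_types):
    """Mean accuracy on seen types minus mean accuracy on unseen types."""
    per_type = accuracy_by_type(records)
    seen = [acc for t, acc in per_type.items() if t in train_types]
    unseen = [acc for t, acc in per_type.items() if t not in train_types]
    if not seen or not unseen:
        raise ValueError("need at least one seen and one unseen argument type")
    return sum(seen) / len(seen) - sum(unseen) / len(unseen)


# Made-up predictions, used only to exercise the functions.
toy_records = [
    ("policy", "high", "high"),
    ("policy", "low", "low"),
    ("legal", "high", "low"),
    ("legal", "low", "low"),
]
gap = transfer_gap(toy_records, train_types={"policy"})
```

A large positive gap is consistent with a model that has learned the training distribution rather than the criteria, though it does not by itself show why the transfer failed.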

The implication extends beyond argumentation. Whenever an evaluation task requires applying principled criteria that are not explicit in the labeled data, as with judgments of quality, fairness, coherence, or persuasiveness, fine-tuning on examples risks teaching the distribution rather than the criteria. The note "Why do different people reconstruct the same argument differently?" points at the same problem from the other direction: if there is no gold standard, labeled examples cannot straightforwardly encode the right criteria.

The practical consequence: assessment tasks in high-stakes domains (argument quality in legal reasoning, argument validity in policy analysis) should not rely on fine-tuned models trained only on labeled examples. Explicit criteria instruction — prompting with theoretical frameworks, structured evaluation rubrics — is required.
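Explicit criteria instruction can be as simple as assembling a prompt that names each criterion and asks for a separate judgment on it. The sketch below is a minimal illustration under stated assumptions: the three criterion names come from the framework named in this note, but the guiding questions, the 1 to 5 scale, and the names `Criterion`, `RUBRIC`, and `build_rubric_prompt` are hypothetical and not drawn from any published rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    question: str  # illustrative wording, an assumption of this sketch


# Criterion names follow the note; the questions are invented examples.
RUBRIC = (
    Criterion("Relevance", "Do the premises bear on the conclusion?"),
    Criterion("Acceptability", "Would a reasonable audience accept the premises?"),
    Criterion("Sufficiency", "Do the premises together give enough support?"),
)


def build_rubric_prompt(argument: str, rubric=RUBRIC) -> str:
    """Assemble an evaluation prompt that states every criterion explicitly."""
    lines = [
        "Evaluate the argument below against each criterion.",
        "For every criterion, give a score from 1 to 5 and a one-sentence justification.",
        "",
        "Criteria:",
    ]
    for i, criterion in enumerate(rubric, start=1):
        lines.append(f"{i}. {criterion.name}: {criterion.question}")
    lines += ["", "Argument:", argument.strip()]
    return "\n".join(lines)


prompt = build_rubric_prompt("  Taxes should rise because services are underfunded.  ")
```

Keeping the rubric as data rather than hard-coded prompt text makes it straightforward to swap in a different framework or to audit exactly which criteria the model was asked to apply.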

Related concepts in this collection 4

Can models pass tests while missing the actual grammar?
Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
same pattern: training distribution ≠ underlying criteria

Why do different people reconstruct the same argument differently?
When humans and LLMs extract logical structure from arguments, they produce different reconstructions. Is this disagreement a problem to solve, or does it reveal something fundamental about how arguments work?
no gold standard means labeled examples may encode arbitrary choices

Can structured argument prompts make LLM reasoning more rigorous?
Does requiring language models to explicitly check warrants, backing, and rebuttals, rather than reasoning freely, improve reasoning quality and catch failures that standard step-by-step prompting misses?
explicit theory injection (CQoT) works for the same reason: making implicit criteria explicit

What makes explanations work in real conversation?
Does explanation quality depend on how dialogue partners interact (testing understanding, adjusting based on feedback, and coordinating their communicative moves) rather than on information content alone?
parallel decomposition: argument quality requires framework instruction (RATIO, QOAM) and explanation quality requires tracking three interacting dimensions; both reject unitary quality measures in favor of multi-dimensional criteria that models cannot learn from examples alone
