Can large language models classify argument schemes reliably?

Inquiring lines that use this note as a source (29)

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Can AI arguments participate in discourse without temporal grounding?
Can beam search and ranking functions evaluate claims without understanding counterarguments?
Can language models reason without relying on learned semantic patterns?
Can smaller open-source LLMs reliably detect agreement across unfamiliar topics?
How do you measure the depth of political representation inside a language model?
Why can LLMs identify argument structure but not check warrants?
Can encoder models match human conceptual structure better than larger language models?
How susceptible are language models to rhetorical pressure during debates?
Can language models reason without relying on surface-level pattern matching?
Can LLMs distinguish stylistic patterns that carry meaning from mere convention?
Can lightweight linguistic features reliably detect LLM-generated arguments?
Why do smaller LLMs fail at zero-shot argument scheme classification?
Why does scheme classification require more cognitive load than identifying premises?
Does compressing Walton's schemes into nine categories make LLM classification easier?
Can LLM-generated descriptions of schemes outperform formal dictionary definitions for prompting?
What are the three orthogonal axes that structure the argument scheme periodic table?
How does the first-order and second-order distinction unify classical and modern argument theory?
Why do LLM descriptions of argument schemes work better than formal definitions for classification?
Can smaller scheme inventories or critical questions replace direct scheme classification?
How do pretrained language models represent inferential patterns versus lexical and positional cues?
What failure modes emerge when scheme classification feeds downstream reasoning pipelines?
Can unfilled cells in the periodic table represent undiscovered argument schemes?
Can argumentation structure improve reasoning through decomposition alone?
Does argument-scheme prompting improve reasoning in non-code domains the same way?
Can formal argumentation structure replace ad-hoc fallacy classifications?
Do computational systems need formal argument analysis for explainability?
Can language models beat human experts in domains with sparse historical signals?
Why do more capable language models benefit more from diversity elicitation?
What makes domain-specific utterance resolution harder for general large models?

Related concepts in this collection (5)

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map

12 direct connections · 103 in 2-hop network · medium cluster


Can structured argument prompts make LLM reasoning more rigorous?
Does requiring language models to explicitly check warrants, backing, and rebuttals, rather than reasoning freely, improve reasoning quality and catch failures that standard step-by-step prompting misses?
the complementary use: scheme structure as input to reasoning rather than as output label

Why do paraphrased definitions work better than expert ones?
When instructing LLMs to classify argument schemes, should we use formal Walton definitions or LLM-generated paraphrases? This explores which source better enables reliable scheme recognition and why.
same paper, the operationalization-beats-definition finding

Why does argument scheme classification stumble where other NLP tasks succeed?
Explores whether the abstract, relational nature of argument schemes makes them harder to classify than concrete argument components or stance. Matters because understanding this difficulty gap could improve scheme recognition systems.
same paper, the cognitive-load mechanism

Can formal argumentation make AI decisions truly contestable?
Explores whether structuring AI decisions as formal argument graphs (with explicit attacks and defenses) enables users to meaningfully challenge and navigate reasoning in ways unstructured LLM outputs cannot.
the upstream motivation for getting scheme classification right

Can three axes organize all possible argument schemes?
Can a small set of orthogonal distinctions (subject vs. predicate, order level, and proposition types) capture the full space of valid argument structures? This matters because it could replace ad-hoc scheme lists with a systematic framework.
productive tension: Wagemans's periodic table compresses the 60+ Walton schemes to 9 combinatorial cells; whether the abstraction makes LLM classification easier (fewer targets) or harder (more abstract categories) is open; see [[periodic-table-compresses-arguments-to-nine-cells-but-llms-already-struggle-with-walton-s-sixty-scheme-classification]]

Related papers in this collection (8)

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Can Large Language Models Understand Argument Schemes? · 0.92 match · arxiv
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs · 0.85 match · arxiv
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs · 0.85 match · arxiv
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models · 0.84 match · arxiv
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds · 0.84 match · arxiv
Self-Evaluation Guided Beam Search for Reasoning · 0.84 match · arxiv
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models · 0.84 match · arxiv
Large Language Model Reasoning Failures · 0.84 match · arxiv
