SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse · Model Architecture and Internals

Does fine-tuning on NLI teach inference or amplify shortcuts?

When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.

Synthesis note · 2026-02-21 · sourced from Natural Language Inference

"LLMs are Frequency Pattern Learners in NLI" identifies a consistent frequency bias in NLI datasets: in positive (entailment) instances, the predicate in the hypothesis tends to be more frequent in the training data than the predicate in the premise. LLMs exploit this pattern. The troubling finding is that fine-tuning on NLI corpora increases reliance on the frequency bias rather than reducing it.

The mechanism connects to a real property of language. Hypernyms (more general terms, such as "animal") tend to be more frequent than hyponyms (more specific terms, such as "dog") in natural text. Since upward entailment runs from specific to general (SPRINT entails RUN), frequency can be a useful proxy for entailment direction. Fine-tuning teaches models to exploit this proxy more aggressively.
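The proxy described above can be written down as a one-line baseline. This is a minimal sketch for illustration only, not the method of the cited paper: the counts in `PREDICATE_FREQ` are made up, and the function name and label strings are hypothetical.

```python
# Illustrative, made-up corpus counts; real values would come from a large corpus.
PREDICATE_FREQ = {"run": 9000, "sprint": 400, "animal": 15000, "dog": 7000}


def frequency_heuristic(premise_pred, hypothesis_pred, freq=PREDICATE_FREQ):
    """Predict entailment purely from relative predicate frequency.

    Returns "entail" when the hypothesis predicate is more frequent than the
    premise predicate, and "no-entail" otherwise. No semantics are involved.
    """
    if freq.get(hypothesis_pred, 0) > freq.get(premise_pred, 0):
        return "entail"
    return "no-entail"


# Specific -> general: the heuristic agrees with the true direction.
print(frequency_heuristic("sprint", "run"))
# General -> specific: the heuristic also declines to predict entailment.
print(frequency_heuristic("run", "sprint"))
```

The heuristic gets both directions of the SPRINT/RUN pair right without representing what either word means, which is why it can pass for inference on data where frequency and entailment are aligned.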

The problem is that frequency is a statistical artifact, not a semantic relationship. It works often enough to look like learning on standard benchmarks, but it fails on adversarial cases where the frequency pattern disagrees with the actual entailment label. After fine-tuning, LLMs perform significantly worse on these adversarial instances than their base models do, which indicates that they have learned the shortcut more deeply.
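One way to make the adversarial comparison concrete is to split an evaluation set by whether the frequency pattern agrees with the gold label, then score a model on each subset separately. The sketch below is an assumption about how such a split could be set up, not the paper's protocol: the `Example` type, the counts, the toy examples, and their gold labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    premise_pred: str
    hypothesis_pred: str
    gold: str  # "entail" or "no-entail"


def heuristic_label(ex, freq):
    """Label implied by frequency alone (hypothesis more frequent => entail)."""
    more_frequent = freq.get(ex.hypothesis_pred, 0) > freq.get(ex.premise_pred, 0)
    return "entail" if more_frequent else "no-entail"


def split_by_frequency_agreement(examples, freq):
    """Separate examples where frequency agrees with gold from those where it does not."""
    consistent, adversarial = [], []
    for ex in examples:
        target = consistent if heuristic_label(ex, freq) == ex.gold else adversarial
        target.append(ex)
    return consistent, adversarial


def accuracy(examples, predict):
    """Fraction of examples where predict(ex) matches the gold label."""
    if not examples:
        return float("nan")
    return sum(predict(ex) == ex.gold for ex in examples) / len(examples)


# Made-up counts and toy examples, for illustration only.
freq = {"run": 9000, "sprint": 400, "eat": 12000, "buy": 8000, "purchase": 3000}
examples = [
    Example("sprint", "run", "entail"),     # frequency agrees with gold
    Example("run", "sprint", "no-entail"),  # frequency agrees with gold
    Example("sprint", "eat", "no-entail"),  # frequent hypothesis, no entailment
    Example("buy", "purchase", "entail"),   # rarer hypothesis, entailment holds
]

consistent, adversarial = split_by_frequency_agreement(examples, freq)

# A predictor that is nothing but the shortcut: a stand-in for a model
# that relies entirely on frequency.
shortcut_model = lambda ex: heuristic_label(ex, freq)
print(accuracy(consistent, shortcut_model), accuracy(adversarial, shortcut_model))
```

By construction, a predictor that relies only on frequency scores perfectly on the consistent subset and fails on the adversarial one. A real model would fall somewhere in between, and the gap between the two accuracies gives a rough measure of how much it leans on the shortcut.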

This is a general pattern in the vault: the note "Can models pass tests while missing the actual grammar?" shows that surface heuristics enable correct behavior on easy cases while degrading robustness on unusual ones. Fine-tuning amplifies the problem because the training signal rewards the heuristic. The model that appears to "learn inference" has instead learned to use training-data statistics more efficiently.

This differs from attestation bias (memorization of specific sentences). Frequency bias operates at the corpus level: it is a statistical regularity learned from the distribution of natural text, not from specific memorized statements. Both are shortcuts that substitute for inference, but they originate at different levels of the training data.

Original note title

fine-tuning on nli amplifies llm frequency bias rather than teaching genuine inference