Does fine-tuning on NLI teach inference or amplify shortcuts?
When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities, or do they become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.
"LLMs are Frequency Pattern Learners in Natural Language Inference" identifies a consistent frequency bias in NLI datasets: in positive (entailment) instances, the predicate in the hypothesis tends to be more frequent in the training data than the predicate in the premise. LLMs exploit this pattern. The disturbing finding is that fine-tuning on NLI corpora increases reliance on the frequency bias rather than reducing it.
The mechanism connects to a real property of language. Hypernyms (more general terms: "animal") are more frequent than hyponyms (more specific terms: "dog") in natural text. Upward entailment runs from specific to general (SPRINT entails RUN), so the entailed term is usually the more frequent one, and frequency can serve as a useful proxy for entailment direction. Fine-tuning teaches models to exploit this proxy more aggressively.
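The frequency proxy described above can be checked on a dataset roughly as follows. This is a minimal sketch, assuming each NLI instance has already been reduced to one premise predicate and one hypothesis predicate and that a predicate frequency table is available. The names `NLIExample`, `hypothesis_more_frequent_rate`, `TOY_FREQ`, and `TOY_DATA` are hypothetical, and the counts are invented for illustration; none of this is the paper's code or data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NLIExample:
    premise_pred: str      # predicate extracted from the premise
    hypothesis_pred: str   # predicate extracted from the hypothesis
    entailed: bool         # gold label: True means entailment


def hypothesis_more_frequent_rate(
    examples: List[NLIExample], freq: Dict[str, int], entailed: bool
) -> float:
    """Fraction of examples with the given gold label whose hypothesis
    predicate is strictly more frequent than the premise predicate."""
    selected = [ex for ex in examples if ex.entailed == entailed]
    if not selected:
        return float("nan")
    hits = sum(
        freq.get(ex.hypothesis_pred, 0) > freq.get(ex.premise_pred, 0)
        for ex in selected
    )
    return hits / len(selected)


# Invented corpus counts (assumption, for illustration only).
TOY_FREQ = {"sprint": 40, "run": 900, "move": 1500, "buy": 700, "acquire": 120}

TOY_DATA = [
    NLIExample("sprint", "run", True),    # specific -> general, hypothesis more frequent
    NLIExample("run", "move", True),      # specific -> general, hypothesis more frequent
    NLIExample("buy", "acquire", True),   # entailment, but hypothesis is rarer
    NLIExample("run", "sprint", False),
    NLIExample("move", "run", False),
    NLIExample("acquire", "buy", False),  # not entailed, but hypothesis more frequent
]

entail_rate = hypothesis_more_frequent_rate(TOY_DATA, TOY_FREQ, entailed=True)
non_entail_rate = hypothesis_more_frequent_rate(TOY_DATA, TOY_FREQ, entailed=False)
print(f"entailment: {entail_rate:.2f}, non-entailment: {non_entail_rate:.2f}")
```

A large gap between the two rates would indicate that relative frequency alone carries label information, which is the kind of signal a model can pick up without doing any inference.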
The problem: frequency is a statistical artifact, not a semantic relationship. It works often enough to look like learning on standard benchmarks, but it fails on adversarial cases where the frequency pattern disagrees with the actual entailment label. After fine-tuning, LLMs perform significantly worse on these adversarial instances than base models do: they have learned the shortcut more deeply.
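One way to make the adversarial comparison concrete is to split an evaluation set by whether the frequency heuristic agrees with the gold label, then score any predictor on each subset separately. The sketch below is an assumption about how such a split could be built, not the paper's protocol. All function names and the toy counts are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (premise predicate, hypothesis predicate, gold entailment label)
Example = Tuple[str, str, bool]


def frequency_heuristic(premise: str, hypothesis: str, freq: Dict[str, int]) -> bool:
    """Predict entailment iff the hypothesis predicate is more frequent."""
    return freq.get(hypothesis, 0) > freq.get(premise, 0)


def split_by_frequency_agreement(
    examples: List[Example], freq: Dict[str, int]
) -> Tuple[List[Example], List[Example]]:
    """Return (consistent, adversarial): examples where the heuristic's
    prediction matches the gold label, and examples where it does not."""
    consistent: List[Example] = []
    adversarial: List[Example] = []
    for premise, hypothesis, gold in examples:
        agrees = frequency_heuristic(premise, hypothesis, freq) == gold
        (consistent if agrees else adversarial).append((premise, hypothesis, gold))
    return consistent, adversarial


def accuracy(examples: List[Example], predict: Callable[[str, str], bool]) -> float:
    """Share of examples on which predict(premise, hypothesis) equals gold."""
    if not examples:
        return float("nan")
    correct = sum(predict(p, h) == gold for p, h, gold in examples)
    return correct / len(examples)


# Invented counts and examples (assumption, for illustration only).
freq_table = {"sprint": 40, "run": 900, "move": 1500, "buy": 700, "acquire": 120}
eval_set: List[Example] = [
    ("sprint", "run", True),
    ("run", "move", True),
    ("buy", "acquire", True),
    ("run", "sprint", False),
    ("move", "run", False),
    ("acquire", "buy", False),
]

consistent_set, adversarial_set = split_by_frequency_agreement(eval_set, freq_table)


def shortcut_model(premise: str, hypothesis: str) -> bool:
    # Stand-in for a model that relies purely on the frequency shortcut.
    return frequency_heuristic(premise, hypothesis, freq_table)


acc_consistent = accuracy(consistent_set, shortcut_model)
acc_adversarial = accuracy(adversarial_set, shortcut_model)
print(f"consistent: {acc_consistent:.2f}, adversarial: {acc_adversarial:.2f}")
```

A pure shortcut predictor is perfect on the consistent subset and always wrong on the adversarial one by construction. A real model's predictions can be passed in place of `shortcut_model`; the closer its accuracy gap is to that extreme, the more it is leaning on frequency.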
This is a general pattern in the vault: "Can models pass tests while missing the actual grammar?" shows that surface heuristics enable correct behavior on easy cases while degrading robustness on unusual ones. Fine-tuning amplifies this problem because the training signal rewards the heuristic. The model that appears to "learn inference" has instead learned to use training data statistics more efficiently.
What distinguishes this from attestation bias (memorization of specific sentences) is the level at which it operates: frequency bias is a corpus-level statistical regularity learned from the distribution of natural text, not from specific memorized statements. Both are shortcuts that substitute for inference, but they originate at different levels of the training data.
Inquiring lines that use this note as a source 6
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Can fine-tuning ever teach semantic inference instead of amplifying training shortcuts?
- Why does NLI fine-tuning amplify frequency bias instead of teaching inference?
- Does fine-tuning on NLI tasks amplify or reduce frequency bias in language models?
- Does fine-tuning on NLI tasks reduce or amplify frequency bias?
- Does training data format shape which reasoning strategies LLMs develop?
- How does the pretraining distribution shape what LLMs find hard?
Related concepts in this collection 5
- Do LLMs predict entailment based on what they memorized?
Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.
the complementary sentence-level bias; both are shortcuts substituting for inference
- Can models pass tests while missing the actual grammar?
Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
same pattern: surface statistics enabling apparent competence on easy cases
- Why do language models fail at communicative optimization?
LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
the broader principle: corpus statistics as substitute for semantic understanding
- Does supervised fine-tuning actually improve reasoning quality?
While SFT boosts final-answer accuracy, does it degrade the quality and informativeness of the reasoning steps that justify those answers? This matters for high-stakes domains requiring auditable decision-making.
cross-domain parallel: SFT amplifies accuracy-correlated shortcuts (domain patterns) at the cost of reasoning quality; same fine-tuning mechanism operating on different training distribution features
- Why do language models struggle with historical legal cases?
Explores whether LLMs' training data recency bias creates systematic performance degradation on older cases, and what this reveals about how models represent temporal information in specialized domains.
same mechanism along a different axis: fine-tuning amplifies the temporal recency skew of the training distribution rather than its frequency skew
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- LLMs are Frequency Pattern Learners in Natural Language Inference
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- Neutralizing Bias in LLM Reasoning using Entailment Graphs
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
Original note title
fine-tuning on nli amplifies llm frequency bias rather than teaching genuine inference