Can careful curation replace massive alignment datasets?

Does fine-tuning a strong pretrained model on 1,000 carefully selected examples achieve alignment quality comparable to models trained on vastly larger datasets? If so, the result challenges assumptions about data volume in post-training.

Synthesis note · 2026-02-23 · sourced from Alignment

LIMA ("Less Is More for Alignment") reports a foundational finding: given a strong pretrained language model, fine-tuning on just 1,000 carefully curated training examples yields strong alignment performance. This is the alignment-specific instance of a broader principle: pretraining does the heavy lifting, and post-training primarily activates existing capabilities.

The finding connects to a converging evidence pattern across the vault:

Can a single training example unlock mathematical reasoning? — one example activates reasoning in RLVR; 1000 activate alignment in SFT
Can careful selection of 78 demos outperform massive training datasets? — 78 curated trajectories for agentic behavior; same principle
Can models improve themselves on tasks without verifiable answers? — identical count (1000) for reasoning catalyst

The consistent pattern: post-training interventions require far less data than commonly assumed, but the quality bar is high. In these results, uncurated data at scale underperforms curated data at small scale. This is the "Less Is More" principle: the pretrained model already contains the capabilities, and post-training teaches it when and how to deploy them, not what they are.

For alignment specifically, the implication challenges the industry's data collection approach. Massive RLHF annotation efforts with thousands of labelers may be optimizing the wrong variable. Careful curation of a small number of high-quality examples, targeting the specific behavioral patterns desired, may achieve comparable results at a fraction of the cost.
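
As a rough illustration of what "careful curation" could look like in practice, the sketch below picks a small budget of examples by quality while capping how many come from any one topic. It is not LIMA's procedure: the `quality` scores, the `topic` field, the thresholds, and the `curate` function are all hypothetical, and in a real pipeline the quality judgment would come from human review or some scoring model.

```python
from collections import defaultdict

def curate(examples, budget, min_quality, per_topic_cap):
    """Pick up to `budget` examples, best quality first, capping each topic.

    `quality` and `topic` are hypothetical fields used only for illustration.
    """
    picked = []
    per_topic = defaultdict(int)
    # Highest quality first; sorted() is stable, so ties keep input order.
    for ex in sorted(examples, key=lambda e: e["quality"], reverse=True):
        if len(picked) >= budget:
            break
        if ex["quality"] < min_quality:
            break  # every remaining example is lower quality still
        if per_topic[ex["topic"]] >= per_topic_cap:
            continue  # keep the selection diverse across topics
        picked.append(ex)
        per_topic[ex["topic"]] += 1
    return picked

# Toy pool with made-up scores.
pool = [
    {"id": 1, "topic": "cooking", "quality": 0.9},
    {"id": 2, "topic": "cooking", "quality": 0.8},
    {"id": 3, "topic": "cooking", "quality": 0.7},
    {"id": 4, "topic": "law", "quality": 0.6},
    {"id": 5, "topic": "law", "quality": 0.2},
]
selected = curate(pool, budget=3, min_quality=0.5, per_topic_cap=2)
selected_ids = [ex["id"] for ex in selected]
print(selected_ids)
```

The per-topic cap is the design choice worth noticing: the third cooking example scores higher than the law example, but it is skipped so that the small budget covers more than one kind of behavior.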

Related concepts in this collection 5

Can a single training example unlock mathematical reasoning? Explores whether one example is enough to dramatically improve math problem-solving in language models, and whether learning continues after perfect memorization.
extreme data efficiency for reasoning; LIMA is the alignment parallel
Can careful selection of 78 demos outperform massive training datasets? Does strategic curation of high-quality demonstrations unlock agentic capability more efficiently than scaling training data? LIMI achieved 73.5% on AgencyBench with 78 samples versus 10K+ samples for competing models, suggesting data quality may matter more than quantity.
curation > volume for agentic behavior
Can models improve themselves on tasks without verifiable answers? Most self-improvement methods require verifiable correctness signals like math or code. Can models improve on open-ended instruction tasks where right answers aren't automatically checkable? And what minimal training is needed to unlock this?
same count, same principle, different domain
Do base models already contain hidden reasoning ability? Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
the theoretical foundation: post-training activates, it doesn't create
Can we train better models on less data? Can gradient-based influence estimation identify which instruction data actually matters most? The research explores whether selecting small subsets of training data by their similarity to target capabilities might outperform training on everything.
LESS provides the principled mechanism for LIMA-style curation: gradient-based influence estimation can identify which alignment examples matter most, operationalizing "careful curation" as a computable selection criterion rather than manual judgment
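
To make "a computable selection criterion" concrete, here is a deliberately simplified sketch of influence-style selection: each candidate example is scored by the cosine similarity between its gradient vector and a gradient representing the target capability, and the top `k` are kept. This is a toy under stated assumptions, not the LESS implementation. The two-dimensional gradient vectors are invented, and the names `cosine` and `select_by_influence` are hypothetical; a real system would derive gradient features from the model itself.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_by_influence(candidate_grads, target_grad, k):
    """Return the names of the k candidates whose gradients best align with the target."""
    scored = [(cosine(g, target_grad), name) for name, g in candidate_grads.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for _, name in scored[:k]]

# Made-up gradient features for four candidate training examples.
candidate_grads = {
    "ex_a": [1.0, 0.0],
    "ex_b": [0.0, 1.0],
    "ex_c": [1.0, 1.0],
    "ex_d": [-1.0, 0.0],
}
# Made-up gradient standing in for the target behavior.
target_grad = [1.0, 0.2]

top = select_by_influence(candidate_grads, target_grad, k=2)
print(top)
```

Here `ex_a` and `ex_c` are selected because their gradients point most nearly in the target direction, while `ex_d`, which points the opposite way, ranks last.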
