How does the Learning Law explain why all examples should contribute equally?
This answer explores the idea, rooted in the 'learning mechanics' view of deep learning, that training treats examples as interchangeable contributors to an aggregate signal. It then tests that premise against corpus findings showing that examples are emphatically *not* equal.
It reads the question as being about the 'learning mechanics' frame, in which training is modeled like statistical physics: you predict average-case behavior from aggregate statistics rather than tracking any individual data point. In that view, the natural assumption is that each example is one more sample from a distribution, and the law that governs learning describes how those samples average out over the course of training, not what any single one does [Can deep learning theory unify around training dynamics?]. That is the 'equal contribution' intuition: no example is special, and the macroscopic curve is what matters.
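In the simplest case the 'equal contribution' intuition is literal: the full-batch gradient is the uniform average of per-example gradients, so each of N examples carries weight 1/N. The following is a minimal sketch under an assumed toy setup (a one-parameter linear model with squared-error loss, not a model taken from the cited note). The optional `weights` argument is a hypothetical addition that shows how reweighting lets a single example dominate the update.

```python
# Toy setup (an assumption for illustration): 1-D linear model y ~ w * x
# with per-example loss 0.5 * (w * x - y) ** 2.

def per_example_grad(w, x, y):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x

def batch_grad(w, data, weights=None):
    """Weighted sum of per-example gradients.

    weights=None gives every example weight 1/N, i.e. the ordinary
    full-batch mean: the 'equal contribution' assumption. Passing explicit
    weights (hypothetical, for illustration) breaks that symmetry.
    """
    n = len(data)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(wt * per_example_grad(w, x, y) for wt, (x, y) in zip(weights, data))

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]
w = 1.0
uniform = batch_grad(w, data)                            # every example counts 1/3
single = batch_grad(w, data, weights=[0.0, 0.0, 1.0])    # one example carries everything
print(uniform, single)
```

At w = 1 the per-example gradients are -1, -2, and -12, so the uniform average is -5, while concentrating all weight on the third example gives -12.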
The interesting point is that the rest of the corpus argues almost the opposite, and that tension is the real answer here. In RLVR (reinforcement learning with verifiable rewards), a single training example can lift math accuracy from 36% to 73.6%, and test performance keeps improving for 1,400 steps after training accuracy has already reached 100% [Can a single training example unlock mathematical reasoning?]. Critique fine-tuning achieves RLVR-level reasoning activation from exactly one problem [Can a single problem unlock reasoning through solution critique?]. If examples contributed equally, no single one could carry that much weight. The reconciliation is that these examples are not *teaching* in the average-case sense; they are *activating* latent capability already present in the base model. The aggregate-statistics law governs what gets learned, but a single well-chosen signal can flip a switch.
Once you accept that examples differ in value, the question becomes how to weight them. Optimal experimental design beats similarity-based retrieval for few-shot selection precisely because it picks the examples that most reduce test-set uncertainty instead of treating candidates uniformly [Can optimal experimental design improve few-shot example selection?]. Deliberately inducing the model to *err* on certain few-shot examples, and then having it articulate the principle behind each mistake, beats showing clean examples. An example's contribution therefore depends on what error it surfaces, not merely on its presence [Does learning from mistakes improve in-context learning?].
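The contrast between similarity-based selection and uncertainty-driven selection can be sketched under loudly stated assumptions: candidate demonstrations and test queries are already embedded as feature vectors, and the model's uncertainty is proxied by a Bayesian linear model with prior covariance `prior_var * I` and noise `noise_var`. This is *not* the GO or SAL algorithm from the AIPD note; it is only the generic greedy variance-reduction idea, and every function name here is hypothetical.

```python
# Hypothetical sketch: greedy test-set variance reduction vs. similarity ranking.
# Assumption: uncertainty is modeled by a Bayesian linear model over embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def rank_one_update(cov, x, noise_var):
    """Posterior covariance after observing one labeled example with features x
    (Sherman-Morrison update; valid because cov is symmetric)."""
    cx = matvec(cov, x)
    denom = noise_var + dot(x, cx)
    n = len(x)
    return [[cov[i][j] - cx[i] * cx[j] / denom for j in range(n)] for i in range(n)]

def total_test_variance(cov, test_points):
    """Sum of predictive variances t^T cov t over all test queries."""
    return sum(dot(t, matvec(cov, t)) for t in test_points)

def greedy_select(candidates, test_points, budget, prior_var=1.0, noise_var=0.1):
    """Pick demos one at a time, each time taking the candidate whose
    inclusion most reduces total predictive variance on the test set."""
    dim = len(test_points[0])
    cov = [[prior_var if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    chosen = []
    for _ in range(budget):
        best_idx, best_score = None, float("inf")
        for idx, x in enumerate(candidates):
            if idx in chosen:
                continue
            score = total_test_variance(rank_one_update(cov, x, noise_var), test_points)
            if score < best_score:
                best_idx, best_score = idx, score
        chosen.append(best_idx)
        cov = rank_one_update(cov, candidates[best_idx], noise_var)
    return chosen

def most_similar(candidates, query, budget):
    """Baseline: rank candidates by dot-product similarity to a single query."""
    order = sorted(range(len(candidates)), key=lambda i: -dot(candidates[i], query))
    return order[:budget]

candidates = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
test_points = [[1.0, 0.0], [0.0, 1.0]]
print(most_similar(candidates, test_points[0], 2))
print(greedy_select(candidates, test_points, 2))
```

In this toy case the similarity baseline returns two near-duplicates (indices 0 and 1), while the greedy selector returns complementary examples (indices 0 and 2), because the second near-duplicate barely reduces any remaining uncertainty.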
And contribution is not even a property of the example alone; it depends on the learner. Teacher-refined data that is objectively higher quality *degrades* a student when it exceeds the student's learning frontier, and students do best when they filter refinements against their own statistical profile [Does teacher-refined data always improve student model performance?]. So the same example can contribute positively to one model and negatively to another. The honest synthesis is that the learning-mechanics law explains why training looks like aggregate dynamics, but it does not license treating examples as equal: corpus evidence repeatedly shows that selection, ordering, error content, and learner compatibility dominate. The 'equal contribution' premise is a modeling convenience, not a finding.
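The cited note does not specify how a student's "statistical profile" is computed, so the following is only a hypothetical sketch of the filtering idea. An add-one-smoothed unigram model stands in for the student's own likelihood, and `margin` is an assumed threshold: a refinement is kept only if it is not much more surprising to the student than the original it replaces.

```python
import math
from collections import Counter

def unigram_profile(corpus):
    """Hypothetical stand-in for a student's statistical profile:
    add-one-smoothed unigram log-probabilities over whitespace tokens."""
    counts = Counter(tok for text in corpus for tok in text.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))
    return logprob

def avg_nll(text, logprob):
    """Average per-token negative log-likelihood under the profile."""
    toks = text.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)

def filter_refinements(pairs, logprob, margin=0.5):
    """For each (original, refined) pair, keep the refinement only if its
    surprise under the student profile exceeds the original's by at most
    `margin` (assumed threshold); otherwise fall back to the original."""
    kept = []
    for original, refined in pairs:
        if avg_nll(refined, logprob) - avg_nll(original, logprob) <= margin:
            kept.append(refined)
        else:
            kept.append(original)
    return kept

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
profile = unigram_profile(corpus)
pairs = [("the cat sat", "the cat sat on the mat"),
         ("the dog sat", "a canine reposed upon a rug")]
print(filter_refinements(pairs, profile))
```

The first refinement uses vocabulary the student already models well and is kept. The second is mostly out-of-profile tokens, so the filter falls back to the original even though the refinement might be "better" by some external standard.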
For the deepest cut, look at how outcome-based RL sharpens the policy. Reward that depends only on final-answer correctness concentrates probability mass on correct trajectories for solved problems, and because that sharpening is global, it also reduces diversity on unsolved problems [Does outcome-based RL diversity loss spread across unsolved problems?]. The gradient signal itself is distributed unequally, and the diversity loss it causes transfers from one set of problems to the other. That is a concrete mechanism for *why* equal weighting fails even when the aggregate law holds.
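A toy model makes the spillover concrete. This is a hypothetical construction, not the setup of the cited note: two problems share a softmax policy over three "strategies" through common logits plus per-problem offsets, reward is outcome-only, and the exact expected policy gradient replaces sampling so the result is deterministic. The unsolved problem earns zero reward under every strategy and so contributes no gradient of its own, yet its entropy still falls because the shared logits are pulled toward the strategy that solves the other problem.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def expected_pg(probs, rewards):
    """Exact gradient of E_{a ~ softmax(z)}[r(a)] w.r.t. the logits z:
    dE/dz_j = pi_j * (r_j - sum_a pi_a * r_a)."""
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - baseline) for p, r in zip(probs, rewards)]

def train(steps=200, lr=1.0):
    """Hypothetical toy: logits = shared + per-problem offset, outcome-only reward."""
    k = 3
    shared = [0.0] * k                                    # used by every problem
    offset = {"solved": [0.0] * k, "unsolved": [0.0] * k}
    # Strategy 0 solves the 'solved' problem; nothing solves the other one.
    rewards = {"solved": [1.0, 0.0, 0.0], "unsolved": [0.0, 0.0, 0.0]}

    def unsolved_entropy():
        return entropy(softmax([s + o for s, o in zip(shared, offset["unsolved"])]))

    h_before = unsolved_entropy()
    for _ in range(steps):
        for name in ("solved", "unsolved"):
            logits = [s + o for s, o in zip(shared, offset[name])]
            g = expected_pg(softmax(logits), rewards[name])
            # Logits are a sum, so both parameter sets receive the same gradient.
            for j in range(k):
                shared[j] += lr * g[j]
                offset[name][j] += lr * g[j]
    return h_before, unsolved_entropy()

h_before, h_after = train()
print(h_before, h_after)
```

Entropy on the unsolved problem starts at log 3 (a uniform policy) and drops during training, even though that problem never produced a nonzero gradient. Only the shared parameters carry the sharpening across.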
Sources (7 notes)
Research shows learning mechanics is consolidating as a unified frame for deep learning, modeled on classical and statistical mechanics. It prioritizes average-case predictions, training dynamics, and aggregate statistics over worst-case bounds, mirroring how physics addresses macroscopic systems.
A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.
Critique Fine-Tuning achieves reasoning activation comparable to RLVR using only one problem and teacher-generated critiques of varied solutions, with no reinforcement learning. This demonstrates that exposure to correct versus incorrect reasoning on a specific problem is a sufficient activation signal.
AIPD frames demonstration selection as budgeted active learning, choosing examples that maximally reduce test-set uncertainty. Two algorithms (GO and SAL) outperformed similarity-based methods across small, medium, and large language models.
LEAP demonstrates that models achieve better performance on reasoning and math tasks by intentionally erring on few-shot examples, reflecting on mistakes, and deriving explicit task-specific principles—without additional labeled data or fine-tuning.
Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.
RL that rewards only final answer correctness sharpens the policy globally, concentrating probability mass on correct trajectories for solved problems while simultaneously reducing diversity on unsolved ones. Historical exploration (training diversity via UCB-style bonuses) and batch exploration (test-time diversity via repetition penalties) require structurally different mechanisms.