Can information-gain principles improve how we choose what to label?
This explores whether the idea of 'pick the data point that teaches you the most' — information gain — can make labeling smarter: choosing what to annotate, what to ask, or whether to label at all, rather than labeling everything blindly.
Information gain here means choosing the item that most reduces your uncertainty, as opposed to annotating exhaustively or at random. The corpus has no paper titled 'active learning for labeling,' but it has something more interesting: several pieces that each take a different angle on the same underlying logic. Read together, they sketch a real answer.
The cleanest demonstration of the principle itself is in question selection. 'How can models select the most informative question to ask?' shows a model simulating the possible answers to each candidate question, scoring each question by how much those answers would shrink its uncertainty, and then asking only the high-value one. That is information gain applied to 'what should I find out next,' and labeling is the same problem wearing different clothes: an unlabeled example is worth annotating to the extent that its label would resolve uncertainty. A close cousin is 'Can simple uncertainty estimates beat complex adaptive retrieval?', where a model's own calibrated confidence decides *when* to bother retrieving; self-knowledge turns out to be a cheaper and often better trigger than elaborate external heuristics. The transferable lesson for labeling: let the model's uncertainty tell you which examples are worth a human's time.
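The scoring step described above can be sketched as an expected-information-gain calculation. This is a minimal illustration, not the method from the note: the hypothesis set, the two candidate questions, and the function names `entropy` and `expected_information_gain` are all assumptions made for the example, and the likelihood tables are hand-written rather than simulated by a model.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, likelihoods):
    """Expected drop in entropy over hypotheses after seeing one answer.

    prior:       P(h) for each hypothesis h.
    likelihoods: likelihoods[a][h] = P(answer a | hypothesis h).
    """
    gain = entropy(prior)
    for answer_likelihood in likelihoods:
        # Marginal probability of observing this answer.
        p_answer = sum(l * p for l, p in zip(answer_likelihood, prior))
        if p_answer == 0:
            continue
        # Bayesian update of the hypothesis distribution for this answer.
        posterior = [l * p / p_answer for l, p in zip(answer_likelihood, prior)]
        gain -= p_answer * entropy(posterior)
    return gain

# Hypothetical setup: four equally likely hypotheses, two yes/no questions.
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    # "yes" for hypotheses 0 and 1, "no" for 2 and 3.
    "splits_evenly": [[1, 1, 0, 0], [0, 0, 1, 1]],
    # "yes" only for hypothesis 0.
    "isolates_one": [[1, 0, 0, 0], [0, 1, 1, 1]],
}

scores = {name: expected_information_gain(prior, lik)
          for name, lik in questions.items()}
best = max(scores, key=scores.get)
```

Under these assumptions the even split scores higher than the question that isolates a single hypothesis, which is the sense in which a question (or a label) can be "high-value". The same function applies to labeling if each candidate label plays the role of an answer.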
But here is the twist the corpus adds: information gain assumes the signal you are harvesting is clean, and labels often are not. 'Do all annotation responses measure the same underlying thing?' finds that annotations contain three different things: genuine preferences, non-attitudes (people answering when they have no real opinion), and preferences constructed on the spot. If you greedily select the 'most informative' items by uncertainty, you may just be routing your annotation budget toward exactly the noisy, opinion-free cases that *look* uncertain but carry no stable signal. So information gain can improve what you label, but only if you first know which kind of signal a label is.
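One way to act on this warning is to screen items for label stability before ranking them by model uncertainty. The sketch below is an assumption-heavy illustration: it supposes a small pilot in which each item was annotated several times under varied conditions, uses simple majority agreement as a stand-in for consistency, and picks an arbitrary `min_agreement` threshold. The field names `model_p` and `pilot_labels` are hypothetical.

```python
import math
from collections import Counter

def binary_entropy(p):
    """Entropy in bits of a yes/no prediction with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def agreement(labels):
    """Share of repeated annotations that match the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def select_for_labeling(items, min_agreement=0.75, budget=2):
    """Drop unstable items, then take the most model-uncertain ones."""
    # Step 1: keep only items whose pilot labels are consistent enough.
    stable = [it for it in items
              if agreement(it["pilot_labels"]) >= min_agreement]
    # Step 2: rank the survivors by predictive entropy, highest first.
    ranked = sorted(stable,
                    key=lambda it: binary_entropy(it["model_p"]),
                    reverse=True)
    return [it["id"] for it in ranked[:budget]]

# Hypothetical pool: item "a" is the most model-uncertain, but its pilot
# labels split evenly, so it is treated as carrying no stable signal.
items = [
    {"id": "a", "model_p": 0.50, "pilot_labels": ["yes", "no", "yes", "no"]},
    {"id": "b", "model_p": 0.55, "pilot_labels": ["yes", "yes", "yes", "no"]},
    {"id": "c", "model_p": 0.95, "pilot_labels": ["no", "no", "no", "no"]},
    {"id": "d", "model_p": 0.40, "pilot_labels": ["yes", "yes", "yes", "yes"]},
]

selected = select_for_labeling(items)
```

A pure uncertainty ranking would pick item "a" first; the stability screen removes it and spends the budget on "b" and "d" instead. Whether low agreement reflects a non-attitude or genuine disagreement is not something this filter can tell apart, so the threshold should be treated as a tuning choice.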
There is also a more radical reading: maybe the best way to choose what to label is to label less. 'Can models learn behavioral principles without preference labels?' aligns models by maximizing mutual information between written principles and responses, with no preference labels at all. 'Can model confidence work as a reward signal for reasoning?' uses the model's own answer confidence as a reward signal instead of human labels. And 'Can LLMs efficiently generate taxonomies and label training data?' has an LLM invent the label taxonomy and produce the labels itself before distilling them into a cheap classifier. All three reframe 'what to label' as 'do we need to label this by hand at all?', pushing the information-gain question upstream from selection to elimination.
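To make the second of these ideas concrete, here is a rough sketch of turning confidence scores into synthetic preference pairs without human labels. It illustrates only the ranking step, under assumed inputs: the trace records, the `confidence` field, and the `min_gap` margin are hypothetical, and nothing here reproduces the training procedure of the cited work.

```python
def confidence_preference_pairs(traces, min_gap=0.1):
    """Build (chosen, rejected) id pairs from confidence-ranked traces.

    traces:  list of dicts with an "id" and a "confidence" in [0, 1].
    min_gap: assumed margin; pairs with a smaller confidence gap are
             skipped because the ranking between them is too weak.
    """
    ranked = sorted(traces, key=lambda t: t["confidence"], reverse=True)
    pairs = []
    for i, chosen in enumerate(ranked):
        for rejected in ranked[i + 1:]:
            if chosen["confidence"] - rejected["confidence"] >= min_gap:
                pairs.append((chosen["id"], rejected["id"]))
    return pairs

# Hypothetical reasoning traces for one prompt, each with the model's
# confidence in its final answer.
traces = [
    {"id": "t2", "confidence": 0.60},
    {"id": "t1", "confidence": 0.90},
    {"id": "t3", "confidence": 0.55},
]

pairs = confidence_preference_pairs(traces)
```

The resulting pairs could feed any preference-based trainer in place of human comparisons. The margin matters: without it, near-ties such as "t2" versus "t3" would be treated as meaningful preferences.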
The honest synthesis: yes, information-gain principles plausibly improve labeling choices, and the question-selection and uncertainty-routing work gives you a concrete mechanism. But the corpus's strongest contribution is a warning: uncertainty is a good compass for *what's worth labeling* only after you've separated real signal from noise, and sometimes the highest-information move is to let the model generate or self-supervise the labels rather than spend human effort at all.
Sources (6 notes)
UoT combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty—providing a principled mechanism for specific, high-value clarification rather than generic prompts.
Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches performance on multi-hop, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.
Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.
SAMI finetunes language models to increase mutual information between constitutions and responses without preference labels or demonstrations. A mistral-7b trained this way outperformed base and instruction-tuned baselines, and surprisingly, a weaker model could write principles to align a stronger one.
RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.
TnT-LLM automates text mining by using LLMs for open-ended reasoning to create and refine label taxonomies and generate training labels, then distilling these into lightweight classifiers for cost-effective deployment at scale.