Can information-gain principles improve how we choose what to label?

This explores whether the idea of 'pick the data point that teaches you the most' — information gain — can make labeling smarter: choosing what to annotate, what to ask, or whether to label at all, rather than labeling everything blindly.

Information gain here means choosing the item that most reduces your uncertainty, rather than annotating exhaustively or at random. The corpus doesn't have a paper titled 'active learning for labeling,' but it has something more interesting: several pieces that each take a different angle on the same underlying logic. Read together, they sketch a real answer.

The cleanest demonstration of the principle itself is in question selection. The note "How can models select the most informative question to ask?" shows a model simulating the possible answers to each candidate question, scoring each question by how much those answers would shrink its uncertainty, and then asking only the highest-value one. That's information gain applied to 'what should I find out next,' and labeling is the same problem wearing different clothes: an unlabeled example is worth annotating to the extent that its label is expected to resolve uncertainty. A close cousin is "Can simple uncertainty estimates beat complex adaptive retrieval?", where a model's own calibrated confidence decides *when* to bother retrieving; self-knowledge turns out to be a cheaper trigger than elaborate external heuristics, and at least as good. The transferable lesson for labeling: let the model's uncertainty tell you which examples are worth a human's time.
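The scoring idea in this paragraph can be sketched as an expected-entropy-reduction calculation. This is a minimal illustration under stated assumptions, not the method from the cited note: the hypotheses, the candidate names, and the likelihood tables are all invented for the example, and a real system would have to estimate them (for instance by simulating answers with a model).

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, likelihoods):
    """Expected entropy reduction from observing one answer (or label).

    prior: P(h) for each hypothesis h.
    likelihoods: likelihoods[a][h] = P(answer a | hypothesis h).
    """
    gain = entropy(prior)
    for answer_row in likelihoods:
        # Marginal probability of this answer: P(a) = sum_h P(a | h) P(h)
        p_answer = sum(l * p for l, p in zip(answer_row, prior))
        if p_answer == 0:
            continue
        # Bayes update, then subtract the weighted posterior entropy.
        posterior = [l * p / p_answer for l, p in zip(answer_row, prior)]
        gain -= p_answer * entropy(posterior)
    return gain

def pick_most_informative(prior, candidates):
    """Return the candidate name with the highest expected information gain."""
    return max(candidates, key=lambda name: expected_information_gain(prior, candidates[name]))

# Hypothetical setup: four equally likely hypotheses, three yes/no questions.
prior = [0.25, 0.25, 0.25, 0.25]
candidates = {
    "splits_in_half": [[1, 1, 0, 0], [0, 0, 1, 1]],
    "isolates_one": [[1, 0, 0, 0], [0, 1, 1, 1]],
    "uninformative": [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]],
}
best = pick_most_informative(prior, candidates)
```

In this toy setup the question that splits the hypotheses evenly wins, and a question whose answer does not depend on the hypothesis scores zero. The same function applies to labeling if "answers" are read as possible labels for an unlabeled example.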

But here's the twist the corpus adds, and it's the thing you didn't know you wanted to know: information gain assumes the signal you're harvesting is clean, and labels often aren't. The note "Do all annotation responses measure the same underlying thing?" finds that annotations actually contain three different things: genuine preferences, non-attitudes (people answering when they have no real opinion), and preferences constructed on the spot. If you greedily select the 'most informative' items by uncertainty, you may just be routing your annotation budget toward exactly the noisy, opinion-free cases that *look* uncertain but carry no stable signal. So information gain can improve what you label, but only if you first know which kind of signal a label is.
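One way to act on this warning is to check response stability before spending budget on an uncertain item. The sketch below is an assumption-laden illustration, not a procedure from the cited note: the field names, the idea of a small pilot round of repeated responses, and the 0.75 threshold are all hypothetical.

```python
from collections import Counter

def stability(responses):
    """Share of repeated responses that match the most common one (1.0 = fully consistent)."""
    if not responses:
        return 0.0
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / len(responses)

def rank_for_labeling(items, min_stability=0.75):
    """Rank items by model uncertainty, skipping those whose pilot answers are unstable.

    items: list of dicts with hypothetical keys "id", "uncertainty"
    (e.g. predictive entropy), and "pilot_responses" (the same item
    answered under different framings or at different times).
    """
    kept = [it for it in items if stability(it["pilot_responses"]) >= min_stability]
    ordered = sorted(kept, key=lambda it: it["uncertainty"], reverse=True)
    return [it["id"] for it in ordered]

# Hypothetical pilot data: item "a" looks most uncertain but its answers flip.
items = [
    {"id": "a", "uncertainty": 0.95, "pilot_responses": ["yes", "no", "no", "yes"]},
    {"id": "b", "uncertainty": 0.80, "pilot_responses": ["yes", "yes", "yes", "yes"]},
    {"id": "c", "uncertainty": 0.30, "pilot_responses": ["no", "no", "no", "yes"]},
]
ranking = rank_for_labeling(items)
```

Here the most uncertain item is dropped because its pilot responses are split evenly, which is the pattern a non-attitude might produce. A hard filter is only one design choice; weighting uncertainty by stability would be a softer alternative.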

There's also a more radical reading: maybe the best way to choose what to label is to label less. "Can models learn behavioral principles without preference labels?" aligns models by maximizing mutual information between written principles and responses, with no preference labels at all. "Can model confidence work as a reward signal for reasoning?" uses the model's own answer confidence as a reward signal instead of human labels. And "Can LLMs efficiently generate taxonomies and label training data?" has an LLM invent the label taxonomy and produce the labels itself before distilling them into a cheap classifier. All three reframe 'what to label' as 'do we need to label this by hand at all?', pushing the information-gain question upstream from selection to elimination.
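The confidence-as-reward idea can be illustrated by turning confidence-ranked reasoning traces into synthetic preference pairs. This is a rough sketch only: it does not reproduce the cited method, it assumes a confidence score per trace is already available, and the `min_gap` margin is a hypothetical parameter added for the example.

```python
from itertools import combinations

def preference_pairs(traces, min_gap=0.1):
    """Build (chosen, rejected) pairs from traces ranked by answer confidence.

    traces: list of (trace_text, confidence) tuples, confidence in [0, 1].
    min_gap: hypothetical margin; pairs with a smaller confidence gap are skipped.
    """
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    pairs = []
    # combinations() keeps the ranked order, so the first element of each
    # pair is always the higher-confidence trace.
    for (hi_text, hi_conf), (lo_text, lo_conf) in combinations(ranked, 2):
        if hi_conf - lo_conf >= min_gap:
            pairs.append((hi_text, lo_text))
    return pairs

# Hypothetical traces for one prompt, each with a confidence score.
traces = [("trace A", 0.92), ("trace B", 0.55), ("trace C", 0.50)]
pairs = preference_pairs(traces)
```

The margin keeps near-ties out of the training signal, so traces B and C are never compared with each other. Pairs like these could feed a standard preference-optimization step without any human label, which is the sense in which the labeling question moves from selection to elimination.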

The honest synthesis: yes, information-gain principles plausibly improve labeling choices, and the question-selection and uncertainty-routing work gives you a concrete mechanism. But the corpus's strongest contribution is a warning: uncertainty is a good compass for *what's worth labeling* only after you've separated real signal from noise, and sometimes the highest-information move is to let the model generate or self-supervise the labels rather than spend human effort at all.

Sources 6 notes

How can models select the most informative question to ask?

UoT combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty—providing a principled mechanism for specific, high-value clarification rather than generic prompts.

Can simple uncertainty estimates beat complex adaptive retrieval?

Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches performance on multi-hop, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.

Do all annotation responses measure the same underlying thing?

Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.

Can models learn behavioral principles without preference labels?

SAMI finetunes language models to increase mutual information between constitutions and responses without preference labels or demonstrations. A mistral-7b trained this way outperformed base and instruction-tuned baselines, and surprisingly, a weaker model could write principles to align a stronger one.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.

Can LLMs efficiently generate taxonomies and label training data?

TnT-LLM automates text mining by using LLMs for open-ended reasoning to create and refine label taxonomies and generate training labels, then distilling these into lightweight classifiers for cost-effective deployment at scale.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a researcher re-evaluating active-learning and data-annotation strategy in light of recent LLM capability shifts. The question: can information-gain principles—selecting unlabeled items by their uncertainty-reduction potential—improve annotation efficiency and model training?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as baseline, not current state.
- Uncertainty-aware selection outperforms random or heuristic labeling: a model's calibrated confidence can route annotation budget to high-value examples (2024–2025).
- Annotations decompose into three signal types (genuine preference, non-attitude, constructed opinion); greedy uncertainty-based selection may route budget to noisy, opinion-free cases that *look* uncertain but carry no stable signal (~2024).
- Models can align via mutual information between written principles and responses, bypassing preference labels entirely; LLMs can generate label taxonomies and self-label before distillation (~2024–2025).
- Self-feedback and model confidence as intrinsic reward can substitute for human labels (~2025–2026).

Anchor papers (verify; mind their dates):
- 2402.03271 (Uncertainty of Thoughts: UoT planning for information seeking)
- 2404.14313 (Self-Supervised Alignment with Mutual Information)
- 2501.12835 (Adaptive Retrieval & Uncertainty, 2025)
- 2604.03238 (Measuring Human Preferences in RLHF as social science, 2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, judge whether newer model scale, in-context learning sophistication, reasoning-chain methods (CoT, multi-step), or self-supervised techniques have since relaxed the signal-decomposition problem or the need for human labels. Separate durable insight (uncertainty *is* predictive of informativeness) from perishable limitation (humans must validate signal type). Cite what moved the needle.
(2) Surface the strongest work from the last 6 months that contradicts or supersedes the claim that uncertainty-driven selection is the best active-learning strategy. Look for evidence that other selection criteria (e.g., diversity, representativeness, or model-generated labels) now outperform uncertainty alone.
(3) Propose 2 research questions that assume the regime has shifted: (a) given that models can self-label and self-align, what does the *next* human annotation step actually need to validate? and (b) can information-gain principles be applied not to selecting *what* to label, but to selecting *which human* should validate a model's output?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
