Can dataset-level debiasing methods fix popularity bias inherited from pretraining?

This explores whether the usual recommender-system debiasing tricks — reweighting or rebalancing the training data — can correct a popularity bias that actually originates in a language model's pretraining rather than in the dataset you fine-tune on.

The short answer the corpus gives is no, and the reason is a mismatch between where the bias lives and where the fix is applied. When an LLM recommends items, it tends to surface whatever was popular in *its pretraining corpus*, not what's popular in your target data. One study found that GPT-4 keeps recommending The Shawshank Redemption across datasets with completely different popularity distributions. This is a domain-shift effect that ordinary debiasing can't reach, because it rebalances the wrong distribution (see "Where does LLM recommendation bias actually come from?").
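To make "rebalancing the wrong distribution" concrete, here is a minimal sketch of inverse-popularity reweighting on a toy interaction log. The log, the item names `A` and `B`, and both helper functions are hypothetical illustrations, not taken from the study. The point is only that the weights are defined over items that occur in the dataset, so an item the model favors from pretraining never enters the calculation.

```python
from collections import Counter

def inverse_popularity_weights(interactions):
    """Weight each (user, item) row by 1 / item count, normalized to sum to 1."""
    counts = Counter(item for _, item in interactions)
    raw = [1.0 / counts[item] for _, item in interactions]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_item_share(interactions, weights):
    """Total weight each item receives after reweighting."""
    share = Counter()
    for (_, item), w in zip(interactions, weights):
        share[item] += w
    return dict(share)

# Hypothetical target-dataset log: item A is three times as popular as item B.
interactions = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u4", "B")]

weights = inverse_popularity_weights(interactions)
share = weighted_item_share(interactions, weights)

# After reweighting, A and B carry equal total weight in the training signal.
# A title that is popular only in the pretraining corpus has no rows here,
# so the reweighting assigns it nothing and cannot push against it.
pretraining_favorite_weight = share.get("The Shawshank Redemption", 0.0)
```

Under these assumptions the dataset's own popularity skew is flattened, while the prior the model brings from pretraining is left exactly where it was.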

This isn't a one-off. A broader analysis finds that LLM recommenders carry three distinct biases (position, popularity, and fairness), all stemming from the language model's pretraining objective and corpus demographics rather than from interaction data, and concludes that mitigation needs LLM-specific methods, not collaborative-filtering debiasing tricks ported over from classic recommenders (see "Where do recommendation biases come from in language models?"). The deeper pattern shows up beyond recommendation too: a causal experiment that varied random seeds and swapped fine-tuning data found that models sharing a pretrained backbone keep similar bias patterns regardless of what you fine-tune on. Fine-tuning only *modulates* what pretraining already planted (see "Where do cognitive biases in language models come from?"). Even reinforcement learning, applied after pretraining, mostly amplifies one format that was already present in the pretraining distribution rather than introducing something new (see "Does RL training collapse format diversity in pretrained models?").

What's worth noticing is that dataset-level debiasing *does* work when the bias genuinely lives in the data. YouTube's ranker pulls selection bias out of training logs with a dedicated position tower, breaking the feedback loop where a model amplifies its own past decisions (see "Why do ranking systems need to model selection bias explicitly?"). So the corpus isn't saying debiasing is useless; it's saying the lever has to match the cause. When popularity bias comes from low embedding dimensionality, the fix isn't the data either. It's treating dimensionality as a fairness hyperparameter, because small embeddings overfit toward popular items to maximize ranking quality, and that can't be patched post-hoc (see "Does embedding dimensionality secretly drive popularity bias in recommenders?").
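The position-tower idea can be sketched with a toy logistic model. This is a stand-in under stated assumptions, not YouTube's ranker, which the source note describes as a neural MMoE model. It assumes a single relevance feature, an additive per-position term that is learned from the logs and dropped when scoring for serving, and synthetic clicks in which position is assigned independently of relevance. All names (`fit_with_position_term`, `serve_score`, `true_bias`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_with_position_term(x, pos, clicks, n_positions, lr=0.5, steps=300):
    """Fit click logit = w * relevance + b[position] by full-batch gradient descent."""
    w = 0.0
    b = [0.0] * n_positions
    n = len(x)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = [0.0] * n_positions
        for xi, pi, ci in zip(x, pos, clicks):
            err = sigmoid(w * xi + b[pi]) - ci  # d(log loss) / d(logit)
            grad_w += err * xi
            grad_b[pi] += err
        w -= lr * grad_w / n
        for k in range(n_positions):
            b[k] -= lr * grad_b[k] / n
    return w, b

def serve_score(w, relevance):
    """Score without the position term, so past slot placement is not rewarded."""
    return sigmoid(w * relevance)

# Synthetic logs (assumption): higher slots get more clicks at equal relevance.
rng = random.Random(0)
true_bias = [1.5, 0.0, -1.5]
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
pos = [rng.randrange(3) for _ in range(2000)]
clicks = [1.0 if rng.random() < sigmoid(xi + true_bias[pi]) else 0.0
          for xi, pi in zip(x, pos)]

w, b = fit_with_position_term(x, pos, clicks, n_positions=3)
```

In this toy setup the per-position terms in `b` absorb the click advantage of higher slots, leaving `w` to reflect relevance. The fix works here only because the bias is recorded in the logs, which is exactly the condition that fails for a bias inherited from pretraining.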

The thing you might not have expected: 'popularity bias' is really several different problems wearing the same name. One version is selection bias in your logs (fixable with data-level methods), one is an architectural artifact of embedding size (fixable only by changing the model's geometry), and one is a residue of pretraining (reachable only with LLM-specific intervention). Reaching for dataset reweighting on the third kind is like adjusting the thermostat to fix a window that's painted shut — the right tool for a problem that isn't yours.

Sources (6 notes)

Where does LLM recommendation bias actually come from?

GPT-4 concentrates recommendations on items popular in its pretraining corpus rather than in target datasets. The Shawshank Redemption dominates across different datasets even when they have different popularity distributions, revealing a domain-shift effect that standard debiasing methods cannot address.

Where do recommendation biases come from in language models?

Wu et al. show that LLM-based recommendation systems exhibit position bias, popularity bias, and fairness bias—unique failure modes stemming from the language model's pretraining objective and corpus demographics rather than interaction data. Mitigation requires LLM-specific approaches, not adapted collaborative filtering techniques.

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Why do ranking systems need to model selection bias explicitly?

YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.

Does embedding dimensionality secretly drive popularity bias in recommenders?

Research shows that when user/item embedding dimensions are too small, recommender systems overfit toward popular items to maximize ranking quality. This compounds over time as niche items receive insufficient exposure, and cannot be fixed post-hoc without treating dimensionality as a fairness hyperparameter.
