Why do position discounts in ranking metrics match user abandonment patterns?
This explores why the steep position-weighting in ranking metrics (like the logarithmic discount in NDCG, where rank 1 counts far more than rank 10) lines up with how users actually stop scrolling — and what that correspondence reveals about feedback loops in recommendation systems.
The short answer the corpus points to is that the metric's position discount and real user abandonment are both downstream of the same thing: position is not a neutral coordinate but an attention budget that decays fast. A metric that discounts lower ranks and a user who quits after the first few results are measuring the same decay from two directions. The interesting part is not that they match but what happens when a training system mistakes one for the other.
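To make the two decays concrete, here is a minimal sketch that puts the standard DCG discount, 1 / log2(rank + 1), next to a toy abandonment model. The abandonment model is an assumption for illustration, not something taken from the sources: the user continues past each result with a fixed probability (`continue_prob`, set to a made-up 0.7), so the chance of reaching a given rank decays geometrically. The function names are hypothetical.

```python
import math


def dcg_discount(rank: int) -> float:
    """Standard DCG/NDCG weight for a 1-indexed rank: 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1)


def examination_prob(rank: int, continue_prob: float = 0.7) -> float:
    """Toy abandonment model (assumed): the user moves past each result
    with probability continue_prob, so reaching `rank` takes rank - 1 moves."""
    return continue_prob ** (rank - 1)


# Side-by-side view of the metric's discount and the simulated attention decay.
for rank in range(1, 11):
    print(rank, round(dcg_discount(rank), 3), round(examination_prob(rank), 3))
```

Both columns start at 1.0 for rank 1 and fall monotonically, but they need not coincide numerically: with these made-up settings the geometric curve drops faster than the log discount. How closely the two agree depends on the continuation probability you assume or measure.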
The sharpest material here is on position bias as a *confound* rather than a signal. YouTube's multi-objective ranker deliberately bolts on a shallow "position tower" whose entire job is to absorb the effect of where an item was shown, so the main model doesn't learn that "rank 1 is good" when really "rank 1 got the clicks because users never looked past it" (Why do ranking systems need to model selection bias explicitly?). That's the abandonment pattern made explicit: users discount low positions, so the logged clicks discount them too, and unless you model that separately the system reads its own past placement decisions as quality and amplifies them into a degenerate loop. The position discount in your metric and the position discount in user behavior trace the *same kind of decay curve*, which is exactly why they're so easy to confuse and so dangerous to train on naively.
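A toy version of that separation can be sketched as follows. This is not YouTube's architecture: a single learned offset per slot stands in for the shallow position tower, and it is added to the item's logit during training and simply left out when scoring for serving, which is one common way to wire such a term and is an assumption here. All numbers (item quality, slot penalties, exposure shares) are invented, and the "logs" are expected click rates rather than sampled clicks so the result is deterministic.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Invented ground truth, for illustration only.
TRUE_ITEM_LOGIT = {"A": 0.0, "B": 1.0}  # B is the better item
TRUE_POS_LOGIT = [0.0, -2.0]            # slot 1 gets far less attention

# (item, slot, share of that item's impressions): a past ranker mostly
# showed A on top and B underneath.
EXPOSURE = [("A", 0, 0.9), ("A", 1, 0.1), ("B", 0, 0.1), ("B", 1, 0.9)]

# Expected click rate for each logged (item, slot) pair.
LOGS = [(item, slot, w, sigmoid(TRUE_ITEM_LOGIT[item] + TRUE_POS_LOGIT[slot]))
        for item, slot, w in EXPOSURE]


def naive_ctr(logs):
    """Per-item click-through rate that ignores where the item was shown."""
    ctr = {}
    for item, _slot, w, p in logs:
        ctr[item] = ctr.get(item, 0.0) + w * p
    return ctr


def fit_with_position_term(logs, steps=5000, lr=1.0):
    """Gradient descent on weighted log loss for logit = item + slot offset."""
    item_logit = {"A": 0.0, "B": 0.0}
    pos_logit = [0.0, 0.0]  # slot 0 is pinned at 0 as the reference slot
    for _ in range(steps):
        g_item = {"A": 0.0, "B": 0.0}
        g_pos1 = 0.0
        for item, slot, w, p in logs:
            q = sigmoid(item_logit[item] + pos_logit[slot])
            err = w * (q - p)  # gradient of the log loss w.r.t. the logit
            g_item[item] += err
            if slot == 1:
                g_pos1 += err
        for item in item_logit:
            item_logit[item] -= lr * g_item[item]
        pos_logit[1] -= lr * g_pos1
    return item_logit, pos_logit


ctr = naive_ctr(LOGS)
item_logit, pos_logit = fit_with_position_term(LOGS)
print("naive CTR:", ctr)
print("item logits with the slot offset factored out:", item_logit)
print("learned slot offsets:", pos_logit)
```

In this setup the naive click-through rate ranks A above B because A inherited the top slot, while the item logits learned alongside the slot offset rank B above A. Ranking by `item_logit` alone is the serving-time analogue of dropping the position input.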
There's a second, quieter reason the match holds: ranking objectives that win are the ones that build competition *between* items into the math, mirroring the scarcity of user attention. Switching a recommender VAE to a multinomial likelihood beats Gaussian or logistic precisely because it forces items to compete for a fixed probability mass, which aligns training with top-N ranking instead of scoring each item in isolation (Why does multinomial likelihood work better for ranking recommendations?). A user with a decaying attention budget is doing the same thing — allocating a fixed, shrinking resource across positions. Metrics that assume independent relevance per slot drift from behavior; metrics that assume competition track it.
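The competition point can be seen in a few lines. The sketch below compares a softmax (the probabilities behind a multinomial likelihood, sum_i x_i log pi_i) with independent sigmoids (the logistic case). The scores and helper names are made up for illustration, and the Gaussian variant is left out for brevity.

```python
import math


def softmax(logits):
    """Turn scores into probabilities that share a fixed mass of 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multinomial_log_likelihood(counts, logits):
    """sum_i counts[i] * log softmax(logits)[i]; items compete for mass."""
    probs = softmax(logits)
    return sum(c * math.log(p) for c, p in zip(counts, probs))


def logistic_log_likelihood(counts, logits):
    """Independent Bernoulli term per item on binarized counts; no competition."""
    total = 0.0
    for c, z in zip(counts, logits):
        p = sigmoid(z)
        total += math.log(p) if c > 0 else math.log(1.0 - p)
    return total


base = [2.0, 1.0, 0.0]
boosted = [3.0, 1.0, 0.0]  # only item 0's score changes

print("softmax base:   ", [round(p, 3) for p in softmax(base)])
print("softmax boosted:", [round(p, 3) for p in softmax(boosted)])
print("sigmoid base:   ", [round(sigmoid(z), 3) for z in base])
print("sigmoid boosted:", [round(sigmoid(z), 3) for z in boosted])

clicked_item_1 = [0, 1, 0]  # the user interacted with item 1 only
print(multinomial_log_likelihood(clicked_item_1, base),
      multinomial_log_likelihood(clicked_item_1, boosted))
```

Raising item 0's score lowers the softmax probability of items 1 and 2, so the multinomial likelihood of a click on item 1 drops. Under independent sigmoids, item 1's probability does not move at all, which is the "scoring each item in isolation" behavior described above.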
The thing you might not have come looking for: this attention-decay structure isn't evenly distributed across users, and that's where it bites. Recommendation data follows a power law, so the entities a model most needs to get right — the high-frequency users and head items that dominate the top positions — are exactly where hashing collisions and representational shortcuts pile up (Why do hash collisions hurt recommendation models so much?). And once a feed is shaping what people even get the chance to abandon, the position curve stops being a measurement and becomes an intervention: feed weights steer producer behavior and opinion convergence at population scale, so the abandonment pattern your metric flatters is partly a pattern the metric created (How do recommendation feeds shape what people see and believe?).
So the honest synthesis is that position discounts match abandonment because they're two faces of attention scarcity — but the corpus's real warning is that this tidy correspondence is the seam where feedback loops slip in. The systems that handle it well treat position as a bias to be subtracted, not a reward to be chased.
Sources (4 notes)
YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.
Liang et al. show that switching VAE likelihoods from Gaussian/logistic to multinomial achieves state-of-the-art results because enforced probability competition between items directly aligns training with top-N ranking objectives. Rebalancing KL regularization further improves performance.
Monolith's empirical work shows that real recommendation systems have power-law distributed ID frequencies, so hash collisions accumulate precisely on the entities the model most needs to represent accurately. Fixed-size hashed tables make this worse over time as new IDs arrive.
Research shows recommendation systems operate as political actors: feed weights influence producer behavior, network topology drives opinion convergence, and automation enables targeted persuasion at population scale. These effects compound through rating contamination and selection biases.