SYNTHESIS NOTE

Why does multinomial likelihood work better for click prediction?

Explores whether the choice of likelihood function—multinomial versus Gaussian or logistic—affects recommendation performance, and what structural properties make one better suited to modeling user clicks.

Synthesis note · 2026-05-03 · sourced from Recommenders Architectures

The choice of likelihood function in collaborative filtering looks like a technical detail but is actually a structural commitment about what the data represents. Gaussian likelihoods model each interaction as an independent observation of a continuous quantity. Logistic likelihoods model each interaction as an independent binary classification. Both treat items as separate prediction targets.
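As a sketch of what "separate prediction targets" means, the two per-user log-likelihoods can be written as sums over items with no term coupling one item to another. The notation here is an assumption introduced for illustration, not taken from the note: $x_{ui}$ is user $u$'s interaction with item $i$, $f_{ui}$ is the model's score for that pair, $c_{ui}$ is an optional confidence weight, and $\sigma$ is the logistic function.

```latex
% Gaussian likelihood (up to an additive constant): squared error per item
\log p_{\text{Gauss}}(x_u \mid f_u) = -\sum_i \frac{c_{ui}}{2}\,\bigl(x_{ui} - f_{ui}\bigr)^2

% Logistic likelihood: one binary cross-entropy term per item
\log p_{\text{logistic}}(x_u \mid f_u) = \sum_i \Bigl[ x_{ui}\,\log \sigma(f_{ui}) + (1 - x_{ui})\,\log\bigl(1 - \sigma(f_{ui})\bigr) \Bigr]
```

In both expressions the term for item $i$ depends only on $f_{ui}$, so changing the score of one item leaves every other item's contribution unchanged.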

Liang et al. argue that the multinomial likelihood is structurally correct for click data because it forces items to compete. For each user, the model has a probability budget that must sum to 1 across all items, so putting probability on one item necessarily takes it away from others. This pushes the model to assign more mass to the items most likely to be clicked, which is exactly what top-N ranking metrics reward. Gaussian and logistic models score each item independently, so a high score on one item costs nothing on the others, and they do not directly optimize the relative ordering that recommendation actually requires.
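The budget argument can be illustrated with a minimal NumPy sketch. Everything in it is hypothetical and chosen for illustration (the scores, the four-item catalogue, the function names); it is not the authors' implementation. It raises one item's score and compares what happens to the other items under a softmax (multinomial) versus independent sigmoids (logistic).

```python
import numpy as np

def softmax(logits):
    """Turn item scores into probabilities that sum to 1 across items."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    """Independent per-item click probabilities; no shared budget."""
    return 1.0 / (1.0 + np.exp(-logits))

def multinomial_log_lik(clicks, logits):
    """Sum of log-probabilities of the clicked items under the softmax."""
    return float(np.sum(clicks * np.log(softmax(logits))))

def logistic_log_lik(clicks, logits):
    """Sum of per-item binary cross-entropy terms (as a log-likelihood)."""
    p = sigmoid(logits)
    return float(np.sum(clicks * np.log(p) + (1.0 - clicks) * np.log(1.0 - p)))

# Hypothetical scores for one user over four items.
logits = np.array([2.0, 1.0, 0.0, -1.0])

# Raise the score of item 2 only.
boosted = logits.copy()
boosted[2] += 3.0

softmax_before, softmax_after = softmax(logits), softmax(boosted)
sigmoid_before, sigmoid_after = sigmoid(logits), sigmoid(boosted)

# Under the softmax, item 0 loses probability because item 2 gained some.
print("softmax item 0:", softmax_before[0], "->", softmax_after[0])
# Under independent sigmoids, item 0 is unaffected.
print("sigmoid item 0:", sigmoid_before[0], "->", sigmoid_after[0])

# With a single click on item 0, the multinomial log-likelihood drops
# when probability mass is moved to the unclicked item 2.
clicks = np.array([1.0, 0.0, 0.0, 0.0])
print("multinomial log-lik:", multinomial_log_lik(clicks, logits),
      "->", multinomial_log_lik(clicks, boosted))
```

Giving every item the same large score shows the same contrast: the sigmoids all approach 1, while the softmax spreads its fixed budget evenly across the items.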

The deeper point is that the multinomial likelihood is a closer proxy to the top-N evaluation metric than the logistic or Gaussian likelihood. Top-N ranking loss is hard to optimize directly, but the multinomial likelihood induces the same kind of competition implicitly. The match between training objective and evaluation objective is doing the work, not anything specific to neural networks.
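One way to see the implicit competition is to compare gradients with respect to a single item's score. This is a short worked derivation under assumed notation, not taken from the note: $x_i$ is the click indicator (or count) for item $i$, $f_i$ is its score, $\pi_i = \exp(f_i) / \sum_k \exp(f_k)$ is the softmax probability, $N = \sum_i x_i$ is the user's total number of clicks, and $\sigma$ is the logistic function.

```latex
% Multinomial: the gradient for item j depends on every score through \pi_j
\frac{\partial}{\partial f_j} \sum_i x_i \log \pi_i
  = \sum_i x_i \bigl(\delta_{ij} - \pi_j\bigr)
  = x_j - N\,\pi_j

% Logistic: the gradient for item j depends on f_j alone
\frac{\partial}{\partial f_j} \Bigl[ x_j \log \sigma(f_j) + (1 - x_j)\log\bigl(1 - \sigma(f_j)\bigr) \Bigr]
  = x_j - \sigma(f_j)
```

Because $\pi_j$ is normalized over all items, raising any other item's score lowers $\pi_j$ and changes item $j$'s gradient. The multinomial update is therefore driven by relative scores, which is what determines a ranking. The logistic update for item $j$ ignores the other items entirely.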

Related concepts in this collection 4

Why does multinomial likelihood work better for ranking recommendations?
Explores whether the choice of likelihood function in VAE-based collaborative filtering matters for matching training objectives to ranking evaluation metrics. Why items should compete for probability mass.
extends: paired statement of the same Liang result emphasizing the implicit-CF setting

Can implicit feedback reveal both preference and confidence?
When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
complements: implicit-feedback structure motivates the multinomial framing; clicks are observation events that compete for user attention

Why does collaborative filtering struggle with sparse user data?
Collaborative filtering datasets appear massive but hide a fundamental challenge: each user has rated only a tiny fraction of items. How does this per-user sparsity shape the modeling problem, and what techniques can overcome it?
grounds: VAE-multinomial works because Bayesian latent variable models compensate for per-user sparsity

Can a linear model beat deep collaborative filtering?
Does a shallow linear autoencoder with a zero-diagonal constraint outperform deeper neural models on collaborative filtering tasks? This challenges the field's assumption that depth and nonlinearity drive performance.
complements: same right-prior-beats-depth lesson; likelihood choice and constraint choice both prove structural priors dominate capacity
