Why does multinomial likelihood work better for click prediction?
Explores whether the choice of likelihood function—multinomial versus Gaussian or logistic—affects recommendation performance, and what structural properties make one better suited to modeling user clicks.
The choice of likelihood function in collaborative filtering looks like a technical detail but is actually a structural commitment about what the data represents. Gaussian likelihoods model each interaction as an independent observation of a continuous quantity. Logistic likelihoods model each interaction as an independent binary classification. Both treat items as separate prediction targets.
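A minimal sketch of the two per-item likelihoods. It assumes a single user's click vector `x` (1 = clicked, 0 = not) and a vector of model scores `f` for the same items. The helper names are hypothetical. The Gaussian version is simplified to unit variance with constants dropped, which is an assumption rather than any specific paper's weighting. The sketch shows that each log-likelihood is a plain sum of per-item terms, so changing one item's score leaves every other item's term untouched.

```python
import math

# Hypothetical helper names; unit-variance Gaussian is a simplifying assumption.

def gaussian_loglik(x, f):
    # Gaussian log-likelihood (unit variance, additive constant dropped):
    # each item contributes -(x_i - f_i)^2 / 2 on its own.
    return sum(-0.5 * (xi - fi) ** 2 for xi, fi in zip(x, f))

def per_item_sigmoid(f):
    # Logistic model: each item gets its own click probability sigmoid(f_i).
    return [1.0 / (1.0 + math.exp(-fi)) for fi in f]

def logistic_loglik(x, f):
    # Bernoulli log-likelihood: one independent binary classification per item.
    total = 0.0
    for xi, p in zip(x, per_item_sigmoid(f)):
        total += xi * math.log(p) + (1 - xi) * math.log(1 - p)
    return total

x = [1, 0, 0, 1]
f = [2.0, -1.0, 0.5, 1.0]
f_bumped = [2.0, -1.0, 0.5, 3.0]  # raise only the last item's score

# The other items' probabilities are identical: raising one score costs nothing elsewhere.
print(per_item_sigmoid(f)[:3] == per_item_sigmoid(f_bumped)[:3])  # True
# Per-item probabilities need not sum to 1; many items can be near 1 at once.
print(round(sum(per_item_sigmoid([5.0, 5.0, 5.0, 5.0])), 2))  # 3.97
```

Because each term depends only on its own item, neither objective ever trades one item off against another.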
Liang et al. argue that the multinomial likelihood is structurally correct for click data because it makes items compete. The model has a probability budget that must sum to 1 across all items, so putting probability on one item necessarily takes it away from others. This forces the model to assign more mass to items that are more likely to be clicked, which is exactly what top-N ranking metrics reward. Gaussian and logistic models can assign high scores to many items simultaneously without penalty, so they are not directly pushed to get the relative ordering right, and relative ordering is what recommendation actually requires.
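A minimal sketch of the probability budget. It assumes a toy click vector `x` and a vector of logits `f`. The helper names are hypothetical. Scores pass through a softmax so that item probabilities sum to 1, and the log-likelihood is the click-weighted sum of log-probabilities. Raising one item's logit then necessarily lowers every other item's probability.

```python
import math

# Hypothetical helper names; a sketch of the multinomial setup, not a reference implementation.

def softmax(f):
    # Subtracting the max logit avoids overflow and leaves the result unchanged.
    m = max(f)
    exps = [math.exp(fi - m) for fi in f]
    z = sum(exps)
    return [e / z for e in exps]

def multinomial_loglik(x, f):
    # sum_i x_i * log(pi_i), with pi = softmax(f) a distribution over all items.
    pi = softmax(f)
    return sum(xi * math.log(p) for xi, p in zip(x, pi) if xi > 0)

x = [1, 0, 0, 1]
f = [2.0, -1.0, 0.5, 1.0]
f_bumped = [2.0, -1.0, 0.5, 3.0]  # raise only the last item's logit

before = softmax(f)
after = softmax(f_bumped)
print(round(sum(after), 6))  # 1.0: the budget is fixed
# Every other item loses probability mass when the last item gains it.
print(all(a < b for a, b in zip(after[:3], before[:3])))  # True
```

In this toy case the last item is clicked, so shifting mass toward it raises the log-likelihood even though the other clicked item loses some probability.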
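The deeper point is that the multinomial likelihood is a closer proxy for the evaluation metric than the logistic or Gaussian likelihood. Top-N ranking metrics such as Recall@N and NDCG@N are hard to optimize directly because they depend only on the sorted order of the scores and are piecewise constant, so they provide no useful gradient. The multinomial likelihood induces the same kind of competition implicitly while remaining differentiable. The match between training objective and evaluation objective is doing the work, not anything specific to neural networks.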
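One way to see the closer match is a minimal sketch on a toy click vector `x` and score vector `f`. All helper names are hypothetical. Recall@N here is normalized by the smaller of N and the number of clicked items, which is one common variant. A ranking metric depends only on the order of the scores, so adding the same constant to every score leaves it unchanged. The multinomial log-likelihood shares that invariance because the softmax cancels a common shift. The logistic log-likelihood does not, so it can change without the ranking changing at all.

```python
import math

# Hypothetical helper names; the Recall@N normalization is an assumed common variant.

def log_softmax(f):
    # Stable log-sum-exp, then log-probabilities over all items.
    m = max(f)
    lse = m + math.log(sum(math.exp(fi - m) for fi in f))
    return [fi - lse for fi in f]

def multinomial_loglik(x, f):
    return sum(xi * lp for xi, lp in zip(x, log_softmax(f)))

def logistic_loglik(x, f):
    total = 0.0
    for xi, fi in zip(x, f):
        p = 1.0 / (1.0 + math.exp(-fi))
        total += xi * math.log(p) + (1 - xi) * math.log(1 - p)
    return total

def recall_at_n(x, f, n):
    # Clicked items in the top n, divided by min(n, number of clicked items).
    top = sorted(range(len(f)), key=lambda i: f[i], reverse=True)[:n]
    hits = sum(x[i] for i in top)
    return hits / min(n, sum(x))

x = [1, 0, 0, 1]
f = [2.0, -1.0, 0.5, 1.0]
shifted = [fi + 3.0 for fi in f]  # same ranking, every score raised by 3

print(recall_at_n(x, f, 2) == recall_at_n(x, shifted, 2))  # True
print(abs(multinomial_loglik(x, f) - multinomial_loglik(x, shifted)) < 1e-9)  # True
print(logistic_loglik(x, f) == logistic_loglik(x, shifted))  # False
```

The multinomial objective, like the ranking metric, cares only about relative scores across items. The logistic objective also rewards getting absolute score levels right. The sketch also hints at why Recall@N is hard to train on directly: small score changes that do not swap any items leave it exactly constant.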
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- What makes the Brier score mathematically better than log-likelihood here?
- Should recommendation evaluation enforce probability competition between candidate items?
- How do implicit signals like clicks capture preference more reliably than explicit ratings?
- How do co-clicking patterns in bipartite graphs capture product substitutes from noisy behavior?
- Why does sparsity per user make probabilistic models more effective?
- How does per-user sparsity influence likelihood choice for recommendations?
- How does item frequency skew relate to per-user interaction sparsity?
- How do Bayesian models share statistical strength across sparse user datasets?
- Why do multinomial likelihoods outperform Gaussian models for recommendation?
Related concepts in this collection
- Why does multinomial likelihood work better for ranking recommendations?
  Explores whether the choice of likelihood function in VAE-based collaborative filtering matters for matching training objectives to ranking evaluation metrics, and why items should compete for probability mass.
  extends: paired statement of the same Liang result emphasizing the implicit-CF setting
- Can implicit feedback reveal both preference and confidence?
  When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
  complements: implicit-feedback structure motivates the multinomial framing; clicks are observation events that compete for user attention
- Why does collaborative filtering struggle with sparse user data?
  Collaborative filtering datasets appear massive but hide a fundamental challenge: each user has rated only a tiny fraction of items. How does this per-user sparsity shape the modeling problem, and what techniques can overcome it?
  grounds: VAE-multinomial works because Bayesian latent variable models compensate for per-user sparsity
- Can a linear model beat deep collaborative filtering?
  Does a shallow linear autoencoder with a zero-diagonal constraint outperform deeper neural models on collaborative filtering tasks? This challenges the field's assumption that depth and nonlinearity drive performance.
  complements: same right-prior-beats-depth lesson; likelihood choice and constraint choice both indicate that structural priors dominate capacity
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Variational Autoencoders for Collaborative Filtering
- Using Navigation to Improve Recommendations in Real-Time
- InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
- Neural Collaborative Filtering
- Language Model Personalization via Reward Factorization
- Reconciling the accuracy-diversity trade-off in recommendations
- Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations
- How new data permeates LLM knowledge and how to dilute it
Original note title
multinomial likelihoods outperform Gaussian and logistic for click data because items must compete for limited probability mass