Can simpler models beat deep networks for recommendation systems?

Inquiring lines that use this note as a source (23)

This note is a source for these synthesized inquiries.

How does precision matrix structure differ from covariance in recommendations?
How does the zero-diagonal constraint enable generalization in collaborative filtering?
What distinguishes hard filtering from soft ranking in recommendation systems?
How do structural constraints like zero self-similarity improve collaborative filtering?
Why does inductive bias outweigh model capacity in recommender systems?
What structural constraints replace depth in collaborative filtering?
How does nesting optimization levels improve on traditional network depth?
Why do hierarchical architectures better implement the deep research definition?
Do embedding collisions explain popularity overfitting in recommendation models?
Why do embedding-based recommendation models fail with sparse user history?
Can recommender systems separate true preference from individual rating style bias?
What non-linear patterns do autoencoders discover that matrix factorization misses?
Why do standard supervised models miss high-order connectivity in recommendations?
Why do dual-encoder embeddings fail to capture task-relevant recommendations despite semantic similarity?
How does this compare to trained autoencoder approaches for thought sharing?
Can structural priors outperform raw model capacity in collaborative filtering?
Can simpler collaborative filtering models outperform deep architectures?
Can fractured entangled representations hide undetected by standard analysis methods?
Why do transductive recommenders fail where inductive learning succeeds?
Can hypernetworks generate recommendation parameters more efficiently than retraining full models?
What sparse high-rank patterns does the deep tower fail to capture?
Can autoencoders act as associative memory systems like Hopfield networks?
Can encoder-only architectures match decoder-based sequential models for recommendation?

Related concepts in this collection (4)

Can a linear model beat deep collaborative filtering? Does a shallow linear autoencoder with a zero-diagonal constraint outperform deeper neural models on collaborative filtering tasks? This challenges the field's assumption that depth and nonlinearity drive performance.
extends: paired re-statement of the same EASE result emphasizing that anti-affinity (negative weights) is the under-appreciated mechanism
Can MLPs learn to match dot product similarity in practice? Universal approximation theory suggests MLPs should learn any similarity function, including dot product. But does this theoretical promise hold up when training on real, finite datasets with practical constraints?
complements: same anti-deep-CF lesson — capacity isn't the bottleneck, the right structural prior is
Why does dot product beat MLP-based similarity in practice? Neural Collaborative Filtering theory suggests MLPs should outperform dot products as universal approximators. But what explains the empirical gap, and what role do data scale and deployment constraints play?
complements: paired anti-MLP result reinforcing that inductive bias > capacity in CF
Why does multinomial likelihood work better for ranking recommendations? Explores whether the choice of likelihood function in VAE-based collaborative filtering matters for matching training objectives to ranking evaluation metrics, and why items should compete for probability mass.
complements: another structural-prior-matters-more-than-capacity result — likelihood choice over architectural depth
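
The EASE result referenced above (a shallow linear autoencoder with a zero-diagonal constraint) has a well-known closed-form solution. A minimal sketch follows, assuming NumPy is available. The function name `fit_ease`, the toy interaction matrix, and the regularization value `l2=0.5` are illustrative assumptions and do not come from this note.

```python
import numpy as np


def fit_ease(X, l2=1.0):
    """Fit item-item weights B with a zero diagonal (closed-form EASE).

    X  : (users x items) interaction matrix, e.g. binary clicks.
    l2 : L2 regularization strength (hypothetical default).
    """
    G = X.T @ X                          # item-item Gram matrix
    G = G + l2 * np.eye(G.shape[0])      # add L2 regularization to the diagonal
    P = np.linalg.inv(G)                 # inverse of the regularized Gram matrix
    B = P / (-np.diag(P))                # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)             # enforce zero self-similarity
    return B


# Toy data: 4 users, 4 items (illustrative only).
X = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
    ],
    dtype=float,
)

B = fit_ease(X, l2=0.5)
scores = X @ B  # predicted affinity of every user for every item
```

The zero diagonal stops an item from predicting itself, so each score has to come from other items in the user's history. Off-diagonal weights are free to be negative, which is the anti-affinity mechanism mentioned above. In practice, items a user has already interacted with are usually masked out of `scores` before ranking.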

Related papers in this collection (8)

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Embarrassingly Shallow Autoencoders for Sparse Data (0.91 match)
Neural Collaborative Filtering vs. Matrix Factorization Revisited (0.85 match)
Variational Autoencoders for Collaborative Filtering (0.85 match)
Collaborative Deep Learning for Recommender Systems (0.84 match)
Curse of “Low” Dimensionality in Recommender Systems (0.84 match)
Scalable Neural Contextual Bandit for Recommender Systems (0.83 match)
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) (0.83 match)
InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models (0.83 match)
