SYNTHESIS NOTE
Recommender Systems Model Architecture and Internals

Can MLPs learn to match dot product similarity in practice?

Universal approximation theory suggests MLPs should learn any similarity function, including dot product. But does this theoretical promise hold up when training on real, finite datasets with practical constraints?

Synthesis note · 2026-05-03 · sourced from Recommenders Architectures

The Neural Collaborative Filtering paper popularized replacing the dot product with a learned MLP for combining user and item embeddings. The justification was theoretical: an MLP is a universal function approximator, so in principle it can learn any similarity function, including the dot product, and presumably better ones. Rendle et al.'s revisit shows that this argument fails both empirically and operationally.
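To make the two alternatives concrete, here is a minimal sketch of both similarity functions in NumPy. It assumes the MLP takes the concatenated user and item embeddings as input; the layer sizes, the random untrained weights, and the names `dot_score` and `mlp_score` are hypothetical and chosen only for illustration.

```python
import numpy as np


def dot_score(p_u, q_i):
    """Fixed similarity: inner product of user and item embeddings."""
    return float(p_u @ q_i)


def mlp_score(p_u, q_i, layers):
    """Learned similarity: an MLP applied to the concatenated embeddings.

    `layers` is a list of (W, b) pairs; every layer except the last
    is followed by a ReLU.
    """
    h = np.concatenate([p_u, q_i])
    for W, b in layers[:-1]:
        h = np.maximum(0.0, W @ h + b)
    W_out, b_out = layers[-1]
    return float((W_out @ h + b_out)[0])


rng = np.random.default_rng(0)
d, hidden = 4, 8  # hypothetical embedding and hidden sizes

# Random, untrained weights: this shows the shape of the computation only.
layers = [
    (rng.standard_normal((hidden, 2 * d)) * 0.1, np.zeros(hidden)),
    (rng.standard_normal((1, hidden)) * 0.1, np.zeros(1)),
]

p_u = rng.standard_normal(d)
q_i = rng.standard_normal(d)
print("dot:", dot_score(p_u, q_i))
print("mlp:", mlp_score(p_u, q_i, layers))
```

The dot product has no parameters beyond the embeddings themselves, while the MLP adds weights that must be trained before its output means anything.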

Empirically, a dot product baseline with carefully tuned hyperparameters substantially outperforms the learned MLP similarity. More pointedly, getting an MLP to learn even a plain dot product requires large model capacity and a large amount of training data. The universal approximation guarantee is asymptotic: it says a large enough network can represent the function, not that training on finite data will find it. With finite data, inductive bias matters more than expressiveness. The MLP is too flexible for the task, and its inductive bias points away from the simple geometric similarity that actually fits the data.
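One way to see why the dot product is not free for an MLP on concatenated embeddings: the dot product is built from multiplicative user-item interactions, while the first thing an MLP computes is an additive (linear) function of its input. The sketch below is not the experiment from the paper. It is a small illustration on synthetic Gaussian embeddings, with hypothetical sizes, comparing a least-squares fit on the concatenation with one on the elementwise product.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 8  # hypothetical sample count and embedding size

P = rng.standard_normal((n, d))  # synthetic "user" embeddings
Q = rng.standard_normal((n, d))  # synthetic "item" embeddings
y = np.sum(P * Q, axis=1)        # target: the dot product itself


def r_squared(features, target):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    return 1.0 - residual.var() / target.var()


# Additive model on [p; q]: sees no user-item interaction terms.
r2_concat = r_squared(np.hstack([P, Q]), y)

# Linear model on p * q: the interaction is built into the features.
r2_product = r_squared(P * Q, y)

print(f"R^2, linear on concatenation:      {r2_concat:.4f}")
print(f"R^2, linear on elementwise product: {r2_product:.4f}")
```

On this synthetic data the concatenation fit explains almost none of the variance and the product fit is essentially exact. An MLP has to build every one of those multiplicative terms out of its nonlinearities, which is where the demand for capacity and data comes from.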

Operationally, dot products allow maximum-inner-product search (MIPS) over precomputed item embeddings, which is fast enough for real-time serving over millions of items. An MLP similarity requires a forward pass for every candidate item on every query, because the score depends jointly on the user and the item and cannot be precomputed. So even if MLPs were marginally more accurate, scoring a full catalog with them would be unaffordable in production.
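The serving difference can be sketched as follows. This is a brute-force illustration under assumed sizes, not a production retrieval stack: the dot product path is a single matrix-vector product over a precomputed item matrix, while the MLP path builds and scores one (user, item) pair per catalog item. The one-hidden-layer MLP, its random untrained weights, and the names `top_k_dot` and `top_k_mlp` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d, hidden, k = 10_000, 16, 64, 5  # hypothetical sizes

item_emb = rng.standard_normal((n_items, d))  # precomputed once, offline
user_emb = rng.standard_normal(d)             # arrives at query time


def top_k_dot(user_vec, items, k):
    """Exact maximum-inner-product search by brute force."""
    scores = items @ user_vec                   # one matrix-vector product
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top k
    return top[np.argsort(-scores[top])]        # order those k by score


# Toy one-hidden-layer MLP scorer with random, untrained weights.
W1 = rng.standard_normal((2 * d, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal(hidden) * 0.1


def top_k_mlp(user_vec, items, k):
    """Score every item with a forward pass on its (user, item) pair."""
    pairs = np.hstack([np.tile(user_vec, (items.shape[0], 1)), items])
    hidden_act = np.maximum(0.0, pairs @ W1 + b1)  # per-pair work at query time
    scores = hidden_act @ w2
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


dot_top = top_k_dot(user_emb, item_emb, k)
mlp_top = top_k_mlp(user_emb, item_emb, k)

# Rough multiply-add counts per query, for scoring alone.
dot_madds = n_items * d
mlp_madds = n_items * (2 * d * hidden + hidden)

print("top-k by dot product:", dot_top)
print("top-k by toy MLP:    ", mlp_top)
print("multiply-adds per query (dot, mlp):", dot_madds, mlp_madds)
```

Even this brute-force dot product is far cheaper per query than the toy MLP, and the gap widens with wider or deeper networks. In practice the dot product path is usually served through an approximate nearest-neighbor index rather than a full scan, which is an option the MLP scorer does not have.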

The takeaway: an inductive bias that matches the geometry of the problem (dot product) wins over an expressive parameterization that has to learn the geometry from scratch.

Original note title

MLP similarity does not approximate dot product in practice — universal approximation theorems do not survive contact with finite data