SYNTHESIS NOTE
Recommender Systems

What dominates AI compute in production systems today?

While public discussion centers on large language models, Facebook's infrastructure data reveals a different story about which AI workloads actually consume the most compute cycles in real production environments.

Synthesis note · 2026-05-03 · sourced from Recommenders Personalized

Public discussion of AI compute centers on training and inference for large language models. Facebook's published architecture analysis tells a different story: DNN-based personalized recommendation models account for up to 79% of AI inference cycles in its production data center. Just three model classes (RMC1, RMC2, and RMC3) account for up to 65% of inference cycles, even though hundreds of recommendation models run across the system.

These models follow a distinct architectural pattern that drives their compute profile. Inputs combine dense features (continuous values, such as user age) with sparse categorical features (such as preferred genres or device types). Sparse features are encoded as multi-hot vectors over potentially millions of categories, of which only a few entries are active for any given user. Mapping these to dense embedding vectors requires embedding-table lookups, which are memory-bound rather than compute-bound. That inverts the compute profile of more familiar transformer or convnet workloads.
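The lookup pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Facebook's implementation: the table sizes, the `pooled_lookup` and `flops_per_byte` helpers, and the choice of sum pooling are all assumptions made for the example. It shows that a multi-hot feature only touches a handful of table rows, and that each value read from memory takes part in roughly one addition.

```python
import random

# Hypothetical sizes, chosen only for illustration (not from the source).
NUM_CATEGORIES = 10_000   # rows in the table; production tables can have millions
EMBED_DIM = 16            # width of each embedding vector

random.seed(0)
# The embedding table: one dense vector per category.
table = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
         for _ in range(NUM_CATEGORIES)]


def pooled_lookup(table, active_ids):
    """Gather the rows for the active categories and sum them (assumed pooling)."""
    pooled = [0.0] * len(table[0])
    for idx in active_ids:
        row = table[idx]              # one scattered memory read per active category
        for j, value in enumerate(row):
            pooled[j] += value        # one addition per value read
    return pooled


def flops_per_byte(num_active, dim, bytes_per_value=4):
    """Rough arithmetic intensity: additions performed per byte read from the table."""
    flops = num_active * dim
    bytes_read = num_active * dim * bytes_per_value
    return flops / bytes_read


# A multi-hot feature stored as the indices of its few active entries.
active_ids = [12, 4_096, 9_999]
user_vector = pooled_lookup(table, active_ids)

print(len(user_vector))                            # 16
print(flops_per_byte(len(active_ids), EMBED_DIM))  # 0.25
```

With 32-bit values, the sketch performs a quarter of an addition per byte fetched, regardless of table size. That low ratio is what "memory-bound" means here: the time goes into fetching rows, not into arithmetic.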

The implication is that production AI infrastructure is shaped by recommendation, not by the model types that get research attention. Embedding-table operations, sparse feature handling, and storage capacity for billion-parameter embedding tables are the binding engineering constraints. McKinsey and TechEmergence estimated that recommendation drives up to 35% of Amazon's revenue; Netflix and YouTube data put recommendation-driven consumption at 75% of movies watched and 60% of videos consumed, respectively. That economic weight is why recommendation is the dominant inference workload in production, yet methods papers tend to underweight it relative to the far more visible LLM compute.
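To make the storage constraint concrete, here is a back-of-the-envelope capacity estimate. The table names, row counts, and embedding widths are hypothetical, and 32-bit values are assumed; none of these figures come from the source.

```python
def table_bytes(num_rows, dim, bytes_per_value=4):
    """Memory footprint of one embedding table, assuming fixed-width values."""
    return num_rows * dim * bytes_per_value


# Hypothetical tables as (rows, embedding width); illustrative values only.
tables = {
    "user_id": (50_000_000, 64),
    "item_id": (10_000_000, 64),
    "device_type": (1_000, 16),
}

total_params = sum(rows * dim for rows, dim in tables.values())
total_bytes = sum(table_bytes(rows, dim) for rows, dim in tables.values())

print(f"{total_params:,} parameters, {total_bytes / 1e9:.2f} GB")
# 3,840,016,000 parameters, 15.36 GB
```

Under these assumptions, two large ID tables alone reach billions of parameters and over 15 GB, while a single inference touches only a few rows of each. Capacity and bandwidth then become the limiting resources, not arithmetic throughput.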

Original note title

personalized recommendation models drive 79 percent of Facebook AI inference cycles — three model classes consume two-thirds of total compute