What dominates AI compute in production systems today?
While public discussion centers on large language models, Facebook's infrastructure data reveals a different story about which AI workloads actually consume the most compute cycles in real production environments.
Public discussion of AI compute centers on training and inference for large language models. Facebook's published architecture analysis tells a different story: DNN-based personalized recommendation models account for up to 79% of AI inference cycles in its production data center. Just three model classes (RMC1, RMC2, and RMC3) account for up to 65% of inference cycles, even though hundreds of recommendation models run across the system.
These models follow a distinct architectural pattern that drives their compute profile. Inputs combine dense features (continuous, like user age) with sparse categorical features (like preferred genres or device types). Sparse features are encoded as multi-hot vectors with potentially millions of categories, but only a few entries are active per user. Mapping these to dense embedding vectors requires embedding-table lookups — operations that are memory-bound rather than compute-bound, which inverts the compute profile of more familiar transformer or convnet workloads.
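The lookup pattern described above can be sketched in plain Python. This is a minimal illustration rather than Facebook's implementation: the table size, the embedding dimension, the choice of sum pooling, the fp32 assumption, and the names `make_table`, `pooled_lookup`, and `flops_per_byte` are all assumptions made for the example. It shows why the operation leans on memory: each active index pulls a whole row from the table, and the only arithmetic is one addition per value fetched.

```python
import random


def make_table(num_rows, dim, seed=0):
    """Build a toy embedding table: one dense vector of length `dim` per category."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def pooled_lookup(table, active_ids):
    """Gather the rows for the active categories and sum them element-wise.

    `active_ids` is the compact form of a multi-hot vector: only the indices
    that are set, instead of a mostly-zero vector with one slot per category.
    """
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in active_ids:
        row = table[idx]  # one scattered read into a potentially huge table
        for j in range(dim):
            pooled[j] += row[j]  # one addition per value fetched
    return pooled


def flops_per_byte(num_active, dim, bytes_per_value=4):
    """Rough arithmetic intensity of one pooled lookup (assumes fp32 rows)."""
    bytes_read = num_active * dim * bytes_per_value
    additions = num_active * dim
    return additions / bytes_read


# Hypothetical sizes: 10,000 categories, 16-dimensional embeddings.
table = make_table(num_rows=10_000, dim=16)
user_genres = [12, 4071, 9530]  # the few active entries for one user
vector = pooled_lookup(table, user_genres)

print(len(vector))                           # 16
print(flops_per_byte(len(user_genres), 16))  # 0.25
```

Under these assumptions the lookup performs about a quarter of an operation per byte read, regardless of table size. A dense layer, by contrast, reuses each weight across every example in a batch, so its ratio of arithmetic to bytes moved grows with batch size.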
The implication is that production AI infrastructure is shaped by recommendation, not by the model types that get research attention. The binding engineering constraints are embedding-table operations, sparse feature handling, and storage capacity for billion-parameter embedding tables. McKinsey and TechEmergence estimated that recommendation drives up to 35% of Amazon's revenue; Netflix and YouTube data put the comparable figures at 75% of movies watched and 60% of videos consumed. That economic weight is what makes recommendation the dominant inference workload, yet methods papers tend to underweight it relative to the far more visible LLM workloads.
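To make the storage constraint concrete, the footprint of an embedding table is simply rows × dimension × bytes per value. The sketch below works through that arithmetic for a made-up model; the feature names, row counts, embedding dimension, and fp32 storage are illustrative assumptions, not figures from the Facebook analysis.

```python
def table_params(num_rows, dim):
    """Number of learned values in one embedding table."""
    return num_rows * dim


def table_bytes(num_rows, dim, bytes_per_value=4):
    """Storage for one table, assuming fp32 (4 bytes per value) by default."""
    return table_params(num_rows, dim) * bytes_per_value


# Hypothetical sparse features: (number of categories, embedding dimension).
tables = {
    "user_id": (100_000_000, 32),
    "item_id": (10_000_000, 32),
    "device_type": (1_000, 32),
}

total_params = sum(table_params(rows, dim) for rows, dim in tables.values())
total_bytes = sum(table_bytes(rows, dim) for rows, dim in tables.values())

print(f"{total_params:,} parameters")   # 3,520,032,000 parameters
print(f"{total_bytes / 1e9:.2f} GB")    # 14.08 GB
```

Even in this small hypothetical, the ID-keyed tables dominate: the parameter count is set almost entirely by how many distinct categories a feature has, not by the depth of the network on top of it.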
Related concepts in this collection 4
- Do hash collisions really harm popular recommendation items?
Hash-based embedding tables assume uniform ID distribution, but real recommender systems show heavy-tailed frequency patterns. The question explores whether collisions actually concentrate damage on the high-traffic entities that matter most.
grounds: production scale that makes embedding-table problems first-order — 79% of inference cycles makes any degradation costly
- Why do hash collisions hurt recommendation models so much?
Explores whether standard low-collision hashing works for embedding tables in recommenders, given that user and item frequencies follow power-law distributions rather than uniform ones.
grounds: same scale problem from infrastructure angle
- How do feed ranking weights shape what content gets produced?
Feed-ranking weights are typically treated as neutral tuning parameters, but do they actually function as political levers that reshape producer behavior and the content supply itself?
extends: the scale of production recommendation makes the political consequences of weight-choice population-wide
- Can small language models handle most agent tasks?
Explores whether smaller, cheaper models are actually sufficient for the repetitive, scoped work that dominates deployed agent systems, rather than relying on large models by default.
complements: same compute-economics argument — SLM-first for agentic, three-class-DNN for recommendation — both refuse foundation-model defaults at production scale
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks
- Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics
- The Labor Market Effects of Generative Artificial Intelligence
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- AI Compute Architecture and Evolution Trends
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- Monolith: Real Time Recommendation System With Collisionless Embedding Table
Original note title
personalized recommendation models drive 79 percent of Facebook AI inference cycles — three model classes consume two-thirds of total compute