SYNTHESIS NOTE

What dominates AI compute in production systems today?

While public discussion centers on large language models, Facebook's infrastructure data reveals a different story about which AI workloads actually consume the most compute cycles in real production environments.

Synthesis note · 2026-05-03 · sourced from Recommenders Personalized

Public discussion of AI compute centers on training and inference for large language models. Facebook's published architecture analysis tells a different story: DNN-based personalized recommendation models account for up to 79% of AI inference cycles in its production data center. Just three model classes (RMC1, RMC2, and RMC3) account for up to 65% of inference cycles, even though hundreds of recommendation models run across the system.

These models follow a distinct architectural pattern that drives their compute profile. Inputs combine dense features (continuous, like user age) with sparse categorical features (like preferred genres or device types). Sparse features are encoded as multi-hot vectors with potentially millions of categories, but only a few entries are active per user. Mapping these to dense embedding vectors requires embedding-table lookups — operations that are memory-bound rather than compute-bound, which inverts the compute profile of more familiar transformer or convnet workloads.
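The lookup pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Facebook's implementation: the table sizes, the feature names (`genre`, `device`), the helper names, and the choice of sum pooling to combine the active rows are all assumptions made for the example.

```python
import random

def make_embedding_table(num_categories, dim, seed=0):
    """Build a toy table: one dense vector of length `dim` per category ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_categories)]

def pooled_lookup(table, active_ids):
    """Gather the rows for the active IDs of a multi-hot feature and sum them.

    Same result as multiplying the multi-hot vector by the table, but only
    len(active_ids) rows are touched instead of every row.
    Sum pooling is an assumption of this sketch.
    """
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in active_ids:
        row = table[idx]          # one row read per active category
        for j in range(dim):
            pooled[j] += row[j]   # one add per element read
    return pooled

def build_model_input(dense_features, tables, sparse_features):
    """Concatenate dense features with one pooled embedding per sparse feature."""
    out = list(dense_features)
    for name, active_ids in sparse_features.items():
        out.extend(pooled_lookup(tables[name], active_ids))
    return out

# Hypothetical sizes, kept tiny so the sketch runs instantly.
tables = {
    "genre": make_embedding_table(num_categories=1000, dim=4, seed=1),
    "device": make_embedding_table(num_categories=50, dim=4, seed=2),
}
x = build_model_input(
    dense_features=[34.0],  # e.g. user age
    tables=tables,
    sparse_features={"genre": [3, 17, 256], "device": [7]},
)
print(len(x))  # 1 dense value + 2 pooled embeddings of width 4
```

Each active ID costs one row read and only one addition per element, so the work is dominated by fetching rows scattered across a large table rather than by arithmetic. That is the sense in which these lookups are memory-bound.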

The implication is that production AI infrastructure is shaped by recommendation, not by the model types that get research attention. The binding engineering constraints are embedding-table operations, sparse feature handling, and the storage capacity needed for billion-parameter embedding tables. McKinsey and TechEmergence estimated that recommendation drives up to 35% of Amazon's revenue; Netflix and YouTube data put the corresponding figures at 75% of movies watched and 60% of videos consumed, respectively. The economic weight of recommendation is what makes it the dominant inference workload in production, yet methods papers give it far less attention than the highly visible topic of LLM compute.

Inquiring lines that use this note as a source 1

What production costs does personalization infrastructure impose on AI systems?

Related concepts in this collection 4

Do hash collisions really harm popular recommendation items? Hash-based embedding tables assume uniform ID distribution, but real recommender systems show heavy-tailed frequency patterns. The question explores whether collisions actually concentrate damage on the high-traffic entities that matter most.
grounds: production scale that makes embedding-table problems first-order — 79% of inference cycles makes any degradation costly
Why do hash collisions hurt recommendation models so much? Explores whether standard low-collision hashing works for embedding tables in recommenders, given that user and item frequencies follow power-law distributions rather than uniform ones.
grounds: same scale problem from infrastructure angle
How do feed ranking weights shape what content gets produced? Feed-ranking weights are typically treated as neutral tuning parameters, but do they actually function as political levers that reshape producer behavior and the content supply itself?
extends: the scale of production recommendation makes the political consequences of weight-choice population-wide
Can small language models handle most agent tasks? Explores whether smaller, cheaper models are actually sufficient for the repetitive, scoped work that dominates deployed agent systems, rather than relying on large models by default.
complements: same compute-economics argument — SLM-first for agentic, three-class-DNN for recommendation — both refuse foundation-model defaults at production scale

Original note title

personalized recommendation models drive 79 percent of Facebook AI inference cycles — three model classes consume two-thirds of total compute
