SYNTHESIS NOTE
Model Architecture and Internals

Can a single regularizer prevent JEPA representation collapse?

JEPAs traditionally need complex loss stacks and auxiliary tricks to avoid collapse. Can a single Gaussian-distribution constraint on latent embeddings do the same stabilization work, and would that simplify training?

Synthesis note · 2026-06-03 · sourced from Cognitive Models Latent

Joint-Embedding Predictive Architectures (JEPAs) learn world models in compact latent spaces, but existing methods are fragile: they rely on complex multi-term losses, exponential moving averages, pretrained encoders, or auxiliary supervision to avoid representation collapse, the degenerate solution in which the encoder maps every input to the same constant embedding. The engineering needed to keep them stable is itself the barrier.
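To make the collapse failure mode concrete, here is a minimal toy sketch in plain Python. The encoder, predictor, and data are hypothetical stand-ins invented for illustration (they are not taken from any JEPA implementation); the point is only that a constant encoder drives a next-embedding prediction loss to zero while learning nothing about the input.

```python
import random


def prediction_loss(predicted, target):
    """Mean squared error between predicted and target embeddings."""
    total = 0.0
    count = 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def constant_encoder(observation):
    # Hypothetical collapsed encoder: ignores its input entirely.
    return [0.0, 0.0, 0.0, 0.0]


def identity_predictor(embedding):
    # Hypothetical predictor: guesses that the next embedding equals the current one.
    return list(embedding)


random.seed(0)
# Toy "observations": 8 consecutive frames of 16 random pixel values each.
frames = [[random.random() for _ in range(16)] for _ in range(8)]

embeddings = [constant_encoder(frame) for frame in frames]
predicted_next = [identity_predictor(z) for z in embeddings[:-1]]
actual_next = embeddings[1:]

collapsed_loss = prediction_loss(predicted_next, actual_next)
print(collapsed_loss)  # 0.0: a perfect score from a useless representation
```

Because the prediction loss alone cannot tell this solution apart from a good one, some additional mechanism has to rule it out.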

LeWorldModel (LeWM) is the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer that enforces Gaussian-distributed latent embeddings. That single regularizer does the anti-collapse work previously done by the usual stack of tricks, cutting the number of tunable loss hyperparameters from six to one. The payoff is practical: a 15M-parameter model trainable on one GPU in hours, planning up to 48× faster than foundation-model-based world models, and competitive performance across 2D and 3D control tasks. The latent space also encodes meaningful physical structure: probing recovers physical quantities, and the model reliably flags physically implausible events.
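The shape of such a two-term objective can be sketched as follows. This is an illustration under stated assumptions, not LeWM's actual loss: the note does not specify how the Gaussian constraint is implemented, so `gaussian_moment_penalty` is a simple moment-matching stand-in (zero mean, unit variance per dimension), and `two_term_loss` and `reg_weight` are hypothetical names for the combined objective and its single weight.

```python
import random


def mean_squared_error(predicted, target):
    """Next-embedding prediction loss: MSE over all embedding entries."""
    n = sum(len(row) for row in predicted)
    return sum(
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ) / n


def gaussian_moment_penalty(batch):
    """Stand-in regularizer (an assumption, not LeWM's): penalize each
    dimension's deviation from zero mean and unit variance. It ignores
    cross-dimension covariance and higher moments."""
    n = len(batch)
    dim = len(batch[0])
    penalty = 0.0
    for d in range(dim):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / dim


def two_term_loss(predicted_next, target_next, embeddings, reg_weight):
    # reg_weight is the only loss hyperparameter in this sketch.
    return (mean_squared_error(predicted_next, target_next)
            + reg_weight * gaussian_moment_penalty(embeddings))


random.seed(0)
dim, n = 4, 512
gaussian = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
collapsed = [[0.5] * dim for _ in range(n)]

# Prediction is perfect in both cases, so only the regularizer separates them.
loss_gaussian = two_term_loss(gaussian, gaussian, gaussian, reg_weight=1.0)
loss_collapsed = two_term_loss(collapsed, collapsed, collapsed, reg_weight=1.0)
print(loss_gaussian < loss_collapsed)  # True
```

In this toy setting the collapsed batch has zero variance in every dimension, so it pays a fixed penalty however well it predicts, while approximately Gaussian embeddings pay almost none.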

The general lesson is that simplicity in the self-supervised objective can replace brittle engineering: collapse is prevented by an explicit distributional constraint rather than by carefully balanced auxiliary terms. This is the practical face of the related note "Why is predicting latents more sample-efficient than tokens?": the theory says latent prediction is the efficient target, and LeWM shows that the missing piece was a principled way to keep those latents non-degenerate.

Original note title

a single Gaussian-latent regularizer prevents JEPA representation collapse replacing the fragile stack of EMAs stop-gradients and auxiliary losses