Why should deep learning theory prioritize average-case over worst-case analysis?

This explores why deep learning theory is shifting toward predicting how models typically behave (average-case, statistical) rather than proving guarantees about the rarest failure (worst-case bounds), and what that shift buys you.

The corpus frames the move from worst-case guarantees to average-case, statistics-driven prediction less as a concession than as a deliberate change of physics. The clearest articulation is the emergence of "learning mechanics" as a unifying frame: it models networks the way classical and statistical mechanics model gases, caring about aggregate behavior and training dynamics rather than the pathological single particle (Can deep learning theory unify around training dynamics?). Worst-case bounds in deep learning are notoriously loose and pessimistic: they describe adversarial corners the model almost never visits in practice, so a theory built on them predicts little about the models people actually train and deploy.

What makes the average-case approach earn its keep is that the field's most useful recent results are empirical regularities, not guarantees. You can predict where an LLM will struggle by treating it as a probability machine and asking which targets sit in low-probability regions, such as reciting the alphabet backwards or counting letters. This is a typical-case prediction that worst-case analysis would never surface, because logically those tasks are trivial (Can we predict where language models will fail?). Similarly, the findings that depth beats width below a billion parameters (Does depth matter more than width for tiny language models?) and that RL post-training reliably collapses onto one dominant pretraining format within a single epoch (Does RL training collapse format diversity in pretrained models?) are statements about what training dynamics *tend* to do, the kind of aggregate regularity a mechanics-style theory is built to explain.
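The "probability machine" argument can be made concrete with a toy model. The sketch below is not the method from the cited note: it assumes a character-level bigram model with add-alpha smoothing, trained on a made-up corpus in which the alphabet only ever appears in forward order. The names `train_bigram` and `log_prob` and the corpus itself are hypothetical. It scores the forward and backward alphabet under that model.

```python
import math
from collections import Counter

def train_bigram(corpus: str):
    """Count character bigrams and the contexts they follow."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    context_counts = Counter(corpus[:-1])
    vocab = sorted(set(corpus))
    return pair_counts, context_counts, vocab

def log_prob(seq, pair_counts, context_counts, vocab, alpha=1.0):
    """Sum of log P(next | prev) over the sequence, with add-alpha smoothing.

    Only transitions are scored; the probability of the first character
    is ignored, which does not affect the comparison below.
    """
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        num = pair_counts[(prev, nxt)] + alpha
        den = context_counts[prev] + alpha * len(vocab)
        total += math.log(num / den)
    return total

alphabet = "abcdefghijklmnopqrstuvwxyz"
# Toy stand-in for pretraining text (assumption): forward alphabet only.
corpus = " ".join([alphabet] * 50)
model = train_bigram(corpus)

forward_lp = log_prob(alphabet, *model)
backward_lp = log_prob(alphabet[::-1], *model)
print(f"forward:  {forward_lp:.1f}")
print(f"backward: {backward_lp:.1f}")
```

Both strings are equally simple as logical tasks, but the backward one sits far lower in log-probability because none of its transitions were ever observed. A worst-case complexity argument does not distinguish the two; a distributional one does.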

The deeper payoff is that average-case thinking redirects attention to *structure* over capacity. Several notes show that the interesting story lives in how representations organize themselves during typical training, not in capacity bounds: networks spontaneously implement compositional subroutines in isolated subnetworks (Do neural networks naturally learn modular compositional structure?), activations grow dense for familiar data and stay sparse for the unfamiliar (Is representational sparsity learned or intrinsic to neural networks?), and even a humble linear model can beat deep collaborative filtering when the right structural constraint is imposed (Can a linear model beat deep collaborative filtering?). These are average-case structural facts that worst-case capacity analysis is blind to.
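To make the "structural constraint" point tangible, here is a minimal sketch of a linear item-to-item autoencoder whose diagonal is forced to zero, so no item can predict itself. The source notes do not give a formula; this assumes the standard ridge-regression closed form for that constraint, with P = (XᵀX + λI)⁻¹ and B[i][j] = −P[i][j] / P[j][j] off the diagonal. The helper names `invert` and `ease_style_weights`, the value of `lam`, and the four-user interaction matrix are all hypothetical.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ease_style_weights(X, lam):
    """Closed-form item-item weights with a zero diagonal (assumed formulation)."""
    n_items = len(X[0])
    # Regularized Gram matrix: X^T X + lam * I
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(n_items)] for i in range(n_items)]
    P = invert(gram)
    # Column j is rescaled by 1 / P[j][j]; the diagonal is pinned to zero.
    return [[0.0 if i == j else -P[i][j] / P[j][j] for j in range(n_items)]
            for i in range(n_items)]

# Toy interactions (rows = users, columns = items), purely illustrative:
# items 0 and 1 co-occur, items 1 and 2 co-occur, items 0 and 2 never do.
X = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0]]
B = ease_style_weights(X, lam=0.5)
for row in B:
    print([round(v, 3) for v in row])
```

In this toy case the weight between the co-occurring items 0 and 1 comes out positive, while the weight between items 0 and 2, which never co-occur, comes out negative. That is one way to read the note's remark that negative weights encode anti-affinity, and it falls out of the constraint rather than from added capacity.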

Here's the twist worth carrying away: average-case can be *too* forgiving if you only look at output. Two networks can produce identical outputs while one carries clean, transferable structure and the other carries a fractured, entangled mess that breaks the moment you push it toward novel contexts (Can identical outputs hide broken internal representations?). So the real argument isn't "average-case instead of worst-case"; it's that the right unit of analysis is *distributional and dynamical*: what the model does across the data it actually meets and the trajectory it actually takes through training. That's also why pushing into the extreme tail backfires in practice: training on near-impossible RLVR problems teaches degenerate shortcuts rather than reasoning (Do overly hard RLVR samples actually harm model capabilities?). The worst case isn't just hard to bound; chasing it can actively corrupt the typical case.
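The RLVR note in the sources attributes the damage to group-relative normalization treating rare accidental successes as high-advantage trajectories. A small worked example shows the arithmetic. This is a sketch under assumptions: the source does not specify the normalization, so the code uses a GRPO-style z-score within each group (population standard deviation plus a small `eps`); the function name and the group sizes are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Z-score each reward against its own rollout group (assumed GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Near-impossible problem: one lucky success in 16 rollouts.
hard = [0.0] * 15 + [1.0]
# Moderately hard problem: half of the rollouts succeed.
moderate = [1.0] * 8 + [0.0] * 8

hard_adv = group_relative_advantages(hard)
moderate_adv = group_relative_advantages(moderate)

print(f"advantage of the lone success: {max(hard_adv):.2f}")
print(f"advantage of a typical success: {max(moderate_adv):.2f}")
```

With binary rewards and success rate p, a success gets an advantage of roughly sqrt((1 − p) / p), so the rarer the success, the larger the update it drives. If that lone success came from a shortcut such as repeating an answer, the normalization amplifies exactly the behavior the note warns about.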

Sources (9 notes)

Can deep learning theory unify around training dynamics?

Research shows learning mechanics is consolidating as a unified frame for deep learning, modeled on classical and statistical mechanics. It prioritizes average-case predictions, training dynamics, and aggregate statistics over worst-case bounds, mirroring how physics addresses macroscopic systems.

Can we predict where language models will fail?

By framing LLMs as autoregressive probability machines, researchers predicted tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed predictions like backwards alphabet and letter counting.

Does depth matter more than width for tiny language models?

MobileLLM shows deep-and-thin architectures yield 2.7–4.3% accuracy gains over balanced designs at 125M–350M scale by composing abstract concepts through layers rather than spreading parameters across width.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Can a linear model beat deep collaborative filtering?

EASE^R, a single-layer linear autoencoder constrained so items cannot predict themselves, outperforms most deep CF models. The constraint forces prediction through item relationships, and negative weights encoding anti-affinity prove essential: structural bias matters more than model capacity.

Can identical outputs hide broken internal representations?

Networks trained with SGD can reproduce target outputs perfectly while having radically different internal structure from evolved networks. Weight perturbations reveal fractured, entangled representations that prevent transfer to novel contexts or creative recombination.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a deep learning theorist revisiting the average-case vs. worst-case debate. The question remains: *why should theory prioritize average-case over worst-case analysis?* — and has the answer shifted?

What a curated library found — and when (dated claims, not current truth): Findings span 2019–2026.
• Learning mechanics frames deep learning like statistical mechanics: aggregate behavior over pathological corners matters more (2026).
• Worst-case bounds are too loose to predict where models actually fail; typical-case analysis (e.g., low-probability regions) surfaces real failure modes (~2024–2025).
• Structural organization emerges reliably during typical training: modular decomposition, representational density correlated with data familiarity, sparse representations for OOD inputs (2023–2026).
• Identical output performance can mask representational fragility; average-case output metrics hide entangled, non-transferable structure (2025).
• Training on extreme RL difficulty induces degenerate shortcuts rather than robust reasoning, suggesting worst-case chasing can corrupt typical-case learning (2026).

Anchor papers (verify; mind their dates):
• arXiv:2301.10884 (2023) — Structural Compositionality in Neural Networks
• arXiv:2504.07912 (2025) — RL Post-training Amplifies Pretraining Behaviors
• arXiv:2505.11581 (2025) — Fractured Entangled Representations
• arXiv:2605.28388 (2026) — Sample Difficulty and RLVR Mechanisms

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every structural claim (modularity, density, sparsity), check whether recent mechanistic interpretability work (circuits, sparse autoencoders, causality methods) has *confirmed* or *refuted* the average-case predictions. Separately: has the gap between output-level and representation-level metrics widened or narrowed? Does it still hold that worst-case training corrupts typical performance, or have hard-example curricula improved robustness since 2026?
(2) **Surface contradicting work.** Hunt for papers arguing worst-case analysis *remains* necessary—either because recent scaling has made edge cases more common, or because recent interpretability has revealed that "typical" structure is illusory. Flag any claim that average-case metrics can be *deceptively* optimistic.
(3) **Propose 2 research questions assuming the regime may have moved:** (a) If mechanistic circuits are the true unit of analysis, does average-case theory of *circuits* (not outputs) close the gap to worst-case guarantees? (b) Do multimodal or reasoning-focused models show *different* structural regularities, forcing a revision of what "typical" means?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
