How do sparse circuits compare to the modular subnetworks that emerge naturally?
This note contrasts two routes to modularity inside neural networks: circuits you *force* into being by training with sparse weights, versus the modular subnetworks a network *grows on its own* when learning compositional tasks. The corpus turns out to have both poles, and they rhyme more than you'd expect. On the emergent side, pruning experiments show that normally trained networks already carve compositional tasks into isolated subroutines: ablate one and only its corresponding function breaks, and pretraining makes this self-organized structure more consistent across architectures (see: Do neural networks naturally learn modular compositional structure?). On the engineered side, training a transformer with sparse weights produces compact, human-readable circuits in which individual neurons map to simple concepts, and ablation confirms each circuit is both necessary and sufficient for its task (see: Can sparse weight training make neural networks interpretable by design?).
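The ablation logic described above can be made concrete with a toy sketch. Everything here is an assumption for illustration: the weights are hand-built so that hidden units 0-1 serve one output ("task A") and units 2-3 serve the other ("task B"), and the names `forward` and `ablation_effect` are hypothetical. It is not the procedure used in the cited notes, only the shape of the test: zero out a candidate subnetwork and check which function breaks.

```python
import numpy as np

def forward(x, W1, W2, hidden_mask=None):
    """Two-layer ReLU network; hidden_mask zeroes out (ablates) hidden units."""
    h = np.maximum(0.0, x @ W1)
    if hidden_mask is not None:
        h = h * hidden_mask
    return h @ W2

def ablation_effect(x, W1, W2, units):
    """Mean absolute change in each output when the given hidden units are ablated."""
    mask = np.ones(W1.shape[1])
    mask[list(units)] = 0.0
    baseline = forward(x, W1, W2)
    ablated = forward(x, W1, W2, mask)
    return np.abs(baseline - ablated).mean(axis=0)

# Hand-built toy weights (not learned): two disjoint subnetworks.
W1 = np.array([[1.0, -1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, -1.0]])
W2 = np.array([[1.0, 0.0],
               [-1.0, 0.0],
               [0.0, 1.0],
               [0.0, -1.0]])

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))

effect_a = ablation_effect(x, W1, W2, units=[0, 1])  # ablate the "task A" units
effect_b = ablation_effect(x, W1, W2, units=[2, 3])  # ablate the "task B" units
print("ablate A units, change per output:", effect_a)
print("ablate B units, change per output:", effect_b)
```

In this toy, ablating one pair of units changes only its own output and leaves the other untouched. In a trained network the candidate units would have to be found first (for example by pruning), and the separation would rarely be this clean.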
The interesting comparison isn't 'which exists' (both do); it's what each buys you and what it costs. Emergent modularity is free, since it comes along with ordinary training, but it's *implicit*: the boundaries are real, yet you have to go looking for them with pruning, and there's no guarantee they're clean or stable. Forced sparsity is *legible by construction*, giving you disentangled circuits you can actually read, but it doesn't scale yet: keeping the circuits interpretable beyond tens of millions of parameters remains unsolved. So the trade is roughly this: nature gives you modularity cheaply but messily; sparsity gives you modularity cleanly but expensively, and so far only at small scale.
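To make "forced sparsity" concrete, here is a minimal sketch of one common way to impose a weight budget: after every gradient step, keep only the `k` largest-magnitude weights and zero the rest. This is an assumption, since the source does not say how the sparse-weight training is actually enforced. The helper `project_top_k` and the toy regression problem are hypothetical.

```python
import numpy as np

def project_top_k(W, k):
    """Keep the k largest-magnitude entries of W and set all others to zero."""
    if k <= 0:
        return np.zeros_like(W)
    if k >= W.size:
        return W.copy()
    flat = W.ravel()
    keep = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |w|
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(W.shape)

# Toy problem (illustrative only): a linear target that uses 2 of 10 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[1, 4]] = [2.0, -3.0]
y = X @ w_true

w = np.zeros(10)
for _ in range(200):
    grad = X.T @ (X @ w - y) / len(X)        # gradient of mean squared error / 2
    w = project_top_k(w - 0.1 * grad, k=2)   # gradient step, then enforce the budget

print("nonzero weights:", np.flatnonzero(w))
```

The design point is that the budget is a hard constraint applied during training rather than a post-hoc pruning pass, which is what makes the surviving connections readable by construction.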
Here's the thing the corpus suggests you didn't know to ask: sparsity inside a network isn't a single phenomenon, and not all of it is structural in the circuit sense. Networks default to *sparse* activations for unfamiliar inputs and *dense* ones for well-learned data, so activation sparsity is partly a learned signal of how familiar an input is rather than a designed property (see: Is representational sparsity learned or intrinsic to neural networks?). And under hard, out-of-distribution tasks, hidden states sparsify in a localized way that acts as a stabilizing filter, not a failure (see: Do language models sparsify their activations under difficult tasks?). These findings concern sparse activations rather than sparse weights, so the parallel is suggestive rather than exact. Still, when you train for sparse circuits, you may be leaning into a behavior the network already uses for its own purposes: you're formalizing a tendency, not imposing a foreign one.
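Claims about activations being "sparser" or "denser" need a metric. The sketch below uses the Hoyer measure, which is 0 when every unit is equally active and 1 when a single unit is active. Choosing Hoyer is an assumption: the source does not name the metric behind these findings, and the example vectors are made up for illustration.

```python
import numpy as np

def hoyer_sparsity(h, eps=1e-12):
    """Hoyer sparsity along the last axis: 0 = uniformly dense, 1 = one active unit."""
    h = np.asarray(h, dtype=float)
    n = h.shape[-1]
    l1 = np.abs(h).sum(axis=-1)
    l2 = np.sqrt((h ** 2).sum(axis=-1))
    return (np.sqrt(n) - l1 / (l2 + eps)) / (np.sqrt(n) - 1.0)

# Made-up hidden states: one spread across all units, one concentrated in a few.
dense_state = np.array([0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.9, 1.1])
sparse_state = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.1, 0.0, 0.0])

print("dense :", hoyer_sparsity(dense_state))
print("sparse:", hoyer_sparsity(sparse_state))
```

Comparing this score across familiar and unfamiliar inputs, or across easy and hard tasks, is one way the trends described above could be checked on a given model.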
The sharpest caution comes from work on internal structure: identical task performance can hide radically different internal organization, and a model can hold every linearly decodable feature it needs while its actual representations are fractured and fragile to perturbation (see: Can models be smart without organized internal structure?; What actually happens inside a language model?). This is exactly why 'emergent modularity' deserves skepticism: a subnetwork that looks modular under one probe may be brittle underneath. Forced-sparse circuits are an attempt to *guarantee* that the structure is real rather than hoping the network found a good one on its own.
If you want to widen the lens, modularity also shows up at the architecture level rather than the weight level. Separating a 'decomposer' from a 'solver' improves accuracy and lets the decomposition skill transfer across domains, a deliberate version of the division of labor that sparse circuits discover at the neuron level (see: Does separating planning from execution improve reasoning accuracy?). And sparsity-as-design appears again in mixture-of-experts work, where a balanced combination of lookup memory and sparse expert routing outperforms either alone (see: Can lookup memory and computation work together better than either alone?). Across all of these, the recurring lesson is that modularity is something you can either *wait for* or *insist on*, and insisting on it is the only way to be sure the clean structure you see is the structure that's actually doing the work.
Sources (8 notes)
Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.
During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.
As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
Research shows that LLMs can achieve the same output through different internal mechanisms, and improvements in one dimension like accuracy reliably degrade others like faithfulness and calibration. Internal structure matters even when behavior appears identical.
Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.
Engram combines O(1) N-gram lookup with Mixture-of-Experts routing, revealing a U-shaped scaling law where balanced allocation to both mechanisms outperforms either alone. Gains appear largest in reasoning and code rather than pure retrieval.