How does representational density emerge from training data familiarity?

This note explores why neural networks build dense, rich internal representations for data they have seen often during training, and what that says about the difference between genuine learning and frequency-driven familiarity. The core finding is almost behavioral: during pretraining, a model develops dense activations for inputs it recognizes from its training diet and falls back to sparse, thin representations for anything unfamiliar. This happens on its own, without any task-specific tuning (see: Is representational sparsity learned or intrinsic to neural networks?). Density, in other words, isn't a fixed property of the architecture. It's a residue of exposure.
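One way to make "dense" versus "sparse" concrete is to count the fraction of units whose activation magnitude exceeds a small cutoff. The sketch below is only an illustration: the function name `activation_density`, the cutoff `eps`, and the toy activation vectors are assumptions made here, not the metric or data from the cited note.

```python
def activation_density(activations, eps=1e-6):
    """Return the fraction of units whose absolute activation exceeds eps."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if abs(a) > eps)
    return active / len(activations)


# Toy post-ReLU activation vectors (made-up numbers, for illustration only).
familiar_input = [0.8, 0.0, 1.3, 0.4, 0.9, 0.2, 0.0, 0.7]
unfamiliar_input = [0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.1]

print(activation_density(familiar_input))    # 6 of 8 units active
print(activation_density(unfamiliar_input))  # 2 of 8 units active
```

In a real model the vectors would come from a hidden layer, and the choice of cutoff would change the absolute numbers. The comparison between familiar and unfamiliar inputs is the part that matters.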

That reframes a lot of what looks like 'understanding' as something closer to frequency bookkeeping. When researchers traced multimodal models' supposed zero-shot generalization, they found that performance tracks how often a concept actually appeared in pretraining. Models need exponentially more data for linear gains, which means the impressive results are interpolation over familiar territory, not leaps into the new (see: Does multimodal zero-shot performance actually generalize or interpolate?). The same logic shows up at the micro scale: whether a keyword gets 'primed' after a gradient update is predictable from its probability *before* learning, with a sharp threshold near 10^-3 and as few as three exposures needed to lock the effect in (see: Can we predict keyword priming before learning happens?). Familiarity isn't a vague feeling the model has. It's a measurable quantity that forecasts how the model will represent and recall things.
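"Exponentially more data for linear gains" is the signature of a log-linear trend: the score rises by a fixed amount each time the concept count is multiplied by a constant factor. The sketch below assumes such a trend purely for illustration. The `intercept` and `slope` values are invented and are not fitted to the study's results.

```python
import math


def predicted_score(concept_count, intercept=0.10, slope=0.15):
    """Assumed log-linear trend: score is linear in log10 of concept frequency."""
    return intercept + slope * math.log10(concept_count)


def count_needed(target_score, intercept=0.10, slope=0.15):
    """Invert the trend: how many occurrences reach target_score."""
    return 10 ** ((target_score - intercept) / slope)


# Each tenfold increase in occurrences buys the same additive gain (the slope).
for count in (10, 100, 1_000, 10_000):
    print(count, round(predicted_score(count), 2))
```

Under this assumed trend, moving from 100 to 1,000 occurrences buys the same gain as moving from 1,000 to 10,000, so rare concepts stay expensive to improve.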

But familiarity isn't one thing, and this is where the corpus gets interesting. There's a real split between facts and procedures. Factual recall depends on narrow, document-specific memorization: the model essentially needs to have seen *that fact* in *that document*. Reasoning, by contrast, draws on broad procedural knowledge distributed across many diverse sources, which is why it generalizes where memorized facts don't (see: Does procedural knowledge drive reasoning more than factual retrieval?). So 'density from familiarity' has two flavors: a brittle, lookup-style density for memorized facts, and a more transferable density built from repeated exposure to *patterns of doing* rather than *items to retrieve*.

There's also a structural side to how this density organizes itself. Networks don't just accumulate a dense blur. Pretraining sharpens compositional structure, carving tasks into modular subnetworks where ablating one piece only breaks its corresponding function, and this modularity gets more reliable the more pretraining a model has had (see: Do neural networks naturally learn modular compositional structure?). Familiarity, then, doesn't only thicken representations; it also consolidates them into reusable parts. A complementary line of work suggests this consolidation needs far fewer samples when learning targets latent structure rather than raw tokens. Because same-level latents are far more correlated than surface tokens, the model recovers hierarchy with a sample count that doesn't blow up with depth (see: Why is predicting latents more sample-efficient than tokens?).

The quietly subversive payoff: if density is learned through exposure and base models already carry the structure, then much of what post-training 'adds' may just be elicitation of what familiarity already built. Five independent methods (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all surface reasoning that was already latent in base activations, suggesting post-training selects rather than creates (see: Do base models already contain hidden reasoning ability?). That also explains why heavy fine-tuning can backfire: directly rewriting weights corrupts the knowledge stored in lower layers, while decoding-time approaches that leave those familiar representations untouched preserve knowledge far better (see: Can decoding-time tuning preserve knowledge better than weight fine-tuning?). The density built by familiarity is both valuable and fragile: worth eliciting, dangerous to overwrite.

Sources (8 notes)

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Does multimodal zero-shot performance actually generalize or interpolate?

Across 34 models and 5 datasets, multimodal models require exponentially more pretraining data for linear performance gains on downstream tasks. Performance correlates with how often test concepts appeared during pretraining, not genuine generalization ability.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.
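The threshold in this note can be read as a simple decision rule on the keyword's pre-learning probability. The sketch below only separates keywords by which side of the roughly 10^-3 threshold they fall on. The note does not say which side shows priming, so the code does not assume it. The helper names and the example keywords and probabilities are hypothetical.

```python
import math

PRIMING_THRESHOLD = 1e-3  # approximate threshold quoted in the note above


def log10_margin(pre_learning_prob, threshold=PRIMING_THRESHOLD):
    """Signed distance from the threshold, in orders of magnitude."""
    if not 0.0 < pre_learning_prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log10(pre_learning_prob) - math.log10(threshold)


def split_by_threshold(keyword_probs, threshold=PRIMING_THRESHOLD):
    """Partition {keyword: probability} into (below, at_or_above) the threshold."""
    below = {k: p for k, p in keyword_probs.items() if p < threshold}
    at_or_above = {k: p for k, p in keyword_probs.items() if p >= threshold}
    return below, at_or_above


# Hypothetical pre-learning probabilities for three keywords.
probs = {"vermilion": 2e-5, "red": 4e-2, "ochre": 8e-4}
below, at_or_above = split_by_threshold(probs)
print(sorted(below))        # keywords under the threshold
print(sorted(at_or_above))  # keywords at or over the threshold
```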

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
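The ablation logic can be illustrated with a hand-built toy network in which each task reads from its own disjoint set of hidden units. This is an assumption-laden sketch, not a learned model and not the pruning procedure from the cited work. The function `forward` and the two tasks are invented for illustration.

```python
def forward(x, ablate=frozenset()):
    """Toy two-task network; `ablate` holds hidden-unit indices to zero out."""
    hidden = [
        max(0.0, x),        # unit 0: read only by task A
        max(0.0, -x),       # unit 1: read only by task A
        max(0.0, x - 1.0),  # unit 2: read only by task B
        max(0.0, 1.0 - x),  # unit 3: read only by task B
    ]
    # Ablation: silence the selected units before the readout.
    hidden = [0.0 if i in ablate else h for i, h in enumerate(hidden)]
    task_a = hidden[0] + hidden[1]  # computes |x|
    task_b = hidden[2] + hidden[3]  # computes |x - 1|
    return task_a, task_b


print(forward(-2.0))                 # intact network: both tasks work
print(forward(-2.0, ablate={0, 1}))  # task A's subnetwork removed
print(forward(-2.0, ablate={2, 3}))  # task B's subnetwork removed
```

When the subnetworks are truly modular, removing one set of units zeroes its own task and leaves the other output unchanged. That selective failure is the pattern the pruning experiments look for.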

Why is predicting latents more sample-efficient than tokens?

A formal sample-complexity analysis proves latent-level self-supervision (data2vec/JEPA style) recovers compositional structure with samples constant in hierarchy depth, while token-level learning requires exponential samples—because same-level latents are far more correlated than raw tokens.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.
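The note describes proxy-tuning only as a distributional shift applied at decoding time. As commonly described, the shift adds to the large base model's logits the difference between a small tuned model and its untuned counterpart. The sketch below assumes that formulation. The vocabulary, logit values, and function names are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_logits(base, expert, anti_expert):
    """Shift base logits by (small tuned model - small untuned model)."""
    return [b + (e - a) for b, e, a in zip(base, expert, anti_expert)]


# Toy next-token logits over a three-word vocabulary (illustrative values).
vocab = ["Paris", "Sure", "Lyon"]
base_logits = [3.0, 0.5, 1.0]         # large base model; its weights are never modified
expert_logits = [1.0, 2.0, 0.2]       # small tuned model
anti_expert_logits = [1.0, 0.5, 0.2]  # the same small model before tuning

shifted = proxy_tuned_logits(base_logits, expert_logits, anti_expert_logits)
probs = softmax(shifted)
for token, p in zip(vocab, probs):
    print(token, round(p, 3))
```

Only the token the small tuned model changed its mind about ("Sure" here) moves. Tokens where the two small models agree keep the base model's preference, which is one way to read the claim that stored knowledge is left intact.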
