How do overparameterization and data size shift what attractors represent?

This explores how a model's capacity (overparameterization) and how much data it sees change the meaning of the stable points that learned dynamics settle into — whether those attractors encode memorized examples or genuine generalization.

The attractors here are the ones described in "Do autoencoders learn hidden attractors in latent space?": if you iterate an autoencoder's encode-decode map, trajectories converge toward fixed points. The striking claim is that *what those points represent* isn't designed. It falls out of the training regime, sitting somewhere on a spectrum between memorizing specific inputs and generalizing across them. So the real question is what slides a model along that spectrum. Two levers show up across the corpus: how much capacity the model has relative to the data, and how familiar the data is.
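To make "iterate the encode-decode map until it settles" concrete, here is a minimal sketch. It does not use a trained autoencoder: `encode`, `decode`, the direction `u`, and the `gain` of 2.0 are all hypothetical stand-ins chosen so the map has exactly two attracting fixed points. It only illustrates the procedure of repeatedly applying decode(encode(x)) and checking where different starting points end up.

```python
import numpy as np

def encode(x, u):
    # Hypothetical 1-D latent: project the input onto the unit direction u.
    return float(u @ x)

def decode(z, u, gain=2.0):
    # Hypothetical saturating decoder: squash the latent, map back along u.
    return np.tanh(gain * z) * u

def iterate_to_fixed_point(x, u, max_steps=200, tol=1e-10):
    """Apply decode(encode(x)) repeatedly until the point stops moving."""
    for step in range(max_steps):
        x_next = decode(encode(x, u), u)
        if np.linalg.norm(x_next - x) < tol:
            return x_next, step
        x = x_next
    return x, max_steps

u = np.array([0.6, 0.8])  # unit vector; an assumption of this toy setup
starts = [
    np.array([1.0, 0.5]),    # positive projection onto u
    np.array([-0.2, -0.1]),  # negative projection onto u
    np.array([0.05, 0.0]),   # small positive projection onto u
]
finals = [iterate_to_fixed_point(s, u)[0] for s in starts]

for s, f in zip(starts, finals):
    print(s, "->", np.round(f, 4))
```

In this toy map the sign of the projection onto `u` decides the basin: the first and third starts converge to the same point, and the second converges to its mirror image. In a real autoencoder the analogous question is whether the points reached this way coincide with training examples or with something shared across them.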

The capacity lever has a surprisingly sharp number attached to it. "When do language models stop memorizing and start generalizing?" finds that GPT-family models memorize at roughly 3.6 bits per parameter, and that only once that capacity is *full* does a phase transition (grokking) flip them from storing examples to generalizing. Read alongside the autoencoder result, this reframes overparameterization: a model with plenty of spare capacity relative to its data has no pressure to abstract, so its attractors can afford to be memories, that is, basins centered on training points. Shrink the parameter budget or grow the dataset past the memorization ceiling, and the same attractors are forced to stand for shared structure instead of individual examples. Data size and parameter count aren't two separate knobs here; they're the two sides of one ratio that decides what an attractor can possibly mean.
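The ratio can be worked through as back-of-the-envelope arithmetic. The sketch below takes the 3.6 bits-per-parameter figure from the note above. Everything else is an assumption for illustration: `dataset_bits` is a hypothetical estimate of how much information the training set carries, and treating a ratio of 1.0 as a hard threshold is a simplification, not a result from the source.

```python
import math

BITS_PER_PARAM = 3.6  # figure quoted in the text for GPT-family models

def capacity_bits(n_params: int) -> float:
    """Approximate memorization capacity in bits."""
    return BITS_PER_PARAM * n_params

def capacity_to_data_ratio(n_params: int, dataset_bits: float) -> float:
    """Ratio > 1: spare capacity. Ratio < 1: data exceeds capacity."""
    return capacity_bits(n_params) / dataset_bits

def regime(n_params: int, dataset_bits: float) -> str:
    # Hypothetical hard cut at 1.0; the real transition need not be this crisp.
    ratio = capacity_to_data_ratio(n_params, dataset_bits)
    return "memorization possible" if ratio >= 1.0 else "capacity exceeded"

n_params = 1_000_000
small_dataset_bits = 1e6   # made-up value, well under capacity
large_dataset_bits = 1e8   # made-up value, well over capacity

print(capacity_bits(n_params))
print(regime(n_params, small_dataset_bits))
print(regime(n_params, large_dataset_bits))
```

The point of the arithmetic is that the same million-parameter model lands in either regime depending only on the denominator, which is the sense in which parameter count and data size act as one knob.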

Data familiarity sharpens the picture further. "Is representational sparsity learned or intrinsic to neural networks?" shows that networks develop dense activations for inputs they have seen often and default to sparse ones for unfamiliar inputs, meaning the geometry attractors live in is itself shaped by exposure. "Do language models sparsify their activations under difficult tasks?" adds the dynamic flip side: faced with out-of-distribution inputs, hidden states sparsify as an adaptive filter rather than a failure. Together they suggest attractors aren't fixed furniture: they're dense, well-carved basins where the model has abundant data and shallow, improvised ones where it doesn't.
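"Dense" and "sparse" can be pinned down with a simple measurement. The sketch below uses one possible definition, the fraction of activations whose magnitude is at or below a small threshold. The function name `activation_sparsity`, the threshold `eps`, and both example vectors are made up for illustration; the cited notes may measure sparsity differently.

```python
import numpy as np

def activation_sparsity(h, eps=1e-6):
    """Fraction of entries in h with magnitude at or below eps."""
    h = np.asarray(h, dtype=float)
    return float(np.mean(np.abs(h) <= eps))

# Hand-written stand-ins for post-ReLU hidden states, not real model outputs.
familiar = np.array([0.8, 1.2, 0.3, 0.9, 0.5, 1.1, 0.7, 0.4])
unfamiliar = np.array([0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.9, 0.0])

print(activation_sparsity(familiar))
print(activation_sparsity(unfamiliar))
```

Tracking a number like this across familiar and out-of-distribution inputs is one way to check whether a given model shows the dense-versus-sparse pattern described above.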

The cautionary thread is that you can't read any of this off performance. "Can models be smart without organized internal structure?" shows that two models can hit identical accuracy while one has clean structure and the other a fractured internal organization that shatters under perturbation, so the difference between a real generalizing attractor and a memorized one is invisible to the scoreboard. And "Do standard analysis methods hide nonlinear features in neural networks?" warns that our standard tools (PCA, RSA, linear probes) over-report tidy linear structure and miss the nonlinear part, so the attractors we *think* we see may be an artifact of the lens. The thing you didn't know you wanted to know: the line between an attractor that's a memory and one that's a concept is set by the capacity-to-data ratio, it flips abruptly rather than gradually, and our usual measurements are precisely the ones least equipped to tell the two apart.
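The warning about linear tools can be illustrated with a toy case. In the sketch below, the target feature is the product of two inputs, a purely nonlinear function of them. A linear probe on the raw inputs explains almost none of its variance, while a probe that is handed the product explains all of it. The data is synthetic and the helper `r_squared` is a hypothetical name; this shows the general failure mode, not the analysis from the cited note.

```python
import numpy as np

def r_squared(features, target):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    A = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coef
    return 1.0 - residual.var() / target.var()

rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2))   # synthetic "activations"
target = x[:, 0] * x[:, 1]           # feature encoded nonlinearly

# Linear probe on the raw inputs: sees close to nothing.
linear_r2 = r_squared(x, target)

# Same probe with the product supplied as an extra column: perfect fit.
with_product = np.column_stack([x, x[:, 0] * x[:, 1]])
nonlinear_r2 = r_squared(with_product, target)

print(round(linear_r2, 4), round(nonlinear_r2, 4))
```

The feature is fully present in both cases. Only the second probe has the right form to find it, which is why a null result from a linear tool says little about whether structure exists.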

Sources (6 notes)

Do autoencoders learn hidden attractors in latent space?

Iterating an autoencoder's encode-decode map reveals convergent trajectories with attractor points that emerge from training-induced contractive biases. These attractors arise naturally from initialization schemes, weight decay, and data augmentation—without explicit design—and their nature reflects the memorization-versus-generalization spectrum of the training regime.

When do language models stop memorizing and start generalizing?

GPT-family models have a measurable memorization capacity of approximately 3.6 bits per parameter. When this capacity fills, a phase transition triggers grokking, the shift from memorization to genuine generalization. This capacity is a property of individual models, not training algorithms.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.

Do standard analysis methods hide nonlinear features in neural networks?

PCA, linear regression, and RSA over-represent simple linear features while under-representing equally important nonlinear features. Homomorphic encryption demonstrates that networks can compute perfectly well with no interpretable activation structure, proving representation patterns and computation can be entirely decoupled.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how overparameterization and data size shift what neural network attractors represent — specifically, whether attractors are memorized inputs or generalized concepts.

What a curated library found — and when (dated claims, not current truth):
These findings span 2024–2026 and address the attractor landscape across autoencoders and language models:
• Models memorize at ~3.6 bits per parameter before grokking triggers a phase transition from storage to generalization (~2024–2025).
• The capacity-to-data ratio (not capacity or data alone) determines whether attractors settle on memorized points or shared structure; exceeding the ratio forces abstraction (~2024–2025).
• Networks learn dense activations for familiar training data but sparsify for out-of-distribution inputs as an adaptive filter, reshaping the geometry attractors inhabit (~2025–2026).
• Two models can achieve identical test accuracy while one has coherent internal structure and the other fractured organization; performance metrics do not reveal attractor type (~2025).
• Standard representation analysis tools (PCA, RSA, linear probes) systematically bias toward linear structure and miss nonlinear attractors (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2405.08366 (2024-05) — Sparse Autoencoders for Interpretability
• arXiv:2501.00663 (2024-12) — Titans: Learning to Memorize at Test Time
• arXiv:2505.11581 (2025-05) — Questioning Representational Optimism: Fractured Entangled Representations
• arXiv:2603.03415 (2026-03) — Farther the Shift, Sparser the Representation

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 3.6-bits-per-parameter memorization ceiling, the grokking threshold, and the sparse-to-dense activation flip: has newer work (post-June 2025) shown these regimes dissolve under scaled models, better curricula, or novel architectures? Where do these phase transitions still hold? Separate the durable claim (capacity-ratio governs attractor type) from perishable specifics (the numerical threshold).
(2) Surface work from the last ~6 months that contradicts or supersedes the claim that performance metrics hide attractor structure—e.g., do recent evals, mechanistic probes, or multi-modal analysis now reliably separate memory from generalization?
(3) Propose 2 research questions that assume the regime may have shifted: (a) In post-training RL systems (arXiv:2504.07912, 2505.11711), does RL collapse or redistribute attractors in ways the capacity-ratio framework doesn't predict? (b) As models scale to billions of parameters, does the 3.6-bits-per-param bound become a bottleneck or obsolete?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
