Do autoencoders learn hidden attractors in latent space?
When you repeatedly apply an autoencoder's encode-decode cycle, do the trajectories in latent space converge to specific points? If so, what creates these attractors and what do they reveal about what the network learned?
A trained autoencoder is usually treated as a one-shot map: input goes in, latent code comes out, reconstruction goes back. The paper "Navigating the Latent Space Dynamics of Neural Models" reframes the same architecture as a dynamical system. Iterate the encode-decode map and you trace trajectories in latent space. The endpoints of those trajectories are attractor points: fixed points of the map, where further iteration no longer moves the latent code. They emerge without any additional training, purely from the geometry the autoencoder learned.
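The iteration can be written compactly. This is a sketch in assumed notation (E for the encoder, D for the decoder, f for the latent encode-decode map, z for a latent code), chosen here for illustration rather than taken from the paper:

```latex
\begin{aligned}
f &= E \circ D, \\
z_{t+1} &= f(z_t) = E\bigl(D(z_t)\bigr), \qquad t = 0, 1, 2, \dots \\
z^{*} &= f(z^{*}) \quad \text{(fixed point)}, \\
V(z) &= f(z) - z \quad \text{(latent vector field, with } V(z^{*}) = 0\text{)}.
\end{aligned}
```

In this notation an attractor is a fixed point z* that nearby trajectories converge to, and V(z) is the displacement one encode-decode step applies to z.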
The mechanism is locally contractive behavior near training examples. Three inductive biases combine to produce it. Initialization bias: standard schemes preserve activation variance and exhibit a global tendency toward contractive maps. Explicit regularization: weight decay penalizes parameter norms and encourages contraction. Implicit regularization: data augmentation introduces local perturbations around training examples, effectively defining neighborhoods the encoder learns to contract toward. None of these were designed to create attractors; the attractors are a side effect.
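One standard way to make "locally contractive" precise is a bound on the Jacobian. This is a sketch with assumed notation (f = E∘D for the latent encode-decode map, J_f for its Jacobian, U for a closed convex neighbourhood of a training example's latent code), none of which comes from the source:

```latex
\begin{aligned}
&k := \sup_{z \in U} \lVert J_f(z) \rVert_2 < 1
\;\Longrightarrow\;
\lVert f(z) - f(z') \rVert_2 \le k \,\lVert z - z' \rVert_2
\quad \text{for all } z, z' \in U, \\
&\text{and if also } f(U) \subseteq U:\quad
\exists!\, z^{*} \in U \text{ with } f(z^{*}) = z^{*},
\qquad
\lVert f^{t}(z_0) - z^{*} \rVert_2 \le k^{t} \,\lVert z_0 - z^{*} \rVert_2 .
\end{aligned}
```

The second line is the Banach fixed-point theorem applied to U. It is a sufficient condition from general dynamical-systems theory, offered to illustrate the mechanism, and not a statement of how the paper measures contraction.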
What attractors represent depends on training regime. Heavy overparameterization with limited data produces attractors that correspond to memorized examples — the autoencoder behaves like an associative memory akin to a Hopfield network. With more data and less overparameterization, attractors become more abstract — they represent learned distribution modes rather than individual training points. The position on this memorization-vs-generalization spectrum is itself a property of the inductive-bias regime.
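One rough way to place a model on this spectrum, assuming you do have latent codes for the training set, is to check how far each attractor sits from its nearest training code: distances near zero suggest memorized examples, larger ones suggest more abstract modes. The function name `nearest_training_distance` and the arrays below are hypothetical illustrations, not a metric or data from the paper.

```python
import numpy as np

def nearest_training_distance(attractors, train_codes):
    """For each attractor, Euclidean distance to the closest training latent code."""
    # Broadcast to shape (n_attractors, n_train, latent_dim), then reduce.
    diffs = attractors[:, None, :] - train_codes[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

# Made-up numbers for illustration only, not results from any model.
train_codes = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
attractors = np.array([[1.0, 1.0], [1.0, 0.0]])

d = nearest_training_distance(attractors, train_codes)
print(d)  # first attractor coincides with a training code, second does not
```

What counts as "close" depends on the scale of the latent space, so any threshold would have to be chosen per model.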
This reframing turns the network into an object with intrinsic dynamics that can be analyzed without input data. You can sample noise, iterate the encode-decode map, and discover what the network has actually learned by tracing where the dynamics settle. For foundation models, this enables a class of probing methods that do not require access to the original training data — particularly useful when that data is proprietary or distributed.
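The probing loop described above can be sketched in a few lines. The "autoencoder" here is a hand-built toy, not a trained model: the weights `W_d` and `W_e` are chosen so that encode(decode(z)) equals tanh(2z), a map known to have four stable fixed points in two dimensions. All names (`encode`, `decode`, `find_attractors`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained autoencoder (weights are NOT learned).
W_d = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, -0.5]])        # decoder: latent (2) -> data (3)
W_e = 2.0 * np.linalg.pinv(W_d)      # encoder: data (3) -> latent (2), W_e @ W_d = 2 * I

def encode(x):
    return np.tanh(x @ W_e.T)

def decode(z):
    return z @ W_d.T

def find_attractors(n_samples=200, max_iters=500, tol=1e-10):
    """Encode Gaussian noise, iterate z <- encode(decode(z)), return distinct endpoints."""
    z = encode(rng.normal(size=(n_samples, 3)))   # no real inputs, only noise
    for _ in range(max_iters):
        z_next = encode(decode(z))
        converged = np.max(np.abs(z_next - z)) < tol
        z = z_next
        if converged:
            break
    # Round so trajectories ending at the same point collapse to one row.
    return np.unique(np.round(z, 4), axis=0)

attractors = find_attractors()
print(attractors)
```

With a real model, `encode` and `decode` would be the trained networks, and the rounding step would likely need to be replaced by a proper clustering of endpoints.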
Inquiring lines that use this note as a source (6)
This note is a source for these synthesized inquiries.
- What non-linear patterns do autoencoders discover that matrix factorization misses?
- Can autoencoders act as associative memory systems like Hopfield networks?
- How do overparameterization and data size shift what attractors represent?
- How do encode-decode contractive biases create stable attractors in latent space?
- What physical structure does a Gaussian-regularized latent space actually encode?
- What makes regularization an implicit factor in embedding geometry?
Related concepts in this collection (3)
- Can we probe foundation models without any input data?
  Can we understand what foundation models have learned by sampling noise through their encode-decode dynamics instead of analyzing their response to real inputs? This matters for auditing models whose training data is proprietary or inaccessible.
  same paper, the methodology application this finding enables
- Can identical outputs hide broken internal representations?
  Can neural networks produce correct outputs while having fundamentally fractured internal structure that prevents generalization and creativity? This challenges our assumptions about what performance benchmarks actually measure.
  adjacent: another angle on what internal structure carries
- What happens inside models when they suddenly generalize?
  Grokking appears as an abrupt shift from memorization to generalization. But is the underlying process truly discontinuous, or does mechanistic analysis reveal continuous phases we can measure and predict?
  adjacent: another decomposition of the memorization-generalization spectrum
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Navigating the Latent Space Dynamics of Neural Models
- A Mechanistic Analysis of Looped Reasoning Language Models
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
- The Vanishing Gradient Problem for Stiff Neural Differential Equations
- From Simulation to Enaction: Post-trained Language Models Recognize and React to their own Generations
- Hierarchical Reasoning Model
- Learn from your own latents and not from tokens: A sample-complexity theory
Original note title
autoencoders implicitly define a latent vector field via iterated encode-decode maps with attractors emerging from training-induced contractive bias