How do encode-decode contractive biases create stable attractors in latent space?
This explores how an autoencoder's repeated encode-decode loop quietly turns into a dynamical system — where ordinary training pressures (not deliberate design) pull points toward fixed destinations in latent space.
The cleanest answer in the corpus: take a trained autoencoder and iterate its encode-decode map, feeding the output back in again and again, and the trajectories converge to fixed points. Those fixed points are attractors that nobody explicitly built (see: Do autoencoders learn hidden attractors in latent space?). The 'contractive bias' isn't a special loss term; it falls out of mundane choices like weight decay, initialization, and data augmentation, all of which gently shrink the space so nearby inputs flow to shared basins. The model behaves like a vector field even though it was only ever trained to reconstruct.
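The iteration described above can be sketched in a few lines. This is a minimal illustration, not a trained model: the encoder direction `W`, the decoder scale `GAIN`, and the helper `iterate_to_fixed_point` are all hypothetical stand-ins chosen so that the encode-decode map has two attracting fixed points. With a real autoencoder, `step` would simply call the trained encoder and decoder.

```python
import math

# Hypothetical stand-in for a trained autoencoder: 2-D input, 1-D latent.
W = (0.6, 0.8)   # unit-norm encoder direction (assumed, not learned here)
GAIN = 2.0       # decoder scale (assumed); the latent map becomes z -> tanh(GAIN * z)

def encode(x):
    """Project the input onto W and squash it into (-1, 1)."""
    return math.tanh(W[0] * x[0] + W[1] * x[1])

def decode(z):
    """Map the scalar latent back to input space along W."""
    return (GAIN * z * W[0], GAIN * z * W[1])

def step(x):
    """One pass of the encode-decode map."""
    return decode(encode(x))

def iterate_to_fixed_point(x, tol=1e-10, max_steps=1000):
    """Feed the output back in until successive iterates stop moving."""
    for n in range(1, max_steps + 1):
        nxt = step(x)
        if math.dist(nxt, x) < tol:
            return nxt, n
        x = nxt
    raise RuntimeError("no convergence within max_steps")

attractor_a, steps_a = iterate_to_fixed_point((1.0, 0.5))
attractor_b, _ = iterate_to_fixed_point((0.9, 0.6))    # a nearby input
attractor_c, _ = iterate_to_fixed_point((-1.0, -0.5))  # an input on the other side

print("attractor A:", attractor_a, "reached in", steps_a, "steps")
print("attractor B:", attractor_b)
print("attractor C:", attractor_c)
```

In this toy, the two nearby inputs end at the same point while the third lands on the mirror-image attractor, which is the "nearby inputs flow to shared basins" behaviour in miniature. The origin is also a fixed point of the map, but an unstable one, so generic starting points never settle there.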
The interesting part is what determines *where* those attractors sit. The same note ties their character to the memorization-versus-generalization spectrum: an overfit model carves narrow basins around stored examples, while a generalizing one settles into smoother, broader ones. That maps neatly onto a separate finding that representational density itself is *learned*: networks build dense, confident activations for familiar data and fall back to sparse ones for unfamiliar inputs, purely through exposure during pretraining (see: Is representational sparsity learned or intrinsic to neural networks?). Read together, an attractor is less a geometric accident than a fossil of what the model saw a lot of: familiarity sculpts the wells, and the iterated map just rolls downhill into them.
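The contrast between narrow basins around stored examples and broader, smoother ones can be illustrated with a toy map. The sketch below swaps in a softmax-weighted recall over a handful of stored values as a stand-in for an autoencoder; it is not the method from the note. The names `PATTERNS`, `recall`, `attractors`, and the sharpness parameter `beta` are all hypothetical. A large `beta` plays the role of a memorizing model, and a small `beta` plays the role of a smoothing one.

```python
import math

PATTERNS = (-2.0, -1.0, 1.0, 2.0)  # hypothetical "stored examples" in a 1-D latent

def recall(x, beta):
    """Softmax-weighted average of the stored patterns.

    beta sets how sharply the map snaps to the nearest pattern.
    """
    logits = [-beta * (x - p) ** 2 for p in PATTERNS]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    return sum(w * p for w, p in zip(weights, PATTERNS)) / sum(weights)

def attractors(beta, starts, steps=200):
    """Iterate the map from each start and collect the distinct end points."""
    ends = set()
    for x in starts:
        for _ in range(steps):
            x = recall(x, beta)
        ends.add(round(x, 3))
    return sorted(ends)

STARTS = (-2.3, -1.2, -0.4, 0.4, 1.2, 2.3)
sharp = attractors(beta=20.0, starts=STARTS)   # memorization-like regime
smooth = attractors(beta=0.1, starts=STARTS)   # smoothing regime

print("sharp map attractors: ", sharp)
print("smooth map attractors:", smooth)
```

Under these assumptions the sharp map keeps one attractor per stored pattern, each with its own narrow basin, while the smooth map funnels every start into a single broad basin at the centre of the data. Real autoencoders sit somewhere along that spectrum, and where exactly is an empirical question this toy does not answer.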
This connects to a surprising fact about what latent space looks like when you probe it. Networks don't store structure as a featureless blob; they spontaneously organize it. LLM activations encode syntactic type and direction in something like polar coordinates, using both distance and angle, without ever being told to (see: How do language models encode syntactic relations geometrically?). The lesson that travels back to autoencoders: contractive training doesn't just compress, it imposes geometry, and stable attractors are one signature of that self-organized structure. Stability and structure are two faces of the same learned latent field.
There's also a stability-as-adaptation thread worth following. When LLMs hit out-of-distribution inputs, their hidden states *sparsify*, and this acts as a selective filter that holds performance together, not as a breakdown (see: Do language models sparsify their activations under difficult tasks?). So 'stable attractor' and 'adaptive collapse to a sparse code' may be describing the same instinct from different angles: the network defaulting to a safe, low-dimensional resting state when it's unsure. Attractors aren't only memory wells; they can be where a model goes to stay reliable.
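To make "sparsify" concrete, it helps to have a number for how sparse an activation vector is. The notes summarized here do not say which metric was used, so the sketch below is an assumption: it uses the Hoyer measure, a standard sparsity score that is 0 for a perfectly uniform vector and 1 for a vector with a single non-zero entry. The function name `hoyer_sparsity` and the example vectors are hypothetical.

```python
import math

def hoyer_sparsity(activations):
    """Hoyer sparsity: (sqrt(n) - L1/L2) / (sqrt(n) - 1), ranging from 0 to 1."""
    n = len(activations)
    if n < 2:
        raise ValueError("need at least two activations")
    l1 = sum(abs(a) for a in activations)
    l2 = math.sqrt(sum(a * a for a in activations))
    if l2 == 0.0:
        raise ValueError("sparsity is undefined for an all-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

dense_state = [1.0, 1.0, 1.0, 1.0]    # every unit equally active
sparse_state = [0.0, 0.0, 3.0, 0.0]   # a single active unit
mixed_state = [0.1, 0.0, 2.0, 0.2]    # mostly one unit, some leakage

for name, state in [("dense", dense_state), ("sparse", sparse_state), ("mixed", mixed_state)]:
    print(name, round(hoyer_sparsity(state), 3))
```

Tracking a score like this per layer, on familiar versus unfamiliar inputs, is one way to check whether a given model shows the sparsification pattern described above.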
For a cross-domain mirror, look at reinforcement learning. RL post-training reliably collapses a model's many pretraining output formats down to one dominant format within the first epoch. That is an attractor in behavior space rather than latent space, and the winner is set by model scale rather than necessarily by quality (see: Does RL training collapse format diversity in pretrained models?). The shared shape is convergence-without-intent: a training process, optimizing for something else entirely, quietly funnels a system toward a small set of stable outcomes. For a constructive counterpoint, one that treats the latent dimension as something you scale and steer rather than something that collapses on you, latent-thought models couple fast local and slow global learning to make latent size its own scaling axis (see: Can latent thought vectors scale language models beyond parameters?).
Sources (6 notes)
Iterating an autoencoder's encode-decode map reveals convergent trajectories with attractor points that emerge from training-induced contractive biases. These attractors arise naturally from initialization schemes, weight decay, and data augmentation—without explicit design—and their nature reflects the memorization-versus-generalization spectrum of the training regime.
During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.
The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.
As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.
Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.