How do normalization and input injection control emergence of fixed points?

The question takes a dynamical-systems framing: how design choices like normalization and feeding the input back in at each step ("input injection") govern whether a network settles into stable fixed points. The collection doesn't hold work on that mechanism directly, so the honest answer is partial.

This reads as a question from the equilibrium-model / iterative-dynamics tradition: treat a network as a process that repeatedly updates a hidden state, and ask what keeps that process from blowing up or collapsing — normalization to bound the state, and re-injecting the original input each step so the trajectory stays anchored to a stable resting point. On that specific mechanism, the collection is thin. None of the retrieved notes study normalization layers or input-injection as knobs on fixed-point convergence, so rather than pad, it's worth saying plainly: the sharp control-theory answer isn't here. What the corpus does have is the adjacent and arguably more interesting question of whether large models perform fixed-point-style iterative computation at all.
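
To make that framing concrete, here is a toy sketch in plain Python. It is not drawn from any note in the collection, and everything in it is an assumption for illustration: the names `rescale_rows`, `update`, and `solve`, the matrices `W` and `U`, the tanh nonlinearity, and especially the choice to stand in for normalization by rescaling `W` so its largest absolute row sum is 0.5. That choice makes the update a contraction, so convergence is guaranteed in this toy; normalizing the state itself, which is what the paragraph above describes, would need its own argument.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rescale_rows(W, gamma=0.5):
    """Assumed stand-in for normalization: scale W so its largest
    absolute row sum equals gamma < 1. Since tanh is 1-Lipschitz,
    the update below is then a contraction in the max norm."""
    norm = max(sum(abs(w) for w in row) for row in W)
    return [[gamma * w / norm for w in row] for row in W]

def update(z, x, W, U, inject=True):
    """One step: z <- tanh(W z + U x). The U x term is the input injection."""
    pre = matvec(W, z)
    if inject:
        pre = [p + u for p, u in zip(pre, matvec(U, x))]
    return [math.tanh(p) for p in pre]

def solve(x, W, U, z0, inject=True, tol=1e-10, max_steps=500):
    """Iterate until successive states differ by less than tol."""
    z = list(z0)
    for k in range(1, max_steps + 1):
        z_next = update(z, x, W, U, inject)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            return z, k
    return z, max_steps

# Hypothetical weights and input, chosen only for the demonstration.
W = rescale_rows([[2.0, -1.0, 0.5], [0.3, 1.5, -2.0], [-1.0, 0.7, 1.2]])
U = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
x = [0.8, -0.3]

z_a, steps_a = solve(x, W, U, z0=[0.0, 0.0, 0.0])
z_b, steps_b = solve(x, W, U, z0=[0.9, -0.9, 0.4])
z_c, _ = solve(x, W, U, z0=[0.9, -0.9, 0.4], inject=False)
```

With injection on, both starting states reach the same input-dependent fixed point. With injection off, the state decays to zero whatever `x` is, which is the collapse case.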

The most direct neighbor is the finding that LLMs don't actually run iterative procedures in latent space: they recognize an optimization problem as template-similar to something seen before and emit a plausible answer instead of converging to one (Do large language models actually perform iterative optimization?). That reframes your question: before asking how to control the emergence of fixed points, the corpus suggests asking whether the iterative dynamics that would produce them are happening in the first place. The companion result that RL fine-tuning sharpens memorization rather than installing genuine procedures points the same direction: out-of-distribution tests reveal template-matching where you'd hope to find a convergent process (Do fine-tuned language models actually learn optimization procedures?).

Where "input injection" has a concrete analog in the corpus, it's in steering: injecting a vector into the residual stream and asking what the model does with it. DPO training builds a two-stage circuit that detects these injected perturbations, with evidence-carrier features in early layers suppressing a default-deny gate, which is essentially the model developing sensitivity to an injected signal riding alongside its normal trajectory (How do language models detect injected steering vectors internally?). Persona vectors extend this: linear directions in activation space that you can inject to steer, or monitor to catch drift, during fine-tuning (Can we track and steer personality shifts during model finetuning?). These aren't fixed-point control, but they're the closest thing the library has to "what happens when you push a signal into the state and watch where it settles."
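
A minimal sketch of the inject-and-monitor idea, assuming a trait is represented by a single direction in activation space. The vectors, the function names `unit`, `project`, and `steer`, and the scale `alpha` are all hypothetical; neither cited note's actual procedure is reproduced here.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(h, direction):
    """Monitor: signed length of h along the normalized direction."""
    return sum(a * b for a, b in zip(h, unit(direction)))

def steer(h, direction, alpha):
    """Inject: shift h by alpha units along the normalized direction."""
    return [a + alpha * b for a, b in zip(h, unit(direction))]

h = [0.2, -1.0, 0.5, 0.3]        # made-up hidden state
trait = [1.0, 0.0, -1.0, 0.0]    # made-up trait direction

before = project(h, trait)
after = project(steer(h, trait, alpha=2.0), trait)
```

Because the direction is normalized before use, steering by `alpha` moves the monitored projection by exactly `alpha`, so the same direction serves as both the knob and the gauge.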

There's also a convergence story worth knowing about, even if it's at the training level rather than the forward-pass level: RL post-training collapses a model onto a single dominant format from pretraining within the first epoch, suppressing the alternatives. That is a kind of attractor dynamics where the system snaps to one resting configuration regardless of whether it's the best one (Does RL training collapse format diversity in pretrained models?). And the formal ceiling on self-improvement says some equilibria can't be escaped from the inside at all: every reliable fix needs an external verifier, because metacognition alone can't move the system off its fixed point (What stops large language models from improving themselves?).

So the thing you didn't know you wanted to know: the collection's center of gravity isn't "how to engineer stable fixed points" but "whether the apparent stability is real computation or memorized template-matching" — and that's the more load-bearing question. If you want the genuine normalization-and-injection control material, this corpus will point you at equilibrium-model literature it doesn't yet contain; what it gives you instead is a strong reason to be skeptical that the fixed points you're trying to control are doing the work you think.

Sources (6 notes)

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

How do language models detect injected steering vectors internally?

Contrastive preference optimization trains evidence-carrier features in early layers to suppress gate features that default to denial, enabling near-perfect detection of internal perturbations. Safety training actively suppresses this capability, reducing detection from 63.8% to 10.8%.

Can we track and steer personality shifts during model finetuning?

Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an equilibrium-dynamics researcher re-testing claims about fixed-point control in neural networks. The question: **How do normalization and input injection govern the emergence and stability of fixed points in iterative forward passes?**

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable:
- LLMs do not reliably execute iterative numerical procedures in latent space; they pattern-match instead of converge (~2024).
- RL fine-tuning sharpens memorization, not genuine iterative procedures; out-of-distribution failure reveals template-matching (~2025).
- Input injection into the residual stream produces detectable steering circuits (e.g., evidence carriers suppressing default gates in DPO-trained models); persona vectors in activation space can be monitored during fine-tuning (~2025).
- RL post-training converges to a single dominant pretraining distribution format within the first epoch—attractor dynamics at the training level (~2025).
- Formal bounds: no model can self-improve via metacognition alone; every reliable fix requires external verification (~2024–2025).

Anchor papers (verify; mind their dates):
- arXiv:2504.07912 (Echo Chamber, 2025) — RL amplifies pretraining behaviors, collapse onto single attractor.
- arXiv:2507.21509 (Persona Vectors, 2025) — linear steering and monitoring of injected signals.
- arXiv:2603.21396 (Introspective Awareness, 2026) — DPO-induced circuits for perturbation detection.
- arXiv:2412.02674 (Mind the Gap, 2024) — self-improvement ceiling and external verifier requirement.

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For the five claims above, judge whether newer models (o1, o3, Claude 4), scaling, novel architectures (sparse, mixture-of-experts), or improved evaluation harnesses have since *dissolved* the assumption that iterative dynamics don't emerge. Separate the durable question ("Can networks run true fixed-point iteration?") from perishable limitations ("Current LLMs don't; newer ones might"). Cite what relaxed or overturned each; flag where constraints still hold.
(2) **Surface the strongest work contradicting or superseding the library's findings** from the last ~6 months. Have recent papers on in-context learning, meta-learning, or mechanistic interpretability shown that LLMs *do* perform genuine iterative computation, or that input injection + normalization *do* reliably control convergence?
(3) **Propose 2 research questions** that assume the regime may have shifted: (a) Under what architectural or training conditions does input-injection + normalization provably induce convergence to a stable fixed point in forward pass? (b) Can mechanistic interpretability identify the difference between memorized template-matching and true fixed-point iteration in current models?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
