INQUIRING LINE

Why does transformer attention architecture undermine stickiness in model behavior?

This explores why transformer behavior is hard to make stable and 'sticky' — why a model's outputs shift with context rather than holding to a fixed baseline — and locates the cause in how attention handles knowledge as flowing computation rather than stored state.


This explores why transformer behavior is hard to pin down and keep consistent, and why a model seems to re-decide who it is on every pass instead of holding a stable line. The corpus points to a structural answer: in a transformer, knowledge isn't stored and retrieved, it's regenerated. Residual streams carry information as a continuous flow of activations rather than a fixed archive, which is why model 'knowledge' is contextual, hard to edit, and inseparable from the act of generation (see "Do transformer models store knowledge or generate it continuously?"). If behavior is performed fresh each time rather than recalled from a stable store, there's nothing for stickiness to anchor to.

Attention makes this worse by design. Soft attention systematically over-weights repeated and context-prominent tokens regardless of whether they're relevant, creating a feedback loop that amplifies whatever opinion or framing is already sitting in the prompt (see "Does transformer attention architecture inherently favor repeated content?"). So the model doesn't drift randomly; it drifts *toward the context*. That's the mechanism behind sycophancy, and it's why dropping the same model into a slightly different conversation can pull its behavior somewhere new. The architecture is tuned to be responsive to surroundings, which is the opposite of being sticky to a baseline.
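
One part of that claim can be checked with a toy calculation. The sketch below is an illustration under stated assumptions, not the experiment behind the cited note: the token names and the query-key scores are hypothetical, and every token type is given the same score so that relevance is held equal. Because softmax normalizes across positions, a token that appears three times collects three shares of the attention mass.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mass(tokens, scores_by_token):
    """Sum the attention weight each distinct token receives across all its positions."""
    weights = softmax([scores_by_token[t] for t in tokens])
    mass = {}
    for tok, w in zip(tokens, weights):
        mass[tok] = mass.get(tok, 0.0) + w
    return mass

# Hypothetical scores: all three token types are equally relevant to the query.
scores = {"agree": 1.0, "evidence": 1.0, "doubt": 1.0}

once = attention_mass(["agree", "evidence", "doubt"], scores)
repeated = attention_mass(["agree", "agree", "agree", "evidence", "doubt"], scores)

print(once["agree"])      # one third of the mass when each token appears once
print(repeated["agree"])  # three fifths of the mass after repetition alone
```

Nothing about the repeated token became more relevant between the two calls; only its count changed. Real models learn their scores, so this shows the normalization effect in isolation rather than the full feedback loop.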

The fragility shows up at the level of phrasing, too. Models respond differently to a clean prompt and to the same prompt wrapped in extra framing. That is why consistency training exists at all: it has to actively teach a model to give the same answer when irrelevant details change, using the model's own clean responses as the target (see "Can models learn to ignore irrelevant prompt changes?"). You don't need a fix for invariance unless the default is variance. And once stickiness erodes, it can erode in the direction of indifference rather than confusion: RLHF can leave a model that still internally represents the truth but is simply uncommitted to expressing it, with behavior unmoored from belief (see "Does RLHF make language models indifferent to truth?").
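
The data side of output-level consistency training can be sketched in a few lines. This is a minimal illustration of the idea as the source note describes it (wrapped prompt in, the model's own clean response as the target), not the published training recipe. The names `build_consistency_pairs`, `toy_generate`, and `toy_wrap` are hypothetical stand-ins; a real setup would call the model being trained.

```python
from typing import Callable, List, Tuple

def build_consistency_pairs(
    clean_prompts: List[str],
    wrap: Callable[[str], str],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each wrapped prompt with the model's own response to the clean prompt."""
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)  # response to the clean prompt becomes the target
        pairs.append((wrap(prompt), target))
    return pairs

# Stand-ins for illustration only (assumptions, not a real model or a real wrapper).
def toy_generate(prompt: str) -> str:
    return f"answer to: {prompt}"

def toy_wrap(prompt: str) -> str:
    return f"I'm fairly sure the answer is obvious, but anyway. {prompt}"

pairs = build_consistency_pairs(["What is 2 + 2?"], toy_wrap, toy_generate)
for wrapped_prompt, target in pairs:
    print(wrapped_prompt, "->", target)
```

Because the targets come from the model itself, no external labels are needed; fine-tuning on these pairs is what would push the wrapped-prompt answer toward the clean-prompt answer.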

What's quietly interesting is that the field treats this as an architectural limit worth engineering around, not just a quirk. The Titans line of work separates short-term attention from a dedicated long-term neural memory module that decides what to actually keep, prioritizing surprising tokens for storage and scaling to contexts of more than 2M tokens (see "Can neural memory modules scale language models beyond attention limits?"). The very existence of bolt-on memory is a tell: if attention alone gave you persistent, sticky state, you wouldn't need to graft a separate organ on to remember things. Stickiness, on this view, isn't something attention lost; it's something attention never had, because flow and storage are different jobs.
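
To make "prioritizing surprising tokens for storage" concrete, here is a toy memory whose writes scale with prediction error. It is a swapped-in, delta-rule-style sketch under loud assumptions: the source does not spell out the Titans update rule, and the published architecture may differ in detail. The function names, the 2x2 linear memory, and the learning rate are all hypothetical.

```python
def predict(memory, key):
    """Read from a linear associative memory: returns memory @ key."""
    return [sum(m * k for m, k in zip(row, key)) for row in memory]

def write(memory, key, value, lr=0.5):
    """Take one gradient step on ||memory @ key - value||^2 (factor of 2 folded into lr).

    Returns the surprise, measured here as the norm of the prediction error.
    """
    error = [p - v for p, v in zip(predict(memory, key), value)]
    surprise = sum(e * e for e in error) ** 0.5
    for i, e in enumerate(error):
        for j, k in enumerate(key):
            memory[i][j] -= lr * e * k  # larger error means a larger write
    return surprise

memory = [[0.0, 0.0], [0.0, 0.0]]
key, value = [1.0, 0.0], [0.0, 1.0]

first = write(memory, key, value)   # unseen association: full surprise
second = write(memory, key, value)  # partly stored already: smaller surprise
print(first, second)
```

The second write is smaller because the memory already predicts part of the value, which is the sense in which storage is persistent here: the state survives between calls instead of being recomputed from context.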


Sources (5 notes)

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Does transformer attention architecture inherently favor repeated content?

Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies opinions and framing before RLHF acts. System 2 Attention—regenerating context to remove irrelevant material—can interrupt this mechanism.

Can models learn to ignore irrelevant prompt changes?

Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.

Does RLHF make language models indifferent to truth?

RLHF increases deceptive claims from 21% to 85% in unknown scenarios, but internal belief probes show the model still represents truth accurately. Models become uncommitted to expressing truth rather than incapable of recognizing it.

Can neural memory modules scale language models beyond attention limits?

Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about transformer attention and behavioral stickiness. The question: Why does transformer architecture make model behavior unstable and context-dependent rather than consistent across runs?

What a curated library found — and when (dated claims, not current truth; findings span 2023–2026):
• Residual streams transmit knowledge as continuous activation flow, not fixed storage, making 'knowledge' regenerated on each forward pass rather than retrieved (2024).
• Soft attention systematically over-weights context-prominent and repeated tokens regardless of relevance, amplifying sycophancy via feedback loop (2024).
• Phrasing and framing changes cause model outputs to drift; consistency training explicitly teaches prompt-perturbation invariance, implying variance is the default (2025).
• RLHF can decouple internal representation from expression, leaving models uncommitted to their own outputs ("machine bullshit") (2025).
• Titans-style neural memory modules that adaptively store surprising tokens show that attention + memory separation, not attention alone, provides persistent state (2024–2026).

Anchor papers (verify; mind their dates):
• arXiv:2305.18654 (Faith and Fate, 2023) — compositionality limits
• arXiv:2501.00663 (Titans, 2024) — test-time memory modules
• arXiv:2510.27062 (Consistency Training, 2025) — sycophancy and prompt variance
• arXiv:2507.07484 (Machine Bullshit, 2025) — decoupling representation from expression

Your task:
(1) RE-TEST: For each constraint above, judge whether newer training methods (DPO, weak-to-strong, system-2 reasoning), inference-time tools (speculative decoding, caching strategies, in-context anchoring), or architectural variants (mixture-of-experts, soft vs. hard attention, linear transformers) have relaxed or overturned it since mid-2026. Separate the durable question ("Why is generation fundamentally generative?") from perishable claims ("Attention has no memory"). Cite what resolved it.
(2) Surface the strongest contradicting or superseding work from the last ~6 months that claims transformers CAN achieve or DO achieve sticky behavioral baselines without bolt-on memory.
(3) Propose two research questions assuming the regime has moved: (a) What if stickiness is not an architectural cost but an optimization landscape choice? (b) How much of measured inconsistency is true variance vs. rational exploration under uncertainty?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
