SYNTHESIS NOTE

Do hidden massive activations act as attention bias terms?

Explores whether a tiny handful of unusually large activations in LLMs function as structural bias terms that shape attention patterns, regardless of input content.

Synthesis note · 2026-06-03 · sourced from MechInterp

Most LLM research studies external behavior; this work looks inside and finds a surprising phenomenon, massive activations: a very small number of activations whose values are up to ~100,000× larger than the rest. Three findings are load-bearing. First, massive activations are widespread across model sizes and families. Second, their values stay largely constant regardless of input, so they function as indispensable implicit bias terms rather than carriers of input-specific information. Third, they concentrate attention probability onto their corresponding tokens, producing an implicit bias in the self-attention output. The same phenomenon appears in Vision Transformers.
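A toy sketch of the two mechanics described above, on synthetic data rather than a real model. The thresholds in `find_massive` (`abs_floor`, `ratio`), the planted value 2500, and the +10 logit boost are illustrative assumptions, not figures from this note; the point is only that one outsized entry is easy to separate from the bulk, and that a large logit on one key makes softmax put nearly all attention on that token.

```python
import numpy as np

def find_massive(hidden, abs_floor=100.0, ratio=1000.0):
    """Return (token, feature) indices whose magnitude dwarfs the typical one.

    abs_floor and ratio are hypothetical thresholds chosen for illustration.
    """
    mags = np.abs(hidden)
    median = np.median(mags)
    mask = (mags > abs_floor) & (mags > ratio * median)
    return [tuple(int(i) for i in idx) for idx in np.argwhere(mask)]

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Synthetic hidden states: 8 tokens x 64 features of unit-scale noise.
rng = np.random.default_rng(0)
hidden = rng.normal(0.0, 1.0, size=(8, 64))
hidden[0, 5] = 2500.0  # plant one massive activation on token 0 (assumed value)

massive = find_massive(hidden)

# Toy attention logits for a single query over the 8 tokens.
logits = np.array([0.2, -0.1, 0.4, 0.0, -0.3, 0.1, 0.3, -0.2])
logits_with_sink = logits.copy()
# Assumption: the token carrying the massive activation ends up with a much
# larger query-key logit than the others.
logits_with_sink[0] += 10.0

attn_plain = softmax(logits)
attn_sink = softmax(logits_with_sink)

print("massive entries:", massive)
print("attention on token 0 without boost:", round(float(attn_plain[0]), 3))
print("attention on token 0 with boost:", round(float(attn_sink[0]), 3))
```

Because the boosted logit does not depend on the other tokens' content, the resulting attention pattern looks the same across inputs, which is the sense in which it behaves like a bias term.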

The keeper is mechanistic: a tiny number of constant, input-agnostic activations are doing structural work — implementing a bias the architecture needs — and they are the substrate of the "attention sink" behavior where attention piles onto a few tokens. Pruning or quantizing naively can destroy them and break the model, which is why they matter for compression and interpretability.
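A minimal sketch of why a single shared quantization scale struggles with such values, again on synthetic data. The helper `quantize`, the planted value 2500, and the clip value 4.0 are assumptions for illustration; this is generic symmetric int8 rounding, not a method taken from the source work.

```python
import numpy as np

def quantize(x, max_val, bits=8):
    """Symmetric round-to-nearest quantization with one shared scale.

    Values beyond max_val are clipped. Returns the dequantized array.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max_val / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=1024)  # ordinary activations (toy)
acts_massive = acts.copy()
acts_massive[0] = 2500.0                # one massive activation (assumed value)

# Baseline: no outlier, scale set by the largest ordinary value.
err_plain = np.mean(np.abs(quantize(acts, np.max(np.abs(acts))) - acts))

# Option A: absmax scaling. The outlier survives, but it sets the scale,
# so every ordinary value rounds to zero.
deq_absmax = quantize(acts_massive, np.max(np.abs(acts_massive)))
err_rest = np.mean(np.abs(deq_absmax[1:] - acts_massive[1:]))
frac_zeroed = np.mean(deq_absmax[1:] == 0.0)

# Option B: clip to a range that suits the ordinary values. They keep their
# resolution, but the massive activation is crushed to the clip value.
deq_clip = quantize(acts_massive, 4.0)

print("mean error without outlier:", err_plain)
print("mean error on ordinary values, absmax with outlier:", err_rest)
print("fraction of ordinary values rounded to zero:", frac_zeroed)
print("massive value after clipping:", deq_clip[0])
```

Either choice loses something under one shared scale, which is one reason compression schemes tend to treat rare large values separately from the bulk.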

This connects the vault's attention-mechanism thread. It is the activation-level companion to Does transformer attention architecture inherently favor repeated content? — both locate structural attention biases below the training layer — and it explains a failure mode for aggressive quantization like Can ternary weights match full precision model performance?, where preserving these rare massive values is essential.

Related concepts in this collection 3

Does transformer attention architecture inherently favor repeated content? Explores whether soft attention's tendency to over-weight repeated and prominent tokens explains sycophancy independent of training. Questions whether architectural bias precedes and enables RLHF effects.
activation-level companion to that attention-bias finding
Can ternary weights match full precision model performance? Can models trained natively with only three weight values (−1, 0, 1) achieve the same perplexity and task performance as standard full-precision models? This matters because ternary weights could dramatically reduce computational and energy costs.
rare massive values are exactly what aggressive quantization must preserve
Do language models sparsify their activations under difficult tasks? When LLMs encounter unfamiliar or difficult inputs, do their internal representations become sparser rather than denser? Understanding this adaptive response could reveal how models stabilize reasoning under uncertainty.
both probe the structure of LLM internal activations rather than outputs

Original note title

a handful of input-agnostic massive activations function as implicit attention-bias terms in LLMs
