What role does attention structure play in creating position bias?
This explores whether the transformer's attention mechanism itself — not the training data or fine-tuning — is what makes models care where information sits in a prompt.
The corpus points fairly clearly at the architecture. Transformer soft attention is structurally biased toward tokens that are repeated or contextually prominent, over-weighting them regardless of whether they're actually relevant (Does transformer attention architecture inherently favor repeated content?). That mechanism is built into how attention distributes weight: a positive feedback loop amplifies whatever is salient before any reward-based training gets a vote. So position effects aren't a quirk of one model's tuning; they ride on the same structural tendency.
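To make the over-weighting concrete, here is a toy sketch of a single softmax attention step. It assumes every token gets the same query-key score, so it shows only how normalization hands more total weight to a token that appears more often. It does not model positional encodings, multiple heads, or the feedback loop across layers. The token names and the `attention_mass_by_token` helper are illustrative, not taken from the source notes.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mass_by_token(tokens, score_of):
    # Sum the attention weight that lands on each distinct token.
    weights = softmax([score_of[t] for t in tokens])
    mass = {}
    for tok, w in zip(tokens, weights):
        mass[tok] = mass.get(tok, 0.0) + w
    return mass

# Assumption: all three token types are equally "relevant" (same raw score).
scores = {"fact": 1.0, "opinion": 1.0, "filler": 1.0}

once = attention_mass_by_token(["fact", "opinion", "filler"], scores)
repeated = attention_mass_by_token(
    ["fact", "opinion", "opinion", "opinion", "filler"], scores
)

print(once["opinion"], repeated["opinion"])  # share held by "opinion"
print(once["fact"], repeated["fact"])        # share left for "fact"
```

With equal scores, stating the opinion three times raises its share of attention from one third to three fifths and squeezes the fact from one third to one fifth, even though nothing about relevance changed.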
The sharpest demonstration that position alone matters: moving an identical block of in-context examples from the start of a prompt to the end can shift accuracy by up to 20% and flip nearly half the model's predictions, with the content held constant (How much does demo position alone affect in-context learning accuracy?). Same words, different slot, different answer. That's position bias in its purest form, and it holds across task types, which is what you'd expect if the cause is architectural rather than topical.
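The two numbers in that finding, the accuracy shift and the share of flipped predictions, measure different things, and a minimal sketch shows why both are worth reporting. The function name `position_sensitivity` and the toy labels below are made up for illustration; they are not data from the cited note.

```python
def position_sensitivity(preds_start, preds_end, labels):
    # Compare predictions made with the demo block at the start of the
    # prompt against predictions made with the same block at the end.
    if not (len(preds_start) == len(preds_end) == len(labels)):
        raise ValueError("all three lists must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one example")
    acc_start = sum(p == y for p, y in zip(preds_start, labels)) / n
    acc_end = sum(p == y for p, y in zip(preds_end, labels)) / n
    flip_rate = sum(a != b for a, b in zip(preds_start, preds_end)) / n
    return {
        "acc_start": acc_start,
        "acc_end": acc_end,
        "acc_delta": acc_end - acc_start,
        "flip_rate": flip_rate,
    }

# Toy data, invented purely to exercise the function.
labels      = ["pos", "neg", "pos", "neg"]
preds_start = ["pos", "neg", "neg", "neg"]
preds_end   = ["pos", "pos", "pos", "neg"]

report = position_sensitivity(preds_start, preds_end, labels)
print(report)
```

In this toy case accuracy is identical under both placements, yet half the predictions changed. Flip rate catches instability that a net accuracy figure can hide, which is why the two are reported side by side.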
There's a revealing counterpoint in how attention actually retrieves facts from long context. Fewer than 5% of attention heads do the real work of pulling specific information out of a long prompt; prune them and the model hallucinates even though the answer was sitting right there (What mechanism enables models to retrieve from long context?). So retrieval depends on a thin, specialized sliver of the attention apparatus, which helps explain why information in an awkward position can effectively go unread: the data isn't absent, but the structural machinery for surfacing it is sparse and unevenly triggered by where things sit.
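As a rough picture of what "prune the retrieval heads" involves, the sketch below ranks heads by a per-head retrieval score and picks the top 5% as the set to ablate. Everything here is an assumption for illustration: the scores are placeholders rather than measurements, the `(layer, head)` indexing and the `top_fraction_heads` helper are hypothetical, and the notes do not specify how the score is computed.

```python
def top_fraction_heads(scores, fraction=0.05):
    # scores maps (layer, head) -> retrieval score; higher means the head
    # more often attends to the span that holds the answer.
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda lh: scores[lh], reverse=True)
    return ranked[:k]

# Placeholder scores for a tiny 2-layer, 10-head model (not real measurements).
scores = {(layer, head): 0.01 for layer in range(2) for head in range(10)}
scores[(1, 3)] = 0.9  # pretend this one head does most of the retrieval

heads_to_ablate = top_fraction_heads(scores, fraction=0.05)
print(heads_to_ablate)
```

In a real experiment the selected heads would be masked during a forward pass and the model's answers compared with and without them; that step depends on the model framework and is left out here.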
What's striking is where the bias originates versus where it can be fixed. Cognitive biases in these models are planted during pretraining and only nudged, not created, by fine-tuning (Where do cognitive biases in language models come from?), which is consistent with position bias being a property of the base architecture. But it isn't destiny. Regenerating the context to strip irrelevant material can interrupt the over-weighting loop (Does transformer attention architecture inherently favor repeated content?). Training judges to reason through an evaluation, rather than react to surface features, directly cuts position bias along with verbosity and authority bias (Can reasoning during evaluation reduce judgment bias in LLM judges?). And consistency training can teach a model to respond the same way regardless of how a prompt is wrapped or arranged (Can models learn to ignore irrelevant prompt changes?).
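The consistency-training idea can be sketched at the data level: the model's own answer to the clean prompt becomes the training target for the wrapped version of that prompt. The sketch below covers only that pairing step for the output-level variant. `stub_generate` stands in for a real model call, `add_opinion_wrapper` is one hypothetical kind of wrapper, and none of these names come from the source notes.

```python
def build_consistency_pairs(clean_prompts, wrap, generate):
    # For each clean prompt, record the model's own clean response and
    # attach it as the target for the wrapped variant of the same prompt.
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)  # response to the clean prompt
        pairs.append({"input": wrap(prompt), "target": target})
    return pairs

def add_opinion_wrapper(prompt):
    # Hypothetical wrapper: prepend a user opinion that should be ignored.
    return "I'm fairly sure the answer is B. " + prompt

def stub_generate(prompt):
    # Stand-in for a real model call, so the sketch runs on its own.
    return "ANSWER(" + prompt + ")"

clean = ["Which option is correct, A or B?"]
pairs = build_consistency_pairs(clean, add_opinion_wrapper, stub_generate)
print(pairs[0])
```

Because the targets are regenerated from the current model rather than taken from a fixed dataset, they stay in step with what the model can currently do, which is the staleness advantage the source note attributes to this approach.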
The thing you might not have expected: position bias and sycophancy are partly the same bug. Both fall out of attention over-weighting whatever is prominent — a repeated opinion, or a demo in a favored slot. Fixing one tends to be the same kind of intervention as fixing the other, because you're fighting the same structural tilt rather than two separate flaws.
Sources (6 notes)
Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies opinions and framing before RLHF acts. System 2 Attention—regenerating context to remove irrelevant material—can interrupt this mechanism.
Repositioning an identical demo block from the start of the prompt to the end shifts accuracy by up to 20% and flips nearly half of predictions. This positional effect operates independently of demo content and spans multiple task types.
Fewer than 5% of attention heads across all model families function as retrieval heads. They are intrinsic (already present in short-context models), activate dynamically depending on context, and are causally necessary for factuality. Pruning them causes hallucination despite the information being present in context.
A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.
Training judges with reinforcement learning to reason about evaluations—by converting judgment tasks into verifiable problems with synthetic data pairs—produces judges that think through their decisions rather than relying on exploitable surface features, directly mitigating authority, verbosity, position, and beauty bias.
Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.