SYNTHESIS NOTE
Psychology, Society, and Alignment

Do language models learn differently from good versus bad outcomes?

Do LLMs update their beliefs asymmetrically when learning from their own choices versus observing others? This matters for understanding whether agentic AI systems might inherit human cognitive biases.

Synthesis note · 2026-02-23 · sourced from Cognitive Models Latent

On instrumental learning tasks adapted from cognitive psychology (variants of the multi-armed bandit), LLMs show a systematic optimism bias: when learning about actions they chose themselves, they learn more from better-than-expected outcomes than from worse-than-expected ones. Three properties of this bias closely parallel human cognition:

  1. Optimism for chosen actions — the model updates beliefs more strongly when outcomes exceed expectations than when they fall short
  2. Reversal for counterfactual feedback — when learning about the value of the unchosen option, the bias reverses (pessimism about alternatives)
  3. Disappearance without agency — when the model has no control over choices (passive observation), the asymmetry vanishes entirely
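The three properties can be written down with a delta-rule (Rescorla-Wagner-style) learner that uses separate learning rates for positive and negative prediction errors. This model family is common in the human optimism-bias literature, but the note does not say which model the source fitted, so the following is a minimal sketch under that assumption. The class name `AsymmetricLearner`, its parameters, and their default values are hypothetical and purely illustrative, not estimates from the source. Chosen outcomes use a larger rate for good news, counterfactual outcomes swap the two rates, and passive observation uses a single symmetric rate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AsymmetricLearner:
    """Hypothetical two-option learner with valence-dependent learning rates.

    All names and default values are illustrative assumptions, not fitted estimates.
    """
    alpha_pos: float = 0.4       # chosen option, better-than-expected outcome
    alpha_neg: float = 0.1       # chosen option, worse-than-expected outcome
    alpha_passive: float = 0.25  # one symmetric rate when the agent made no choice
    q: List[float] = field(default_factory=lambda: [0.0, 0.0])  # value estimates

    def _step(self, option: int, reward: float, rate_if_pos: float, rate_if_neg: float) -> None:
        delta = reward - self.q[option]  # prediction error
        rate = rate_if_pos if delta > 0 else rate_if_neg
        self.q[option] += rate * delta

    def update(self, chosen: int, reward_chosen: float,
               reward_unchosen: Optional[float] = None, agency: bool = True) -> None:
        unchosen = 1 - chosen
        if not agency:
            # Property 3: without agency, both signs of prediction error get the same rate.
            # Here `chosen` just labels the option whose outcome was observed.
            self._step(chosen, reward_chosen, self.alpha_passive, self.alpha_passive)
            if reward_unchosen is not None:
                self._step(unchosen, reward_unchosen, self.alpha_passive, self.alpha_passive)
            return
        # Property 1: optimism, i.e. learn more from good news about the chosen option.
        self._step(chosen, reward_chosen, self.alpha_pos, self.alpha_neg)
        if reward_unchosen is not None:
            # Property 2: counterfactual feedback swaps the rates (pessimism about the alternative).
            self._step(unchosen, reward_unchosen, self.alpha_neg, self.alpha_pos)


learner = AsymmetricLearner()
learner.update(chosen=0, reward_chosen=1.0, reward_unchosen=1.0)
print(learner.q)  # [0.4, 0.1]: same good outcome, very different updates
```

In this sketch the same +1 outcome moves the chosen option's estimate four times as far as the unchosen option's; with `agency=False` both estimates would move by the same 0.25.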

The meta-RL validation is critical. Idealized in-context learning agents derived through meta-reinforcement learning, which converge to Bayes-optimal strategies, exhibit the same three behavioral effects. This suggests the asymmetry may be a rational adaptation rather than a bug. An optimistic agent that overweights positive outcomes from its own actions, while underweighting positive outcomes from unchosen alternatives, exploits more aggressively, and that can be optimal in certain bandit environments.
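To make the exploitation argument concrete, here is a back-of-the-envelope sketch; it is not the meta-RL analysis from the source. It assumes a delta-rule learner with valence-dependent learning rates, rewards in {0, 1} paid with probability `p`, an agent that keeps choosing the same option, and complete feedback (both outcomes seen every trial). The rates 0.4 and 0.1 and the function names are illustrative assumptions. Under these assumptions the expected update has a closed-form fixed point, so even two options with identical payoff probability settle at different values depending on whether the option was chosen.

```python
def equilibrium_value(p: float, rate_if_pos: float, rate_if_neg: float) -> float:
    """Fixed point of the expected delta-rule update for rewards in {0, 1}.

    With reward 1 (prob. p) the prediction error is 1 - q; with reward 0 it is -q.
    Setting p*rate_if_pos*(1 - q) - (1 - p)*rate_if_neg*q = 0 gives the value below.
    """
    return p * rate_if_pos / (p * rate_if_pos + (1 - p) * rate_if_neg)


def iterate_expected_update(p: float, rate_if_pos: float, rate_if_neg: float,
                            q: float = 0.5, steps: int = 500) -> float:
    """Numerical check: apply the *expected* update repeatedly (no sampling)."""
    for _ in range(steps):
        q += p * rate_if_pos * (1 - q) - (1 - p) * rate_if_neg * q
    return q


P = 0.5                          # both options pay off half the time (assumed)
ALPHA_POS, ALPHA_NEG = 0.4, 0.1  # illustrative optimistic rates (assumed)

chosen = equilibrium_value(P, ALPHA_POS, ALPHA_NEG)    # chosen option: optimistic rates
unchosen = equilibrium_value(P, ALPHA_NEG, ALPHA_POS)  # counterfactual: rates swapped
print(round(chosen, 3), round(unchosen, 3))            # 0.8 0.2
```

Two options with the same true value of 0.5 end up valued at 0.8 and 0.2 in this sketch, so a greedy or softmax chooser would favor whatever it already picked. That is the exploitation tendency in miniature; the sketch says nothing about when such exploitation is Bayes-optimal, which is the question the meta-RL comparison addresses.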

The agency-dependence is the most theoretically interesting aspect. The same model shows the bias when the task frames it as an agent making choices, but not when it passively observes outcomes. This implies the bias is not a fixed property of the attention mechanism or the training distribution. Instead, it is context-dependent and switched on by the framing of agency. Given the related finding that LLMs reproduce human causal reasoning mistakes (see the note "Do large language models make the same causal reasoning mistakes as humans?"), this adds another dimension: LLMs replicate not only human causal reasoning biases but also human motivational biases that depend on perceived agency.

The practical implication for agentic AI: when LLMs are deployed as decision-making agents, they may systematically overweight evidence that their previous decisions were good and underweight evidence that alternative actions would have been better. This closely mirrors the pattern that produces confirmation bias in human decision-making, and the meta-RL result suggests it may be an emergent property of any sufficiently capable in-context learner rather than a training artifact.

Original note title

in-context learning agents exhibit asymmetric belief updating — optimism bias for chosen actions reverses for counterfactual feedback and disappears without agency