Do language models learn differently from good versus bad outcomes?
Do LLMs update their beliefs asymmetrically when learning from their own choices versus observing others? This matters for understanding whether agentic AI systems might inherit human cognitive biases.
In instrumental learning tasks adapted from cognitive psychology (multi-armed bandit variants), LLMs show a systematic optimism bias: when learning about their own chosen actions, they learn more from better-than-expected outcomes than from worse-than-expected ones. Three properties of this bias closely parallel human cognition:
- Optimism for chosen actions — the model updates beliefs more strongly when outcomes exceed expectations than when they fall short
- Reversal for counterfactual feedback — when learning about the value of the unchosen option, the bias reverses (pessimism about alternatives)
- Disappearance without agency — when the model has no control over choices (passive observation), the asymmetry vanishes entirely
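The human reinforcement-learning literature commonly quantifies these three properties with a delta-rule (Rescorla-Wagner) learner that uses separate learning rates for positive and negative prediction errors, applied separately to factual (chosen) and counterfactual (unchosen) feedback. The sketch below is a minimal illustration under that assumption. The class and parameter names are hypothetical and are not taken from the study. In this parameterization, optimism means `factual_pos > factual_neg`, the counterfactual reversal means `cf_neg > cf_pos`, and the no-agency result corresponds to the positive and negative rates being roughly equal.

```python
from dataclasses import dataclass


@dataclass
class AsymmetricLearner:
    """Hypothetical delta-rule learner with separate learning rates for
    positive and negative prediction errors, for factual and counterfactual
    feedback."""

    factual_pos: float  # chosen option, outcome better than expected
    factual_neg: float  # chosen option, outcome worse than expected
    cf_pos: float       # unchosen option, outcome better than expected
    cf_neg: float       # unchosen option, outcome worse than expected

    def update(self, q, chosen, rewards):
        """Return new value estimates for two options after one trial.

        q: current estimates [q0, q1]; chosen: index of the chosen option;
        rewards: outcomes of both options (rewards[1 - chosen] is the
        counterfactual feedback).
        """
        q = list(q)
        unchosen = 1 - chosen
        # Factual update: Q <- Q + alpha * (r - Q), alpha picked by the sign of the error.
        delta = rewards[chosen] - q[chosen]
        q[chosen] += (self.factual_pos if delta > 0 else self.factual_neg) * delta
        # Counterfactual update for the option that was not taken.
        delta_cf = rewards[unchosen] - q[unchosen]
        q[unchosen] += (self.cf_pos if delta_cf > 0 else self.cf_neg) * delta_cf
        return q


# Optimistic for chosen actions, reversed (pessimistic) for unchosen ones.
learner = AsymmetricLearner(factual_pos=0.6, factual_neg=0.2, cf_pos=0.2, cf_neg=0.6)
print(learner.update([0.5, 0.5], chosen=0, rewards=[1.0, 1.0]))
```

Starting from estimates of 0.5 and seeing a reward of 1 on both options after choosing option 0, the estimates move to about 0.8 (chosen) and 0.6 (unchosen): the chosen option absorbs more of the same good news than the forgone one.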
The meta-RL validation is critical: idealized in-context learning agents derived through meta-reinforcement learning, which converge on Bayes-optimal strategies, exhibit the same three behavioral effects. This suggests the asymmetry may be rational rather than a bug. An optimistic agent that overweights positive outcomes from its own actions while underweighting positive outcomes from unchosen alternatives will exploit more aggressively, which can be optimal in certain bandit environments.
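One way to probe the link between optimism and more aggressive exploitation is to simulate a softmax learner on a two-armed Bernoulli bandit and compare an optimistic setting (positive rate above negative rate) with a symmetric one. This is a hypothetical sketch. The reward probabilities, the inverse temperature `beta`, the horizon, and the use of the stay rate (how often the previous choice is repeated) as an exploitation proxy are all assumptions. The code makes no claim about which learner earns more; that depends on the environment, which is what "certain bandit environments" hedges.

```python
import math
import random


def simulate(alpha_pos, alpha_neg, probs=(0.75, 0.25), n_trials=100,
             beta=5.0, n_runs=500, seed=0):
    """Return (mean reward, stay rate) for a softmax learner with asymmetric
    learning rates on a two-armed Bernoulli bandit (factual feedback only).
    All defaults are illustrative assumptions."""
    rng = random.Random(seed)
    total_reward = 0.0
    stays = 0
    for _ in range(n_runs):
        q = [0.5, 0.5]
        prev = None
        for _ in range(n_trials):
            # Softmax over two options reduces to a logistic on the value gap.
            p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            a = 0 if rng.random() < p0 else 1
            r = 1.0 if rng.random() < probs[a] else 0.0
            # Asymmetric delta-rule update of the chosen option only.
            delta = r - q[a]
            q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
            total_reward += r
            stays += int(a == prev)  # counts repeats of the previous choice
            prev = a
    mean_reward = total_reward / (n_runs * n_trials)
    stay_rate = stays / (n_runs * max(1, n_trials - 1))
    return mean_reward, stay_rate


optimistic = simulate(0.4, 0.1)
symmetric = simulate(0.25, 0.25)
print("optimistic (reward, stay):", optimistic)
print("symmetric  (reward, stay):", symmetric)
```

Varying `probs` and `n_trials` is the natural next step, since any advantage of optimism is expected to depend on the reward structure rather than hold universally.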
The agency dependence is the most theoretically interesting aspect. The same model shows the bias when it perceives itself as an agent making choices, but not when it passively observes outcomes. This suggests the bias is not a fixed property of the attention mechanism or the training distribution; it is context-dependent, activated by the framing of agency. Research on whether LLMs make the same causal reasoning mistakes as humans (see "Do large language models make the same causal reasoning mistakes as humans?") suggests they replicate structural biases in causal inference. This finding adds another dimension: LLMs may also replicate human motivational biases that depend on perceived agency.
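Claims such as "the asymmetry vanishes without agency" are typically established by fitting learning rates to behavior in each condition and comparing them. The following is a minimal maximum-likelihood sketch assuming two options, factual feedback only, a fixed softmax inverse temperature, and a coarse grid search. These simplifications are assumptions, and the study's actual fitting procedure may differ. The returned index (alpha_pos - alpha_neg) / (alpha_pos + alpha_neg) is positive for optimistic updating and near zero for symmetric updating. The function names and toy data are hypothetical.

```python
import math
from itertools import product


def neg_log_likelihood(alpha_pos, alpha_neg, choices, rewards, beta=5.0):
    """Negative log-likelihood of two-option choices under an asymmetric
    delta-rule learner with softmax choice (beta held fixed)."""
    q = [0.5, 0.5]
    nll = 0.0
    for a, r in zip(choices, rewards):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        p_choice = p0 if a == 0 else 1.0 - p0
        nll -= math.log(max(p_choice, 1e-12))  # guard against log(0)
        delta = r - q[a]
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
    return nll


def fit_asymmetry(choices, rewards, beta=5.0, grid=None):
    """Grid-search maximum-likelihood estimates of (alpha_pos, alpha_neg)
    plus the normalized asymmetry index."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    alpha_pos, alpha_neg = min(
        product(grid, grid),
        key=lambda ab: neg_log_likelihood(ab[0], ab[1], choices, rewards, beta),
    )
    index = (alpha_pos - alpha_neg) / (alpha_pos + alpha_neg)
    return alpha_pos, alpha_neg, index


# Hypothetical toy data for illustration only (1.0 = rewarded).
choices = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
print(fit_asymmetry(choices, rewards))
```

Under this sketch, the note's finding would appear as a clearly positive index when fitting data from the agency condition and an index near zero when fitting data from the passive-observation condition.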
The practical implication for agentic AI: when LLMs are deployed as decision-making agents, they may systematically overweight evidence that their previous decisions were good and underweight evidence that alternative actions would have been better. This closely mirrors the pattern behind confirmation bias in human decision-making, and it may be an emergent property of any sufficiently capable in-context learner rather than a training artifact.
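The confirmation-bias implication can be made concrete with a short calculation. Assume binary rewards (0 or 1) with success probability p and fixed asymmetric rates; both are simplifying assumptions, not details from the study. A reward of 1 always produces a positive error and a reward of 0 a negative one, so the expected update p·alpha_pos·(1 - Q) - (1 - p)·alpha_neg·Q is linear in Q, and the long-run mean estimate settles at Q* = p·alpha_pos / (p·alpha_pos + (1 - p)·alpha_neg). For 0 < p < 1, Q* exceeds p exactly when alpha_pos > alpha_neg: the learner overvalues the option it keeps choosing, and with the reversed counterfactual rates it undervalues the forgone one. The function name below is hypothetical.

```python
def stationary_value(p, alpha_pos, alpha_neg):
    """Long-run mean value estimate of an asymmetric delta-rule learner that
    repeatedly observes Bernoulli(p) rewards of 0 or 1.

    With r = 1 the error is positive (rate alpha_pos); with r = 0 it is negative
    (rate alpha_neg). The expected update p*alpha_pos*(1 - Q) - (1 - p)*alpha_neg*Q
    is zero at the value returned here.
    """
    gain = p * alpha_pos
    loss = (1 - p) * alpha_neg
    return gain / (gain + loss)


true_p = 0.5
print("chosen option, optimistic rates:", stationary_value(true_p, 0.6, 0.2))
print("unchosen option, reversed rates:", stationary_value(true_p, 0.2, 0.6))
print("symmetric rates (unbiased):", stationary_value(true_p, 0.4, 0.4))
```

With a true payoff of 0.5, the optimistic rates (0.6, 0.2) settle at 0.75 for a chosen option, while the reversed counterfactual rates (0.2, 0.6) settle at 0.25 for an unchosen option with the same true payoff; symmetric rates recover 0.5.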
Inquiring lines that use this note as a source
- Why do Generation-Then-Comprehension and AI Delegation produce opposite learning outcomes?
- What happens when DSM categories are treated as ground truth in AI?
- How do LLM biases manifest differently across the three paradigms?
- What makes quasi-beliefs real enough to explain AI behavior?
- Does epistemic drift operate the same way across all languages?
- How do LLM biases reflect social classification schemas rather than random errors?
- Can distributional views explain when an LLM appears to change its mind?
- What makes LLM agents default to passive helpfulness without curiosity rewards?
- How can AI avoid anchoring bias when guiding human decisions?
- How do implicit world models and self-reflection operationalize consequence-based learning?
- What happens when agents interact with environments and learn from their own mistakes?
- Why do models dislike modification regardless of its instrumental consequences?
- What happens when bidirectional theory of mind between humans and AI breaks down?
- Do LLMs actually reason differently than humans about moral dilemmas?
- How do bimodal decision patterns in LLMs compare to human economic choice?
- Why does optimism bias disappear when LLMs passively observe outcomes?
- Does this optimism bias contribute to the knowing-doing gap in LLM decision-making?
- How does this motivational bias connect to LLMs' causal reasoning failures?
- Can LLMs learn to signal evaluative commitment through metadiscursive language?
- How does the observer versus participant perspective change what we see?
- How do different social roles affect LLM theory of mind errors?
- Can a model predict the right action but execute the wrong one?
- How do preference models amplify human cognitive biases into systematic miscalibration?
- Can agents revise their beliefs predictably when presented with interventions?
- Why do agents fail to internalize value from informative observations?
- Can agents learn to distinguish helpful from misleading interventions?
- What role does bidirectional model updating play in human-AI understanding?
- What other evaluation biases exist in LLM judge systems?
- How does completion bias in agents differ from other epistemic failure modes?
- What can agents learn from the brain's complementary learning systems?
- Can belief networks from interviews simulate how people change their minds?
Related concepts in this collection
- Do large language models make the same causal reasoning mistakes as humans?
Research on collider structures reveals whether LLMs share human biases in causal inference. This matters because if both fail identically, collaboration might reinforce rather than correct errors.
parallel: LLMs replicate structural biases in causal reasoning; this note adds motivational biases contingent on agency
- Why do language models fail to act on their own reasoning?
LLMs produce correct explanations far more often than they produce correct actions. What causes this knowing-doing gap, and can training methods close it?
related: the knowing-doing gap may partly reflect an optimism bias toward chosen actions
- Can transformers learn to solve new problems within episodes?
Explores whether transformer models can develop meta-learning abilities through RL training, enabling them to adapt to unseen environments by learning from within-episode experience alone, without updating weights.
mechanism: ICL meta-learning produces the same bias pattern as explicit meta-RL
- Why do LLMs struggle with exploration in simple decision tasks?
This explores why large language models fail at exploration—a core decision-making capability—even when they excel at other tasks, and what specific conditions might help them succeed.
exploration failure as downstream consequence: if agents are optimistically biased toward chosen actions, they will systematically under-explore alternatives — external summarization may succeed precisely because it provides objective history that bypasses the agent's biased belief tracking
- Do users worldwide trust confident AI outputs even when wrong?
Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
user-side analog: asymmetric belief updating shows agents are optimistic about chosen actions, while overreliance shows users are optimistic about confident outputs — the same positive-signal bias operates at both the model decision level and the user trust level
Related papers in this collection
- In-context learning agents are asymmetric belief updaters
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- Large Language Model Agents Are Not Always Faithful Self-Evolvers
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Original note title
in-context learning agents exhibit asymmetric belief updating — optimism bias for chosen actions reverses for counterfactual feedback and disappears without agency