Do large language models make the same causal reasoning mistakes as humans?
Research on collider structures reveals whether LLMs share human biases in causal inference. This matters because if both fail identically, collaboration might reinforce rather than correct errors.
The collider structure C1 → E ← C2 (two independent causes with a shared effect) is a diagnostic test for normative causal reasoning. When the effect E is observed, learning that one cause is present should lower your estimate of the other (explaining away). When E is unobserved, C1 and C2 should remain independent.
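The normative pattern can be checked by brute-force enumeration. The following is a minimal sketch, assuming a noisy-OR parameterisation of the collider; the priors, causal strengths, and leak value are illustrative assumptions and are not taken from the paper, and the helper names (`p_effect`, `joint`, `prob_c1`) are hypothetical.

```python
from itertools import product

# Assumed illustrative parameters (not from the paper).
PRIOR_C1 = 0.5   # P(C1 = 1)
PRIOR_C2 = 0.5   # P(C2 = 1)
W1 = 0.8         # assumed causal strength of C1 on E
W2 = 0.8         # assumed causal strength of C2 on E
LEAK = 0.1       # assumed P(E = 1) when neither cause is present


def p_effect(c1, c2):
    """Noisy-OR likelihood: P(E = 1 | C1 = c1, C2 = c2)."""
    return 1 - (1 - LEAK) * (1 - W1) ** c1 * (1 - W2) ** c2


def joint(c1, c2, e):
    """Joint probability P(C1, C2, E) for the collider C1 -> E <- C2."""
    p_c1 = PRIOR_C1 if c1 else 1 - PRIOR_C1
    p_c2 = PRIOR_C2 if c2 else 1 - PRIOR_C2
    p_e = p_effect(c1, c2) if e else 1 - p_effect(c1, c2)
    return p_c1 * p_c2 * p_e


def prob_c1(**evidence):
    """P(C1 = 1 | evidence); evidence is passed as keywords c2=... and/or e=...."""
    num = den = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        state = {"c1": c1, "c2": c2, "e": e}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # skip states inconsistent with the evidence
        p = joint(c1, c2, e)
        den += p
        num += p * c1
    return num / den


if __name__ == "__main__":
    # Explaining away: with E observed, learning C2 = 1 lowers belief in C1.
    print("P(C1 | E=1)       =", round(prob_c1(e=1), 3))
    print("P(C1 | E=1, C2=1) =", round(prob_c1(e=1, c2=1), 3))
    print("P(C1 | E=1, C2=0) =", round(prob_c1(e=1, c2=0), 3))
    # Independence: with E unobserved, C2 carries no information about C1.
    print("P(C1 | C2=1)      =", round(prob_c1(c2=1), 3))
    print("P(C1 | C2=0)      =", round(prob_c1(c2=0), 3))
```

Under these assumed parameters, belief in C1 is lower once C2 is known to be present alongside E, and it is unchanged by C2 when E is left unobserved. Other parameter choices change the size of the explaining-away effect but not its direction.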
Humans systematically fail this test in characteristic ways:
- Weak explaining away: explaining away is present but weaker than normatively warranted
- Markov violations: treating supposedly independent causes as correlated even when no collider observation should create that correlation (a "rich-get-richer" associative bias)
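Both biases can be expressed as simple differences between probability judgments. The sketch below is one hypothetical way to score them; the `Judgments` container, the `diagnose` helper, the tolerance, and all numbers in the usage example are made up for illustration and are not data from any study.

```python
from dataclasses import dataclass


@dataclass
class Judgments:
    """Probability judgments that C1 = 1 under four evidence conditions."""
    e_present_c2_absent: float   # P(C1 | E = 1, C2 = 0)
    e_present_c2_present: float  # P(C1 | E = 1, C2 = 1)
    c2_present: float            # P(C1 | C2 = 1), E unobserved
    c2_absent: float             # P(C1 | C2 = 0), E unobserved


def explaining_away(j):
    """Drop in belief about C1 when C2 is known to be present, given E."""
    return j.e_present_c2_absent - j.e_present_c2_present


def markov_violation(j):
    """Dependence between the causes with E unobserved; normatively zero."""
    return j.c2_present - j.c2_absent


def diagnose(judged, normative, tol=0.02):
    """Flag the two biases by comparing judged values with normative ones."""
    ea_judged = explaining_away(judged)
    ea_norm = explaining_away(normative)
    return {
        # Present (positive) but smaller than the normative effect.
        "weak_explaining_away": 0 < ea_judged < ea_norm - tol,
        # Positive dependence: one cause present raises belief in the other.
        "markov_violation": markov_violation(judged) > tol,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    normative = Judgments(0.89, 0.54, 0.50, 0.50)
    judged = Judgments(0.80, 0.65, 0.60, 0.45)
    print(diagnose(judged, normative))
```

A reasoner, human or LLM, whose judgments trip both flags shows the pattern described above: explaining away in the right direction but too weak, plus a positive association between causes that should be independent.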
The paper "Do Large Language Models Reason Causally Like Us? Even Better?" (CLADDER dataset) finds that LLMs exhibit the same two biases, in the same direction, as humans. This is not the usual finding of LLM inferiority but a finding of human-like systematic error: LLMs are not categorically worse at causal reasoning, they err the way humans do.
This matters for several reasons. First, it undermines clean human-vs-LLM comparisons in causal reasoning tasks: if both fail in the same way, the relevant comparison shifts from "who is better" to "are the failure modes compatible." Second, it raises the question of mechanism: humans likely err due to the associative nature of pattern-matching; LLMs likely err for structurally related reasons (training on human text that exhibits the same biases). The shared error direction is evidence for the training-data explanation discussed in "Why do LLMs handle causal reasoning better than temporal reasoning?": the training data itself has these biases baked in.
Third, the finding has implications for high-stakes causal reasoning: medical diagnosis (collider structures appear in disease-symptom networks), legal reasoning (independent causes with shared outcomes), and policy analysis all involve collider-type structures. Human and LLM collaborators sharing the same biases may reinforce rather than correct each other's errors.
Inquiring lines that use this note as a source (51)
This note is a source for these synthesized inquiries.
- Do language models share the same cooperative truth-seeking rules as humans?
- Can prompt-based debiasing overcome entrenched LLM model priors?
- Does epistemic drift operate the same way across all languages?
- Why do LLM outputs match researcher priors without solving tasks correctly?
- What domain properties determine whether causal rules transfer to new agents?
- Do causal rules enforce robustness that statistical patterns alone cannot maintain?
- Why do causal graphs alone fail to capture human reasoning processes?
- Do language models exhibit the same causal biases that humans show?
- How do humans use associative reasoning without causal connections?
- Can causal models be extended to include non-causal cognition?
- Why do review corpora contain biases that affect generated comparisons?
- What circuit mechanisms produce belief bias in syllogistic reasoning?
- What makes causal explanations stronger anxiety predictors than counterfactuals or dissonance?
- What inductive bias would force models to learn Newtonian mechanics instead of shortcuts?
- How does this motivational bias connect to LLMs' causal reasoning failures?
- Do language models show the same truth bias as humans?
- How do world models create indirect causal grounding without physical environment contact?
- Why does the distinction between functional and causal grounding matter for AI alignment?
- Can language models develop world models that ground meaning in causal reality?
- Why do LLMs inherit causal biases from their training data?
- What are collider structures and why do they reveal reasoning errors?
- How might human-LLM teams reinforce each other's causal reasoning mistakes?
- Do LLMs rely on surface statistical patterns instead of causal structure?
- Where do collider-type reasoning errors appear in real-world decisions?
- Why does LLM compression eliminate causal grounding in conceptual representations?
- Can event boundaries be identified from statistical regularities without understanding events?
- What role does prediction error play in human event segmentation?
- What role does inductive bias play versus model capacity in practice?
- Can causal belief networks extracted from interviews predict how people respond to policy changes?
- Can functional semantic grounding substitute for true causal grounding?
- Why do causal reasoning directions succeed while temporal reasoning directions fail?
- Why do LLMs fail at counterfactual reasoning despite factual knowledge?
- Can LLMs reason through semantics without understanding causal mechanisms?
- How do causal belief networks extracted from interviews enable intervention reasoning?
- How does semantic association differ from mechanistic causal reasoning?
- What makes a causal abstraction more transferable than a generic heuristic?
- How does vehicle causality differ from content causality in physical systems?
- Do newer LLM generations create worse detector bias through increased linguistic divergence?
- Can external actions provide causal necessity that language models lack?
- What distinguishes a representational feature from a causally inert correlation?
- How does completion bias in agents differ from other epistemic failure modes?
- Can mechanistic interpretability tools decode the biases alignment training conceals?
- How does typicality bias in human annotation affect downstream model behavior?
- Why do LLMs reason fluently about causality but lack causal rigor?
- How can extracted causal belief networks enable intervention simulation?
- What prevents LLM representations from causally influencing generation outputs?
- Can a Reflect mechanism detect and revise failed causal predictions?
- How does causal structure avoid behaviorist limitations in LLM social simulation?
- Can modular expert decomposition extend beyond time into other causal dimensions?
- What architectural changes would help LLMs distinguish causal relationships from temporal sequences?
- Do LLMs show stronger reasoning about causality than about temporal ordering?
Related concepts in this collection (3)
- Why do LLMs handle causal reasoning better than temporal reasoning?
  Exploring whether language models perform asymmetrically on different discourse relations and what training data patterns might explain the gap between causal and temporal reasoning abilities.
  Relevance: the training-data explanation for why LLMs inherit human causal biases; the collider finding is a specific manifestation.
- Do LLMs generalize moral reasoning by meaning or surface form?
  When moral scenarios are reworded to reverse their meaning while keeping similar language, do LLMs recognize the semantic shift? This tests whether LLMs actually understand moral concepts or reproduce training distribution patterns.
  Relevance: a parallel insight; LLM errors track surface statistical regularities in training data, not normative structure.
- Do foundation models learn world models or task-specific shortcuts?
  When transformer models predict sequences accurately, are they building genuine world models that capture underlying physics and logic? Or are they exploiting narrow patterns that fail under distribution shift?
  Relevance: collider bias is one instance; surface associative patterns override normative causal structure.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Do Large Language Models Reason Causally Like Us? Even Better?
- Premise Order Matters in Reasoning with Large Language Models
- Large Causal Models From Large Language Models
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
- LLMs can implicitly learn from mistakes in-context
- Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
- Language models show human-like content effects on reasoning tasks
Original note title
llms exhibit human-like causal biases — weak explaining away and markov violations in collider networks