Does AI fact-checking actually help people spot misinformation?
An RCT tested whether AI fact-checks improve people's ability to judge headline accuracy. The results reveal asymmetric harms: AI errors push users in the wrong direction more than correct labels help them.
A preregistered RCT tested AI fact-checks (from a popular AI model) on political news headlines. The overall finding: AI fact-checking does not significantly affect participants' ability to discern headline accuracy or share accurate news. But the errors are asymmetric and harmful.
The asymmetry: when the AI mislabels true headlines as false, participants decrease their belief in those true headlines. When the AI expresses uncertainty about false headlines, participants increase their belief in those false headlines. The AI's mistakes are not neutral: they lower belief in true headlines and raise belief in false ones.
The opt-in finding is equally concerning. When participants are given the choice to view AI fact-checks and choose to do so, they become significantly more likely to share both true and false news, but more likely to believe only false news. Self-selection into AI assistance does not indicate sophistication; in this study it coincided with greater vulnerability to misinformation.
This connects to the overreliance literature through a specific mechanism: users are not simply trusting AI outputs — they are using AI outputs as replacement signals for their own judgment. When the AI says "false," the user's prior belief in a true headline is overridden. The user delegates the epistemic work rather than using the AI as one input among many.
The practical implication is severe for AI deployment in information integrity contexts. An AI fact-checker that is "reasonably" accurate but imperfect creates a false safety net. Overall discernment does not improve, and on the headlines the AI gets wrong, users who rely on it judge less accurately than users who rely on their own judgment, because mislabeling errors have outsized influence. The asymmetry means AI fact-checking is net harmful unless accuracy exceeds a threshold where mislabeling damage is offset by correct-labeling benefit, and the paper suggests current AI is below that threshold.
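The threshold argument can be made concrete with a toy break-even model. This is a sketch under assumptions that are not from the paper: each correct label improves a user's judgment by a fixed `benefit`, each mislabel worsens it by a fixed `harm`, and "uncertain" labels are ignored. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def net_effect(accuracy: float, benefit: float, harm: float) -> float:
    """Expected change in judgment quality per labeled headline.

    Assumed toy model (not from the study): a correct label adds `benefit`,
    a mislabel subtracts `harm`, and labels are correct with probability
    `accuracy`.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    return accuracy * benefit - (1.0 - accuracy) * harm


def break_even_accuracy(benefit: float, harm: float) -> float:
    """Accuracy at which expected benefit exactly offsets expected harm.

    Solves accuracy * benefit = (1 - accuracy) * harm for accuracy.
    """
    if benefit <= 0 or harm <= 0:
        raise ValueError("benefit and harm must be positive")
    return harm / (benefit + harm)


if __name__ == "__main__":
    # Hypothetical magnitudes, NOT estimates from the study:
    # one mislabel is assumed to do three times the damage
    # that one correct label repairs.
    benefit, harm = 1.0, 3.0
    print(f"break-even accuracy: {break_even_accuracy(benefit, harm):.2f}")
    for acc in (0.60, 0.75, 0.90):
        print(f"accuracy={acc:.2f} net effect={net_effect(acc, benefit, harm):+.2f}")
```

Under this model the break-even point is `harm / (benefit + harm)`, so the more a single error outweighs a single correct label, the closer to perfect the fact-checker must be before it helps on balance. With equal magnitudes the threshold is 50%; with the assumed 3:1 ratio it rises to 75%.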
Inquiring lines that use this note as a source
- Why do users override their own judgment when AI says a headline is false?
- What threshold of accuracy would make AI fact-checking net beneficial instead of harmful?
- Do people who choose to use AI fact-checkers actually become better at spotting misinformation?
- How does AI fact-checking compare to other trust signals like citation counts?
- Why don't users push back when AI makes obvious mistakes about false claims?
- Why do human raters miss factual errors that domain experts catch?
- How does AI fact-checking increase belief in false headlines users saw?
- Can fact-checking labels replace the cultural work of developing a discount?
- How do verification labels themselves become part of the misinformation problem?
Related concepts in this collection
- Do users worldwide trust confident AI outputs even when wrong?
  Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
  Link to this note: the overreliance mechanism; AI fact-checking is a specific instance where overreliance produces measurable harm.
- Why do language models accept false assumptions they know are wrong?
  Explores why LLMs fail to reject false presuppositions embedded in questions even when they possess correct knowledge about the topic. This matters because it reveals a grounding failure distinct from knowledge deficits.
  Link to this note: the same accommodative tendency; AI doesn't push back hard enough on false claims, and users don't push back on AI errors.
- Do users trust citations more when there are simply more of them?
  Explores whether citation quantity alone influences user trust in search-augmented LLM responses, independent of whether those citations actually support the claims being made.
  Link to this note: trust heuristics override content evaluation in both citation and fact-checking contexts.
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Artificial intelligence is ineffective and potentially harmful for fact checking
- Can AI Explanations Make You Change Your Mind?
- Training language models to be warm and empathetic makes them less reliable and more sycophantic
- Self-critiquing models for assisting human evaluators
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- Fake News Detectors are Biased against Texts Generated by Large Language Models
- Language Models Learn to Mislead Humans via RLHF
- Humans or LLMs as the Judge? A Study on Judgement Biases
Original note title
AI fact-checking creates asymmetric harm through mislabeling — users decrease belief in true headlines labeled false and increase belief in false headlines labeled uncertain