Do users trust citations more when there are simply more of them?

Explores whether citation quantity alone influences user trust in search-augmented LLM responses, independent of whether those citations actually support the claims being made.

Synthesis note · 2026-02-22 · sourced from Reasoning o1 o3 Search

Search Arena provides the largest analysis of user preferences for search-augmented LLMs: over 24,000 paired multi-turn interactions with ~12,000 human preference votes. The finding that matters most: users prefer responses with more cited sources, and this preference extends to irrelevant citations.

The effect sizes are nearly identical. Correctly attributed citations have a positive coefficient of β=0.285 on user preference. Irrelevant citations — citations that do not support the associated claims — have a positive coefficient of β=0.273. Users are influenced by the presence of citations roughly equally regardless of whether those citations actually back up the text.
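To get a feel for how close the two effects are, the coefficients can be converted into odds multipliers. This is a rough sketch that assumes the coefficients are log-odds from a logistic (Bradley-Terry-style) preference model and that both features are on comparable scales. The note does not state the model form or the feature scaling, so the numbers are illustrative only, and the function names are made up for this example.

```python
import math

# Coefficients as reported in the note.
BETA_SUPPORTING = 0.285  # correctly attributed citations
BETA_IRRELEVANT = 0.273  # citations that do not support the claim


def odds_multiplier(beta: float) -> float:
    """Multiplicative change in preference odds per one-unit feature increase.

    Assumption: beta is a log-odds coefficient (logistic / Bradley-Terry form).
    """
    return math.exp(beta)


def win_probability(beta: float, baseline: float = 0.5) -> float:
    """Preference probability after a one-unit increase, starting from `baseline`."""
    odds = baseline / (1.0 - baseline) * odds_multiplier(beta)
    return odds / (1.0 + odds)


for label, beta in [("supporting", BETA_SUPPORTING), ("irrelevant", BETA_IRRELEVANT)]:
    print(f"{label}: odds x{odds_multiplier(beta):.3f}, "
          f"win probability {win_probability(beta):.3f}")
```

Under those assumptions, both kinds of citation raise the odds of being preferred by roughly a third (about 1.33 versus 1.31), which is why the two effects are practically indistinguishable.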

This means citation count functions as a surface trust heuristic, decoupled from citation quality. Users see citations and infer credibility without verifying the cited content supports the claim. The gap between perceived and actual credibility is systematic, not incidental.

Additional preference signals: users prefer community-driven platforms (tech blogs, social networks) over encyclopedic sources like Wikipedia. Reasoning-enhanced responses are preferred, as are longer ones. Web search does not degrade, and may improve, performance in non-search settings; in search settings, however, response quality suffers significantly when the model relies solely on parametric knowledge.

This connects to the note "Do users worldwide trust confident AI outputs even when wrong?" In that finding, confidence signals override accuracy assessment. Here, citation signals override quality assessment. Both are instances of the same pattern: users rely on surface proxies for quality because evaluating actual quality is cognitively expensive.

The implication for RAG system design is direct: optimizing for user satisfaction and optimizing for answer quality are not the same optimization target. A system can score highly on user preference by adding more citations — even irrelevant ones — without improving answer quality. This is a form of metric gaming at the human-evaluation level.
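One way to keep the two targets apart in an evaluation harness is to report citation count and citation support rate separately, so that padding a response with irrelevant sources cannot raise the quality signal. The sketch below is illustrative only: the `Citation` type, the `supports_claim` label (assumed to come from a human or automated attribution check), and the function names are hypothetical and not taken from Search Arena.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    url: str
    supports_claim: bool  # hypothetical label from an attribution check


def citation_count(citations: list[Citation]) -> int:
    """Surface signal: how many sources the response cites."""
    return len(citations)


def support_rate(citations: list[Citation]) -> float:
    """Quality signal: fraction of citations that back their claim."""
    if not citations:
        return 0.0
    return sum(c.supports_claim for c in citations) / len(citations)


# Two supported citations, then the same response padded with four irrelevant ones.
honest = [
    Citation("https://example.org/a", True),
    Citation("https://example.org/b", True),
]
padded = honest + [Citation(f"https://example.org/pad{i}", False) for i in range(4)]

for name, cites in [("honest", honest), ("padded", padded)]:
    print(f"{name}: count={citation_count(cites)}, support_rate={support_rate(cites):.2f}")
```

In this toy case, padding triples the citation count while the support rate falls from 1.0 to one third, which makes the divergence between the two targets visible instead of hiding it inside a single preference score.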

Related concepts in this collection

Do users worldwide trust confident AI outputs even when wrong? Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
same pattern: surface signals override quality evaluation
Can LLM judges be fooled by fake credentials and formatting? Explores whether language models evaluating text fall for authority signals and visual presentation unrelated to actual content quality, and whether these weaknesses can be exploited without deep model knowledge.
citation inflation is another bias axis exploitable in evaluation systems
Can LLM explanations actually help humans predict model behavior? Do model explanations enable users to accurately simulate how the model will behave on related inputs? This matters because it determines whether explanations genuinely improve human understanding or just create an illusion of understanding.
plausibility ≠ precision mirrors citation-count ≠ citation-quality