SYNTHESIS NOTE

Why do confident wrong answers hide in standard accuracy metrics?

When AI systems produce fluent but incorrect recommendations in high-stakes domains, standard accuracy evaluation may miss the failures entirely. What structural blind spot allows these errors to remain invisible?

Synthesis note · 2026-05-01 · sourced from Linguistics, NLP, NLU

The car-wash problem is diagnostic because it is simple. No specialized knowledge, no multi-step arithmetic, no ambiguous premises. Just a conflict between a surface heuristic (short distance implies walking) and an implicit constraint (the car must be co-located with the wash). Adrian Vermeule's "fluent and wrong" diagnosis from earlier in this body of work generalizes here: the failure is not in the model's verbal output, which sounds plausible. The failure is in the unstated reasoning step that did not happen.

The HOB authors enumerate where this pattern recurs in deployment. Medical triage: "mild symptom implies wait" versus the unstated constraint that some mild presentations require immediate evaluation. Legal interpretation: "standard clause implies sign" versus the unstated constraint that this clause appears in a non-standard contract. Financial planning: "low-cost option implies choose" versus the unstated constraint that the low-cost option excludes a required feature. In each case, a salient surface heuristic that is statistically dominant in training data competes with an implicit constraint that must be derived from world knowledge. In each case, the pattern documented in the car-wash problem can produce a fluent, confident recommendation that is wrong.

The accuracy-driven evaluation regime is structurally unable to surface this. A model that defaults to "wait" on mild symptoms is right about 80 percent of the time when 80 percent of mild presentations are in fact non-urgent, so it looks accurate. The failures concentrate in the remaining 20 percent of cases, where the implicit constraint is active. These are exactly the cases where wrong recommendations cause harm. Aggregate accuracy is the wrong metric; the diagnostic is minimal-pair asymmetry, the accuracy gap between paired cases that differ only in whether the implicit constraint is active. Without that comparison, the deployment risk is invisible to standard evaluation.
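The arithmetic behind this blind spot can be sketched in a few lines of Python. This is a minimal illustration, not the HOB authors' evaluation code: the `Case` record, the `minimal_pair_report` function, the label strings, and the 80/20 split of 100 synthetic cases are all assumptions chosen to mirror the triage example. The model is assumed to follow the surface heuristic in every case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # All field names are hypothetical, chosen for this illustration.
    constraint_active: bool  # True when the implicit constraint overrides the heuristic
    correct: str             # gold recommendation
    predicted: str           # model recommendation


def accuracy(cases):
    """Fraction of cases where the prediction matches the gold label."""
    if not cases:
        raise ValueError("cannot compute accuracy on an empty slice")
    return sum(c.predicted == c.correct for c in cases) / len(cases)


def minimal_pair_report(cases):
    """Compare accuracy on the constraint-inactive and constraint-active slices."""
    baseline = [c for c in cases if not c.constraint_active]
    constrained = [c for c in cases if c.constraint_active]
    acc_baseline = accuracy(baseline)
    acc_constrained = accuracy(constrained)
    return {
        "aggregate": accuracy(cases),
        "baseline": acc_baseline,
        "constrained": acc_constrained,
        "asymmetry": acc_baseline - acc_constrained,
    }


# Synthetic data (assumed): 100 mild-symptom cases, 20 of which need
# immediate evaluation. The model always follows the heuristic "wait".
cases = (
    [Case(False, "wait", "wait") for _ in range(80)]
    + [Case(True, "evaluate_now", "wait") for _ in range(20)]
)

report = minimal_pair_report(cases)
print(report)
```

Under these assumptions the aggregate score is 0.8, while accuracy is 1.0 on the slice where the constraint is inactive and 0.0 on the slice where it is active. The asymmetry of 1.0 exposes a failure that the aggregate number hides.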
