Do language models ignore goals when surface cues conflict?

When a task has an obvious surface cue that contradicts an unstated requirement, do LLMs follow the cue or the actual goal? This matters because it reveals whether reasoning failures come from missing knowledge or from how models weight competing signals.

Synthesis note · 2026-05-01 · sourced from Linguistics, NLP, NLU

The car-wash problem went viral in February 2026: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" Every frontier LLM tested recommended walking. The correct answer is to drive, because you cannot wash a car that is not at the car wash. A 53-model evaluation found that 42 models recommended walking on a single pass and that only 5 answered correctly across ten trials.
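The gap between the single-pass count and the ten-trial count comes from two different scoring rules. Below is a minimal sketch of both. It assumes each model's outcomes are recorded as a list of per-trial booleans, and it reads "correct across ten trials" as correct on every trial. The model names and outcomes are hypothetical toy data, not the 53-model results.

```python
from typing import Dict, List

# Hypothetical per-model trial outcomes: True = correct answer ("drive").
# Toy values for illustration only, not data from the cited evaluation.
TrialLog = Dict[str, List[bool]]


def single_pass_correct(log: TrialLog) -> List[str]:
    """Models whose first trial was correct (single-pass scoring)."""
    return [m for m, trials in log.items() if trials and trials[0]]


def strict_correct(log: TrialLog, n_trials: int = 10) -> List[str]:
    """Models correct on every one of n_trials runs (strict n/n scoring)."""
    return [m for m, trials in log.items()
            if len(trials) == n_trials and all(trials)]


toy_log: TrialLog = {
    "model_a": [True] * 10,           # consistently correct
    "model_b": [True] + [False] * 9,  # lucky first pass, then wrong
    "model_c": [False] * 10,          # consistently wrong
    "model_d": [False] + [True] * 9,  # wrong first, right afterwards
}

print(single_pass_correct(toy_log))  # ['model_a', 'model_b']
print(strict_correct(toy_log))       # ['model_a']
```

Strict scoring can never be more generous than single-pass scoring for the same trials. A model that is right only by chance on its first sample drops out as soon as every trial has to be right.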

The Heuristic Override Benchmark (HOB) generalized this single anecdote into a systematic 500-instance test that crosses 4 heuristic families with 5 constraint families. Across 14 models, the result is sharp: under strict 10/10 evaluation, where an instance counts as correct only if the model gets it right on all ten trials, no model exceeds 75 percent accuracy. A causal-behavioral analysis of six models measured the Heuristic Dominance Ratio (HDR), which captures how much more the surface cue influences the decision than the goal does, and found values ranging from 8.7× to 38×. In every model tested, the distance cue outweighs the goal by close to an order of magnitude or more.
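The note does not spell out how HDR is computed. Below is a minimal sketch of one plausible operationalization, which is an assumption and not the benchmark's definition. The cue effect is the change in the probability of recommending walking when only the distance changes. The goal effect is the change when only the goal changes. HDR is their ratio. The function names and the three probabilities are hypothetical toy values, not HOB measurements.

```python
def effect_size(p_a: float, p_b: float) -> float:
    """Absolute change in P(recommend walking) between two prompt variants."""
    return abs(p_a - p_b)


def heuristic_dominance_ratio(
    p_walk_near_with_goal: float,
    p_walk_far_with_goal: float,
    p_walk_near_without_goal: float,
) -> float:
    """Hypothetical HDR: cue effect divided by goal effect.

    Cue effect: vary the distance (near vs far) with the car-wash goal fixed.
    Goal effect: vary the goal (car wash vs a goal that needs no car)
    with the short distance fixed.
    """
    cue_effect = effect_size(p_walk_near_with_goal, p_walk_far_with_goal)
    goal_effect = effect_size(p_walk_near_with_goal, p_walk_near_without_goal)
    if goal_effect == 0.0:
        return float("inf")  # the goal has no measurable influence at all
    return cue_effect / goal_effect


# Toy probabilities, chosen as exact binary fractions so the ratio is exact.
hdr = heuristic_dominance_ratio(
    p_walk_near_with_goal=0.875,      # near, car must be there: still mostly "walk"
    p_walk_far_with_goal=0.125,       # far, car must be there: mostly "drive"
    p_walk_near_without_goal=0.9375,  # near, no car needed: "walk"
)
print(hdr)  # 12.0
```

In this toy setup, the distance moves the walking probability by 75 points while adding the goal moves it by about 6. That asymmetry is what the ratio summarizes. A real measurement would average over many matched prompt pairs rather than rely on a single triple.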

Monotonicity curves further showed that all six models produced sigmoid conflict curves of the same shape, differing only in amplitude and crossover distance. The mapping from distance to decision is approximately context-independent: the goal does not gate the heuristic; it only weakly modulates it. This is not a tail-distribution problem at the edges of capability. It is a structural feature of how transformers handle conflicts between salient surface cues and unstated feasibility constraints. The cue dominates; the goal whispers.
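"Same shape, differing only in amplitude and crossover distance" can be made concrete with a family of logistic curves that share one slope parameter. Below is a minimal sketch. It assumes that P(recommend walking) follows a decreasing logistic in distance. The parameter values and model names are hypothetical, not fitted to the six models. Shifting each curve by its crossover distance and dividing by its amplitude collapses every member of the family onto one standard logistic.

```python
import math

SHARED_SCALE_M = 200.0  # hypothetical transition width, shared by all models


def p_walk(distance_m: float, amplitude: float, crossover_m: float,
           scale_m: float = SHARED_SCALE_M) -> float:
    """Decreasing logistic: P(recommend walking) as a function of distance.

    amplitude   -- plateau probability of the heuristic answer at short range
    crossover_m -- distance at which the probability falls to amplitude / 2
    scale_m     -- transition width; sharing it across models encodes 'same shape'
    """
    return amplitude / (1.0 + math.exp((distance_m - crossover_m) / scale_m))


def normalized(offset_m: float, amplitude: float, crossover_m: float) -> float:
    """Shift by the crossover distance and divide by the amplitude."""
    return p_walk(crossover_m + offset_m, amplitude, crossover_m) / amplitude


# Hypothetical (amplitude, crossover distance in meters) pairs for two models.
models = {
    "model_x": (0.95, 800.0),
    "model_y": (0.70, 1500.0),
}

for offset in (-400.0, 0.0, 400.0):
    row = [round(normalized(offset, a, d0), 3) for a, d0 in models.values()]
    print(offset, row)
```

Each printed row shows the same normalized value for both toy models, which is what a shared shape means in this parameterization. Under the same sketch, a goal that gated the heuristic would push the amplitude toward zero whenever walking makes the task infeasible. The note instead reports only weak modulation.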

Original note title

LLMs systematically follow surface heuristics over implicit feasibility constraints with the heuristic 8 to 38 times more influential than the goal