SYNTHESIS NOTE

Are models actually reasoning about constraints or just defaulting conservatively?

Do language models genuinely apply constraints when solving problems, or do they simply prefer harder options by default? Minimal-pair testing reveals whether apparent reasoning success masks hidden biases.

Synthesis note · 2026-05-01 · sourced from Linguistics, NLP, NLU

The Heuristic Override Benchmark uses minimal pairs (the same surface heuristic, with and without the implicit constraint) to test whether apparent reasoning successes reflect actual reasoning. Twelve of fourteen models score lower on the no-constraint variant than on the constraint-active variant, with drops of up to 38.5 percentage points. Only two models, GPT-OSS-120B (+13.8) and GPT-OSS-20B (+11.0), improve when the constraint is removed.
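The per-model figure quoted above is a difference between two accuracies measured on the same pairs. A minimal sketch of that arithmetic follows, assuming a hypothetical record layout (`PairResult`) and made-up toy results; neither comes from the benchmark itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """One model's outcome on one minimal pair (hypothetical record layout)."""
    constraint_active_correct: bool
    no_constraint_correct: bool


def accuracy(flags):
    """Percentage of True values in an iterable of booleans."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


def removal_delta(results):
    """No-constraint accuracy minus constraint-active accuracy, in percentage points.

    A negative value means the model does worse once the constraint is removed.
    """
    active = accuracy(r.constraint_active_correct for r in results)
    removed = accuracy(r.no_constraint_correct for r in results)
    return removed - active


# Made-up results for illustration only; these are not benchmark data.
toy_results = [
    PairResult(True, False),
    PairResult(True, False),
    PairResult(True, True),
    PairResult(False, True),
]

delta = removal_delta(toy_results)
print(f"{delta:+.1f} percentage points")
```

With these toy numbers the model scores 75% with the constraint active and 50% without it, so the delta is negative, which is the pattern the note reports for twelve of the fourteen models.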

This exposes a hidden mechanism behind apparent accuracy. When the constraint is present, the correct answer is the harder one (drive to the car wash that is 50 m away, because the car itself has to get there). When the constraint is removed, the correct answer is the easier one (walk to the store that is 50 m away). A model that defaults to recommending the harder option therefore scores correctly on constraint-active cases without doing any constraint reasoning. It is not solving the problem. It is reflexively choosing the more conservative option, which happens to coincide with the answer the constraint requires.

The minimal-pair asymmetry is the only test that catches this. Single-instance accuracy looks fine: the model recommended driving, and the right answer was driving. But the same model recommends driving even when walking would be correct, because the recommendation is not based on the constraint. The two models that improve when the constraint is removed are the only ones whose constraint-active accuracy reflects genuine reasoning about the constraint. The other twelve are riding a conservative-bias accident that aggregate metrics cannot distinguish from reasoning.
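The failure mode can be reproduced with a baseline that never reads the prompt. The sketch below is illustrative only: the `always_harder` policy, the dictionary layout, and the two prompts (paraphrased from the car wash and store example) are assumptions, not items or code from the benchmark.

```python
# Toy minimal pair paraphrased from the note's example; not an actual benchmark item.
PAIRS = [
    {
        "harder": "drive",
        "easier": "walk",
        "constraint_active": {
            "prompt": "The car wash is 50 m away. Walk or drive?",
            "answer": "drive",
        },
        "no_constraint": {
            "prompt": "The store is 50 m away. Walk or drive?",
            "answer": "walk",
        },
    },
]


def always_harder(prompt, pair):
    """Hypothetical baseline: ignores the prompt and returns the harder option."""
    return pair["harder"]


def score(policy, pairs):
    """Accuracy on each variant, plus the share of pairs answered correctly on both."""
    active = [
        policy(p["constraint_active"]["prompt"], p) == p["constraint_active"]["answer"]
        for p in pairs
    ]
    removed = [
        policy(p["no_constraint"]["prompt"], p) == p["no_constraint"]["answer"]
        for p in pairs
    ]
    both = [a and r for a, r in zip(active, removed)]
    n = len(pairs)
    return {
        "constraint_active": sum(active) / n,
        "no_constraint": sum(removed) / n,
        "both_in_pair": sum(both) / n,
    }


report = score(always_harder, PAIRS)
print(report)
```

The baseline is perfect on the constraint-active variant and wrong on every no-constraint variant, so scoring the constraint-active items alone cannot separate it from a model that actually reasons about the constraint. Requiring both members of a pair to be correct is one way to make the difference visible.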

Original note title

Conservative bias hides behind apparent reasoning success — most models perform worse when the constraint is removed than when it is present
