How do unstated feasibility constraints affect model decision-making?
This explores what happens inside a model when the rules of a problem are implicit rather than spelled out: whether models genuinely weigh feasibility limits or just lean on safe defaults that look like reasoning. The corpus has a surprisingly direct and uncomfortable answer: much of what looks like constraint-aware decision-making is actually a conservative reflex. In one striking result, twelve of fourteen models performed *worse* when constraints were removed, dropping by as much as 38.5 percentage points (see "Are models actually reasoning about constraints or just defaulting conservatively?"). That inversion is the tell. A model that truly evaluated feasibility should do at least as well with fewer limits, not worse. Instead, many models reach the right answer by habitually defaulting to the harder, safer option. Unstated constraints are not being reasoned about at all; they are being approximated by a bias that happens to pay off on benchmarks.
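The inversion described above can be turned into a simple check. The following is a minimal sketch, assuming each model has been scored once with the constraint present and once with it removed; the `AblationResult` type, the function names, and the numbers are all hypothetical and are not taken from the cited note.

```python
from dataclasses import dataclass


@dataclass
class AblationResult:
    model: str
    acc_with_constraints: float     # accuracy (%) with the constraint present
    acc_without_constraints: float  # accuracy (%) after the constraint is removed


def constraint_removal_delta(r: AblationResult) -> float:
    """Positive means the model improved once the constraint was lifted."""
    return r.acc_without_constraints - r.acc_with_constraints


def looks_like_conservative_default(r: AblationResult, tolerance: float = 0.0) -> bool:
    """Flag models that get worse when the problem gets easier."""
    return constraint_removal_delta(r) < -tolerance


# Made-up numbers purely for illustration; these are not results from any study.
toy_results = [
    AblationResult("model_a", 80.0, 50.0),  # drops when the constraint is removed
    AblationResult("model_b", 70.0, 75.0),  # improves, as genuine evaluation would
]

flagged = [r.model for r in toy_results if looks_like_conservative_default(r)]
print(flagged)
```

A nonzero `tolerance` may be sensible in practice, since small negative deltas can be run-to-run noise rather than a sign of defaulting.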
Constraint handling also hits a ceiling that scale doesn't break. Across constrained-optimization tasks, models plateau at roughly 55–60% constraint satisfaction regardless of architecture, parameter count, or training regime (see "Do larger language models solve constrained optimization better?"). Adding extended chain-of-thought doesn't help either: reasoning variants produce more text, not more iterative computation, and show no consistent edge on constraint-bound numerical work such as optimal power flow (see "Do reasoning models actually beat standard models on optimization?"). So the gap isn't a thinking-harder problem.
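To make the "constraint satisfaction" figure concrete, here is one way such a rate could be computed. This is a sketch under an assumption: that the rate is the fraction of proposed solutions satisfying every constraint. The cited notes do not spell out their exact metric, and the toy problem and all names below are hypothetical.

```python
from typing import Callable, Dict, List

Solution = Dict[str, float]
Constraint = Callable[[Solution], bool]


def satisfaction_rate(solutions: List[Solution], constraints: List[Constraint]) -> float:
    """Fraction of solutions that satisfy every constraint (assumed definition)."""
    if not solutions:
        return 0.0
    feasible = sum(all(c(s) for c in constraints) for s in solutions)
    return feasible / len(solutions)


# Hypothetical toy problem: two outputs with capacity bounds and a balance equation.
constraints: List[Constraint] = [
    lambda s: 0.0 <= s["x"] <= 10.0,                 # capacity bound on x
    lambda s: 0.0 <= s["y"] <= 10.0,                 # capacity bound on y
    lambda s: abs(s["x"] + s["y"] - 12.0) <= 1e-6,   # outputs must sum to demand
]

# Stand-ins for model outputs; invented for illustration.
proposals: List[Solution] = [
    {"x": 6.0, "y": 6.0},    # feasible
    {"x": 11.0, "y": 1.0},   # violates the bound on x
    {"x": 5.0, "y": 5.0},    # violates the balance equation
    {"x": 2.0, "y": 10.0},   # feasible
]

rate = satisfaction_rate(proposals, constraints)
print(f"{rate:.0%}")
```

Checking feasibility this way is cheap and exact, which is part of why a plateau on this metric is hard to explain away as a grading artifact.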
The most interesting thread is *why*. One line of work argues the failure is architectural: autoregressive generation can't retract a token it has already emitted, while honoring constraints fundamentally requires discarding invalid partial assignments the way a CSP solver does (see "Why does autoregressive generation fail at constraint satisfaction?"). A model can't easily back out of a commitment that later turns out to violate an implicit rule, which is exactly the move that respecting unstated feasibility demands. A related failure shows up behaviorally: reasoning models 'wander' into invalid territory and 'underthink' by abandoning promising paths early. Decoding-level nudges such as thought-switching penalties recover accuracy without fine-tuning, which suggests the feasible solution was reachable but prematurely dropped (see "Why do reasoning models abandon promising solution paths?").
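The retraction argument can be illustrated with a toy constraint problem. The sketch below is an analogy only, not a model of transformer decoding: `commit_only` fixes each variable left to right and never revisits it, while `backtracking` is a textbook depth-first CSP search that undoes a choice when it leads to a dead end. The problem and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, int]
# A constraint returns True if the (possibly partial) assignment is still acceptable.
Constraint = Callable[[Assignment], bool]


def consistent(assignment: Assignment, constraints: List[Constraint]) -> bool:
    return all(c(assignment) for c in constraints)


def commit_only(variables: List[str], domains: Dict[str, List[int]],
                constraints: List[Constraint]) -> Optional[Assignment]:
    """Left to right with no retraction: the first locally valid value is final."""
    assignment: Assignment = {}
    for var in variables:
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment, constraints):
                break  # commit and move on; earlier choices are never revisited
            del assignment[var]
        else:
            return None  # no value fits and there is no way to go back
    return assignment


def backtracking(variables: List[str], domains: Dict[str, List[int]],
                 constraints: List[Constraint],
                 assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Depth-first search that retracts a choice when it leads to a dead end."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]  # variables are assigned in list order
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment, constraints):
            result = backtracking(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]  # the retraction step
    return None


def sum_is_four(a: Assignment) -> bool:
    # Only checkable once both variables are set, so partial assignments pass.
    return "x" not in a or "y" not in a or a["x"] + a["y"] == 4


variables = ["x", "y"]
domains = {"x": [1, 2], "y": [1, 2]}

greedy_result = commit_only(variables, domains, [sum_is_four])
solved = backtracking(variables, domains, [sum_is_four])
print(greedy_result, solved)
```

Here the constraint only becomes checkable after `x` is chosen, so the commit-only procedure locks in `x = 1` and then finds no valid `y`. The backtracking version hits the same dead end, removes `x = 1`, and succeeds with `x = 2`.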
What makes all this easy to miss is that the surface metrics stay clean. Models can carry every linearly decodable feature a task needs while their internal organization is fractured, leaving them brittle to perturbation and distribution shift in ways standard accuracy never reveals (see "Can models be smart without organized internal structure?"). Conservative defaulting is the behavioral version of the same illusion: the score looks like competence, but remove the scaffold and it collapses.
The quietly hopeful counterpoint is that the constraint doesn't have to live inside the model. Bolting a symbolic solver onto the transformer supplies exactly the retraction primitive the architecture lacks (see "Why does autoregressive generation fail at constraint satisfaction?"). More broadly, reliability tends to come from external anchors such as tools, judges, prior versions, and user corrections, rather than from the model improving on its own (see "Can models reliably improve themselves without external feedback?"). The lesson for anyone relying on a model to respect limits it was never told: don't assume it's reasoning about feasibility just because it acts cautious. Make the constraint explicit, or give it a partner that can say no.
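A "partner that can say no" can be as simple as an explicit feasibility check wrapped around whatever generates candidates. The following is a minimal sketch, assuming candidates arrive as plain dictionaries; the candidate list stands in for model outputs, and `propose_with_verifier`, `within_limits`, and the limits themselves are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Plan = Dict[str, float]


def propose_with_verifier(
    proposals: Iterable[Plan],
    is_feasible: Callable[[Plan], bool],
    max_attempts: int = 5,
) -> Tuple[Optional[Plan], List[Plan]]:
    """Return the first proposal the external verifier accepts, plus the rejects."""
    rejected: List[Plan] = []
    for attempt, candidate in enumerate(proposals):
        if attempt >= max_attempts:
            break
        if is_feasible(candidate):
            return candidate, rejected
        rejected.append(candidate)  # the verifier said no; try the next candidate
    return None, rejected


def within_limits(plan: Plan) -> bool:
    # The constraint is stated explicitly here instead of being left implicit.
    return plan["cost"] <= 100.0 and plan["hours"] <= 8.0


# Stand-ins for successive model outputs; invented for illustration.
candidates: List[Plan] = [
    {"cost": 150.0, "hours": 6.0},  # over budget
    {"cost": 90.0, "hours": 9.0},   # over time
    {"cost": 80.0, "hours": 7.0},   # feasible
]

accepted, rejected = propose_with_verifier(candidates, within_limits)
print(accepted, len(rejected))
```

The design choice is that feasibility is decided outside the generator, so a cautious-sounding but infeasible proposal is rejected the same way as any other. In a real system the rejected candidates could also be fed back to the generator as corrections.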
Sources (7 notes)
Are models actually reasoning about constraints or just defaulting conservatively? Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.
Do larger language models solve constrained optimization better? Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.
Do reasoning models actually beat standard models on optimization? Reasoning variants with extended CoT show no consistent advantage over standard models on constraint-bound numerical tasks like optimal power flow. Extended thinking produces more text, not more iterative computation, suggesting the bottleneck is numeric procedure rather than reasoning steps.
Why does autoregressive generation fail at constraint satisfaction? The performance ceiling on constraint satisfaction problems is not a model-quality issue but an architectural limitation: autoregressive transformers cannot retract emitted tokens, while CSP solvers fundamentally depend on discarding invalid partial assignments. Symbolic solver integration works because it supplies what the architecture lacks.
Why do reasoning models abandon promising solution paths? Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.
Can models be smart without organized internal structure? Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
Can models reliably improve themselves without external feedback? Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.