SYNTHESIS NOTE

Do larger language models solve constrained optimization better?

Explores whether scaling LLMs—through more parameters, better training, or reasoning extensions—improves their ability to satisfy constraints in real optimization problems like power grids and portfolios.

Synthesis note · 2026-05-18 · sourced from Reasoning Architectures

When evaluated on real constrained-optimization problems — optimal power flow, financial portfolio constraints, cyber-security feasibility — LLMs cluster around 55-60% constraint satisfaction across virtually all conditions tested. The plateau is robust to changes in architecture, parameter count, and training regime. Reasoning models, despite extended chain-of-thought, do not systematically beat their non-reasoning counterparts on these tasks.
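To make the metric concrete, the sketch below shows one way a constraint-satisfaction rate could be computed: the share of candidate solutions that pass every constraint at once. The note does not say how the benchmark scores feasibility, so the all-or-nothing scoring, the toy portfolio constraints, and the names is_feasible and satisfaction_rate are illustrative assumptions, not the benchmark's actual harness.

```python
from typing import Callable, Dict, List

Solution = Dict[str, float]
Constraint = Callable[[Solution], bool]


def is_feasible(solution: Solution, constraints: List[Constraint]) -> bool:
    """A solution counts as feasible only if every constraint holds at once."""
    return all(check(solution) for check in constraints)


def satisfaction_rate(solutions: List[Solution], constraints: List[Constraint]) -> float:
    """Fraction of candidate solutions that are fully feasible (assumed scoring rule)."""
    if not solutions:
        return 0.0
    feasible = sum(1 for s in solutions if is_feasible(s, constraints))
    return feasible / len(solutions)


# Toy portfolio constraints (hypothetical, for illustration only).
constraints: List[Constraint] = [
    lambda s: abs(sum(s.values()) - 1.0) <= 1e-6,  # fully invested
    lambda s: all(w >= 0.0 for w in s.values()),   # long-only
    lambda s: all(w <= 0.5 for w in s.values()),   # per-asset cap of 50%
]

# Made-up candidate answers standing in for model outputs.
candidates: List[Solution] = [
    {"A": 0.5, "B": 0.3, "C": 0.2},    # feasible
    {"A": 0.7, "B": 0.2, "C": 0.1},    # breaks the per-asset cap
    {"A": 0.4, "B": 0.4, "C": 0.1},    # weights sum to 0.9, not 1
    {"A": 0.25, "B": 0.25, "C": 0.5},  # feasible
]

rate = satisfaction_rate(candidates, constraints)
print(f"constraint satisfaction: {rate:.0%}")
```

Under this scoring rule, a single violated constraint makes the whole answer infeasible, which is why interacting constraints are harder to satisfy than each constraint taken alone.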

The flatness of the plateau is the finding. Most LLM capability work assumes that the relevant axis is performance versus scale, and that closing a gap is a matter of training on more or better data. Constrained optimization does not behave that way. The benchmark singles out problems that require jointly interpreting structured input, doing multi-step arithmetic, satisfying interacting physical constraints, and converging to a feasible solution. On this joint task, the model class itself appears to be near a ceiling.

This is distinct from general reasoning benchmarks (MMLU, GPQA) and from logical reasoning benchmarks (ARC-AGI, SATBench, ZebraLogic). Those measure either broad knowledge or synthetic constraint puzzles. Real engineering optimization requires the model to execute iterative numerical procedures over physical constraints, and that procedural execution is where the plateau lives.

The deployment implication is sharp: telling executives that "LLMs will optimize the grid" or "LLMs will solve constrained portfolio problems" is currently an overclaim. The same finding suggests that the productive direction is not "wait for the next model" but "change the paradigm": restrict the LLM to abstraction tasks and hand the numeric work to solvers.
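The division of labour described above can be sketched as follows. The LLM's only job is to emit a structured problem specification, and a deterministic routine does the arithmetic. Everything here is an assumption made for illustration: the JSON schema, the hard-coded string standing in for model output, and the function solve_capped_allocation, which solves one narrow problem (maximize expected return with weights summing to 1 and per-asset caps) with an exact greedy rule. A real deployment would more likely pass the specification to an established LP or nonlinear solver.

```python
import json
from typing import Dict

# Hypothetical structured output from an LLM: it only formulates the problem.
llm_spec_json = """
{
  "expected_return": {"A": 0.08, "B": 0.05, "C": 0.03},
  "max_weight": {"A": 0.5, "B": 0.4, "C": 0.6}
}
"""


def solve_capped_allocation(
    expected_return: Dict[str, float], max_weight: Dict[str, float]
) -> Dict[str, float]:
    """Maximize sum(r_i * w_i) subject to sum(w_i) = 1 and 0 <= w_i <= cap_i.

    For this specific linear program, filling the highest-return assets
    first, each up to its cap, is optimal.
    """
    if set(expected_return) != set(max_weight):
        raise ValueError("spec must give a return and a cap for every asset")
    if any(cap < 0.0 for cap in max_weight.values()):
        raise ValueError("caps must be non-negative")
    if sum(max_weight.values()) < 1.0 - 1e-9:
        raise ValueError("infeasible: caps cannot absorb the full budget")

    weights = {name: 0.0 for name in expected_return}
    remaining = 1.0
    for name in sorted(expected_return, key=expected_return.get, reverse=True):
        take = min(max_weight[name], remaining)
        weights[name] = take
        remaining -= take
        if remaining <= 0.0:
            break
    return weights


spec = json.loads(llm_spec_json)
weights = solve_capped_allocation(spec["expected_return"], spec["max_weight"])
print(weights)
```

The point of the design is that feasibility is guaranteed by the solver's construction rather than by the model's arithmetic, and an ill-posed specification fails loudly instead of producing a plausible but infeasible answer.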

Original note title

LLMs plateau at 55 to 60 percent constraint satisfaction on genuine optimization regardless of scale architecture or training