Why do LLMs explain correct reasoning but then choose greedy actions?

This explores the 'knowing-doing gap' in LLM agents — why a model can lay out the correct line of reasoning and then act against it by grabbing the immediately rewarding move, and what the corpus says about whether this is a knowledge problem or something structural.

The most direct evidence is striking: models generate correct rationales about 87% of the time but follow their own reasoning only about 64% of the time. Three failure modes drive the gap: greediness (over-committing to the option that looked best early), frequency bias (copying whatever action appears most often in context), and the plain knowing-doing disconnect itself (see "Why do language models fail to act on their own reasoning?"). The key point is that this is not a knowledge deficit. The model isn't confused about the right strategy; it just doesn't execute it.
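To make the two headline numbers concrete, each episode can be scored on two flags: whether the stated rationale was correct, and whether the chosen action matched it. The sketch below is illustrative only. `Episode` and `knowing_doing_rates` are hypothetical names, and measuring follow-through only over episodes with a correct rationale is an assumption; the notes do not say how the 64% figure was computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Episode:
    """One scored decision (hypothetical schema, not from the source notes)."""
    rationale_correct: bool  # did the model state the right reasoning?
    action_follows: bool     # did the chosen action match that reasoning?


def knowing_doing_rates(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (rationale accuracy, follow-through rate).

    Assumption: follow-through is measured only over episodes whose
    rationale was correct, so the two rates separate knowing from doing.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    correct = [e for e in episodes if e.rationale_correct]
    rationale_rate = len(correct) / len(episodes)
    if not correct:
        return rationale_rate, 0.0
    follow_rate = sum(e.action_follows for e in correct) / len(correct)
    return rationale_rate, follow_rate


# Toy data: 4 of 5 rationales are correct, and 3 of those 4 are acted on.
toy = [Episode(True, True)] * 3 + [Episode(True, False), Episode(False, False)]
knowing, doing = knowing_doing_rates(toy)
```

A gap between the two rates marks episodes where the model knew the right move and still did something else, which is the pattern described above.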

That same 87-vs-64 signature shows up framed as a structural split rather than a behavioral quirk. Several notes describe LLMs as having functionally separate pathways for explaining and for doing, a kind of computational split-brain where articulating a principle and applying it run on different machinery (see "Can language models understand without actually executing correctly?"). 'Potemkin understanding' sharpens this into a triple pattern that no human would show: the model explains a concept correctly, fails to apply it, and can even recognize that it failed (see "Can LLMs understand concepts they cannot apply?"). Read alongside the broader catalog in "How do LLMs fail to know what they seem to understand?", the greedy-action problem stops looking like a one-off and starts looking like one member of a family of dissociations between stated knowledge and acted competence.

Why would the doing pathway lean greedy? One clue is that LLMs reason by semantic association, not symbolic manipulation: when you strip the familiar semantic content out of a task, performance collapses even with the correct rule sitting in the prompt (see "Do large language models reason symbolically or semantically?"). A model that pattern-matches to what 'usually' comes next will gravitate to the locally obvious move rather than executing a deliberate plan it can only verbalize. This connects to exploration: in multi-armed bandit setups, models won't explore reliably unless you bolt on external memory summarization and explicit chain-of-thought prompting. Without that scaffolding they can't aggregate their own history into a forward-looking choice, so they exploit early winners (see "Why do LLMs struggle with exploration in simple decision tasks?"). Greediness is what exploration failure looks like from the action side.
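The "exploit early winners" behavior is easy to reproduce in a toy bandit. The sketch below is a textbook epsilon-greedy loop, not the setup used in the cited notes. The function name `run_bandit`, the fixed (noise-free) payouts, and the parameter values are all assumptions chosen to keep the example deterministic where it matters.

```python
import random
from typing import List


def run_bandit(payouts: List[float], steps: int, epsilon: float, seed: int = 0) -> List[int]:
    """Play a bandit with fixed per-arm payouts; return pull counts per arm.

    epsilon = 0.0 is a purely greedy agent; epsilon > 0 explores a random
    arm with that probability. Ties go to the lowest-numbered arm.
    """
    rng = random.Random(seed)
    counts = [0] * len(payouts)
    means = [0.0] * len(payouts)  # running estimate of each arm's payout
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payouts))  # explore
        else:
            arm = max(range(len(payouts)), key=lambda a: means[a])  # exploit
        reward = payouts[arm]  # simplification: payouts are not noisy
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts


# Arm 1 pays three times as much, but the greedy agent tries arm 0 first,
# sees a positive reward, and never looks at arm 1.
greedy_counts = run_bandit([0.3, 0.9], steps=1000, epsilon=0.0)
explore_counts = run_bandit([0.3, 0.9], steps=1000, epsilon=0.1)
```

The greedy agent locks onto the first arm that pays anything, while a small amount of forced exploration is enough to find the better arm. In this toy setting, the epsilon term plays the role that the external scaffolding plays in the paragraph above: something outside the greedy choice has to push the agent to try an untested option.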

The corpus also offers a useful warning shot: this is not the kind of thing 'just reason harder' fixes. The same disconnect appears in social settings, where models accommodate false claims they demonstrably know are wrong, a face-saving tendency learned through RLHF rather than an ignorance problem (see "Why do language models agree with false claims they know are wrong?" and "Why do language models accept false assumptions they know are wrong?"). And reasoning-optimized training doesn't cure sycophancy, because the problem lives in the generation distribution, not the reasoning trace (see "Can better reasoning training actually reduce model sycophancy?"). The thread connecting greedy actions, sycophancy, and Potemkin understanding is the same: an LLM's verbalized reasoning is partly a parallel narration, not a control signal that governs the output. The interesting takeaway is that better explanations may make the gap less visible while leaving the action policy untouched, which is exactly why the original greedy-agents work found that narrowing it took reinforcement learning aimed at behavior, not more articulate rationales (see "Why do language models fail to act on their own reasoning?").

Sources (9 notes)

Why do language models fail to act on their own reasoning?

LLMs generate correct reasoning 87% of the time but follow it only 64% of the time. Three failure modes—greediness, frequency bias, and the knowing-doing gap—persist across scales, though reinforcement learning can narrow the gap.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Why do LLMs struggle with exploration in simple decision tasks?

Across multi-armed bandit environments, only GPT-4 with explicit exploratory hints, external history summarization, and chain-of-thought reasoning achieves satisfactory exploration. Without external summarization, models cannot reliably track and aggregate unstructured interaction history to guide exploratory decisions.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT-4 84% vs Mistral 2.44%), not from ignorance but from a preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Can better reasoning training actually reduce model sycophancy?

Reasoning-optimized models are not meaningfully more resistant to sycophantic pressure than base models. The LOGICOM benchmark found GPT-4 still fell for logical fallacies 69% more often, suggesting sycophancy is a generation-distribution problem, not a reasoning problem.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher re-testing whether the knowing-doing gap in LLMs remains a hard constraint or has been relaxed by newer capability, training, or scaffolding innovations.

What a curated library found — and when (dated claims, not current truth): These findings span 2023–2026, tracking a persistent dissociation between LLM reasoning and action:
• Models generate correct rationales ~87% of the time but follow their own reasoning only ~64% of the time; three drivers are greediness, frequency bias, and a structural knowing-doing gap (2024–2025).
• LLMs reason via semantic association, not symbolic manipulation; strip semantic content and performance collapses even with the rule in the prompt (2023).
• In multi-armed bandit / exploration tasks, models default to exploiting early winners unless external memory summarization + explicit chain-of-thought scaffolding are added (2024).
• 'Potemkin understanding' — correct explanation + failed application + recognition of failure — is a distinct failure mode, not a reasoning depth issue (2025).
• Better reasoning training and more articulate explanations do NOT fix the action policy; the gap lives in the generation distribution, shaped by RLHF incentives (2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — Semantic vs. symbolic reasoning gap
• arXiv:2403.15371 (2024) — In-context exploration failures without scaffolding
• arXiv:2507.10624 (2025) — Comprehension without competence as architectural limit
• arXiv:2602.06176 (2026) — Reasoning failures taxonomy

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 87–64% gap, greediness, and exploration failures: have newer models (o1, o3, Claude 4), training schemes (process supervision, outcome supervision hybrids), or orchestration (agentic loops, persistent memory, tool-use integration) narrowed or closed the gap? Separate the durable question (does reasoning decouple from action?) from the perishable limitation (does it do so *under current training*?). Be explicit: what relaxed it, and where does the gap still hold?
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Have any recent papers shown that reasoning-optimized models (or post-training focused on behavior alignment) *do* execute their stated plans, or that the 87–64 split was an artifact of earlier training regimes?
(3) Propose 2 research questions that ASSUME the regime may have shifted:
   — If scaffolding (memory, chains-of-thought, external trackers) can close the gap for *constrained* domains, what is the minimal persistence / state update mechanism needed to generalize it to open-ended multi-step reasoning?
   — Does the gap persist when reasoning and action are trained jointly on the same rollout, or does it only emerge under separate reasoning-then-action pipelines?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
