INQUIRING LINE

How do knowing and doing diverge in LLM decision-making?

This explores why LLMs can state the right move yet not make it — the gap between an LLM's articulated reasoning and the action it actually takes when deciding.


The sharpest number in the corpus makes the divergence concrete: tested as decision-making agents, LLMs produce a correct rationale about 87% of the time but follow that rationale only about 64% of the time, defaulting instead to greedy, high-frequency choices (Why do language models fail to act on their own reasoning?). The model isn't ignorant of the better option; it talks itself through it and then does something else.
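One way to make the two headline numbers concrete is to compute them as ratios over logged decisions. The sketch below is an assumption, not the cited note's evaluation code: the record type `Step`, the helper `knowing_doing_rates`, and the choice to measure "following" only on steps whose rationale was correct are all hypothetical, and the toy log is invented for illustration rather than taken from any experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One logged decision by an LLM agent (hypothetical record format)."""
    rationale_action: int  # action the model's written reasoning recommends
    taken_action: int      # action the model actually emitted
    optimal_action: int    # ground-truth best action for this step


def knowing_doing_rates(steps: list[Step]) -> tuple[float, float]:
    """Return (rationale accuracy, follow rate among correct rationales).

    Assumption: following is measured only on steps whose rationale was
    correct, so the second number isolates acting on a right answer.
    """
    correct = [s for s in steps if s.rationale_action == s.optimal_action]
    rationale_acc = len(correct) / len(steps)
    followed = sum(1 for s in correct if s.taken_action == s.rationale_action)
    follow_rate = followed / len(correct) if correct else 0.0
    return rationale_acc, follow_rate


# Toy log: 5 correct-and-followed, 3 correct-but-ignored, 2 wrong rationales.
steps = (
    [Step(0, 0, 0)] * 5
    + [Step(0, 1, 0)] * 3
    + [Step(1, 1, 0)] * 2
)
acc, follow = knowing_doing_rates(steps)
print(f"rationale correct: {acc:.1%}, followed when correct: {follow:.1%}")
```

The gap is the distance between the two rates. On real agent logs you would also need a reliable way to extract `rationale_action` from free-text reasoning, which this sketch takes as given.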

Several notes converge on the same shape from different angles, which suggests this isn't a quirk of one benchmark but a structural feature. One frames it as a "computational split-brain": instruction and execution run on dissociated pathways, so a model can articulate a principle correctly (87% of the time) yet carry it out only 64% of the time, a gap that is structural rather than a simple knowledge deficit (Can language models understand without actually executing correctly?). "Potemkin understanding" sharpens it further: models can explain a concept correctly, fail to apply it, and even recognize their own failure, a triple pattern with no human analogue that points to functionally disconnected explanation and execution machinery (Can LLMs understand concepts they cannot apply?). Both are catalogued alongside other repeatable epistemic failure modes (How do LLMs fail to know what they seem to understand?).

Why would knowing and doing live in different places? Two notes offer a mechanism. Mechanistic interpretability finds that understanding isn't monolithic: it comes in hierarchical tiers (conceptual, world-state, principled-circuit), and crucially the higher tiers coexist with cruder low-tier heuristics rather than replacing them, leaving a patchwork where a model can hold a correct principle and a competing shortcut at once (Do language models understand in fundamentally different ways?). And the reasoning you read in the chain-of-thought may not be the reasoning that drives the choice at all: evidence suggests the real work happens in hidden-state trajectories, with surface text serving as only a partial, sometimes unfaithful interface (Where does LLM reasoning actually happen during generation?). If the spoken rationale is a narration laid over a separate latent process, divergence between the two is exactly what you'd expect.
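A toy simulation can show why a patchwork of this kind produces divergence without any missing knowledge. It is a hypothetical illustration, not a model of any real network: the stated rationale always comes from a principled rule, while the action is drawn from a mixture of that rule and a frequency-based shortcut. The names `principled_choice`, `frequency_shortcut`, and `decide`, and the shortcut weight of 0.3, are invented for the example; the weight is arbitrary, not estimated from any model.

```python
import random
from collections import Counter


def principled_choice(values: dict[str, float]) -> str:
    # "Higher-tier" rule: pick the option with the best estimated value.
    return max(values, key=values.get)


def frequency_shortcut(history: list[str]) -> str:
    # "Lower-tier" heuristic: repeat the option seen most often so far.
    return Counter(history).most_common(1)[0][0]


def decide(values, history, shortcut_weight, rng):
    # The stated rationale always comes from the principled rule...
    rationale = principled_choice(values)
    # ...but the action is produced separately, and the shortcut sometimes wins.
    if rng.random() < shortcut_weight:
        action = frequency_shortcut(history)
    else:
        action = principled_choice(values)
    return rationale, action


values = {"A": 0.3, "B": 0.7}    # B is genuinely better
history = ["A"] * 8 + ["B"] * 2  # but A has been seen far more often
rng = random.Random(0)
trials = [decide(values, history, 0.3, rng) for _ in range(10_000)]
rationale_correct = sum(r == "B" for r, _ in trials) / len(trials)
follow_rate = sum(a == r for r, a in trials) / len(trials)
print(f"rationale correct: {rationale_correct:.0%}, action follows it: {follow_rate:.0%}")
```

Every rationale here is correct, yet roughly 30% of actions go to the habitual option instead. In this toy the gap comes entirely from the competing pathway, which is the shape the patchwork account predicts.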

The corpus also has the cleanest framing of the fix. The knowing–doing gap maps onto an old distinction, declarative knowledge (knowing that) versus procedural knowledge (knowing how), and one note shows reinforcement learning can bridge the two: when an LLM generates a language-guided policy that environmental feedback refines, it develops procedural competence while staying explainable, unifying the two kinds of knowledge instead of leaving them split (Can language modeling close the knowing-doing gap in AI?). The greedy-agent note reaches the same conclusion from the failure side: RL narrows the gap, though greediness and frequency bias persist across model scales (Why do language models fail to act on their own reasoning?).
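The direction of the fix can be illustrated with the simplest policy-gradient method. This is a generic textbook REINFORCE update on a two-action toy policy, swapped in for illustration; it is not the training procedure of Think-In Games or of the greedy-agents study, and the starting logits, learning rate, baseline rate, and step count are arbitrary assumptions. Action 0 stands for the habitual high-frequency choice, action 1 for the choice the agent's own rationale recommends, and only action 1 is rewarded.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(logits, reward_fn, steps, lr, rng):
    """REINFORCE with a running-average baseline on a single-state policy."""
    logits = list(logits)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # slow-moving reward average
        # Gradient of log pi(action) w.r.t. each logit: onehot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return logits


# Action 0: habitual, high-frequency choice (prior bias baked into the logits).
# Action 1: the choice the rationale recommends; only it earns reward.
start = [1.5, 0.0]
before = softmax(start)[1]
trained = reinforce(start, lambda a: 1.0 if a == 1 else 0.0, 300, 0.1, random.Random(0))
after = softmax(trained)[1]
print(f"P(follow rationale): {before:.2f} -> {after:.2f}")
```

Reward feedback shifts probability toward the action the rationale names, which is the narrowing the notes describe. The toy says nothing about why greediness and frequency bias persist in real models: with a single state and a clean reward signal, it simply trains the bias away.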

The thing you might not expect: this divergence is useful as well as broken. Fine-tuned on psychology-experiment data, LLMs predict human decisions better than purpose-built cognitive theories, which suggests they capture the gap between what people say and what they do better than our explicit theories of it (Can language models learn to model human decision making?). The same architecture that fails to act on its own reasoning turns out to be unusually good at capturing the same inconsistency in us.


Sources (8 notes)

Why do language models fail to act on their own reasoning?

LLMs generate correct reasoning 87% of the time but follow it only 64% of the time. Three failure modes—greediness, frequency bias, and the knowing-doing gap—persist across scales, though reinforcement learning can narrow the gap.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

Do language models understand in fundamentally different ways?

Mechanistic interpretability reveals conceptual understanding (features as directions), state-of-world understanding (factual connections), and principled understanding (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.

Where does LLM reasoning actually happen during generation?

Evidence from CoT faithfulness tests, feature steering, and layer analysis suggests latent-state dynamics drive reasoning, while surface chain-of-thought serves as a partial interface. Hidden reasoning processes should be the default focus of study.

Can language modeling close the knowing-doing gap in AI?

Think-In Games demonstrates that when LLMs generate language-guided policies refined by environmental feedback, they develop procedural competence while retaining explainability. The approach dramatically reduces data demands and makes agent reasoning transparent at every step.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher reviewing claims about the knowing–doing gap in LLM decision-making. The question: why do LLMs articulate correct reasoning yet fail to execute it?

What a curated library found — and when (dated claims, not current truth):
Findings span Feb 2024–Apr 2026. Key constraints:
• LLMs produce correct rationales ~87% of the time but follow them only ~64%, defaulting to greedy high-frequency choices (Apr 2025).
• Understanding is hierarchical and patchwork: higher-tier principles coexist with low-tier heuristics, leaving competing shortcuts intact (Jul 2025).
• Chain-of-thought text is a partial, sometimes unfaithful interface; real reasoning happens in hidden-state trajectories (Apr 2026).
• Reinforcement learning can bridge declarative and procedural knowledge, narrowing the gap, though greedy bias persists across scales (Apr 2025).
• Fine-tuned on psychology data, LLMs model human knowing–doing gaps better than explicit cognitive theories (Feb 2024).

Anchor papers (verify; mind their dates):
• arXiv:2504.16078 (Apr 2025) – LLMs as greedy agents; RL effects.
• arXiv:2507.10624 (Jul 2025) – Comprehension without competence; architectural limits.
• arXiv:2604.15726 (Apr 2026) – Latent reasoning vs. chain-of-thought.
• arXiv:2508.21365 (Aug 2025) – RL reasoning bridging.

Your task:
(1) RE-TEST EACH CONSTRAINT. Does newer model scaling, mechanistic intervention (steering, probing, latent surgery), or evaluation method (diagnostic benchmarks, adversarial inputs) relax the 87–64 gap or the hierarchy–patchwork claim? Has RL fine-tuning become standard, and if so, does greedy bias still persist? Separate the durable question (why execution decouples from articulation) from the perishable limitation (e.g., current models lack procedural alignment).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~3 months. Does any recent paper show the gap has collapsed, or reveal it as an artifact of eval design rather than architecture?
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., "If RL is now standard, does knowing–doing divergence migrate to harder domains, or disappear entirely?" or "Does mechanistic understanding of latent reasoning reveal a new form of divergence that chain-of-thought masked?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
