How do knowing and doing diverge in LLM decision-making?
This explores why LLMs can state the right move yet not make it — the gap between an LLM's articulated reasoning and the action it actually takes when deciding.
The sharpest number in the corpus makes the divergence concrete: tested as decision-making agents, LLMs produce a correct rationale about 87% of the time but follow that rationale only about 64% of the time, defaulting instead to greedy, high-frequency choices (see "Why do language models fail to act on their own reasoning?"). The model isn't ignorant of the better option: it talks itself through it and then does something else.
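To make the two numbers concrete, here is a minimal Python sketch of how such rates could be tallied from logged agent episodes. The `Episode` record, its field names, and the choice to compute both rates over all episodes are assumptions for illustration; the underlying study may define its denominators differently. The synthetic data is constructed only to reproduce the 87% and 64% figures quoted above and is not real experimental data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Episode:
    """One logged decision (hypothetical schema, not from the source)."""
    rationale_correct: bool          # the stated reasoning identified the right move
    action_follows_rationale: bool   # the action taken matched that reasoning


def knowing_doing_rates(episodes: Iterable[Episode]) -> dict[str, float]:
    """Return the 'knowing' rate, the 'doing' rate, and their difference.

    Assumption: both rates use the full episode count as the denominator.
    """
    records = list(episodes)
    if not records:
        raise ValueError("need at least one episode")
    n = len(records)
    knowing = sum(e.rationale_correct for e in records) / n
    doing = sum(e.action_follows_rationale for e in records) / n
    return {"knowing": knowing, "doing": doing, "gap": knowing - doing}


# Synthetic log of 100 episodes, built to match the quoted figures.
episodes = (
    [Episode(True, True)] * 64      # right rationale, action follows it
    + [Episode(True, False)] * 23   # right rationale, action diverges
    + [Episode(False, False)] * 13  # rationale itself was wrong
)

rates = knowing_doing_rates(episodes)
print(f"knowing: {rates['knowing']:.0%}, doing: {rates['doing']:.0%}")
```

The middle group is the interesting one: those are the episodes where the rationale was right and the action still went elsewhere, which is the gap the rest of this note tries to explain.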
Several notes converge on the same shape from different angles, which suggests this is a structural feature rather than a quirk of one benchmark. One frames it as a "computational split-brain": instruction and execution run on dissociated pathways, so a model can articulate a principle (87% accurate) and fail to carry it out (64%), a gap that is structural rather than a simple knowledge deficit (see "Can language models understand without actually executing correctly?"). "Potemkin understanding" sharpens it further: models can explain a concept correctly, fail to apply it, and even recognize their own failure, a triple pattern that has no human analogue and points to functionally disconnected explanation and execution machinery (see "Can LLMs understand concepts they cannot apply?"). Both are catalogued alongside other repeatable epistemic failure modes (see "How do LLMs fail to know what they seem to understand?").
Why would knowing and doing live in different places? Two notes offer a mechanism. Mechanistic interpretability finds that understanding isn't monolithic: it comes in hierarchical tiers (conceptual, world-state, principled-circuit), and crucially the higher tiers coexist with cruder low-tier heuristics rather than replacing them, leaving a patchwork where a model can hold a correct principle and a competing shortcut at once (see "Do language models understand in fundamentally different ways?"). And the reasoning you read in the chain-of-thought may not be the reasoning that drives the choice at all: evidence suggests the real work happens in hidden-state trajectories, with surface text serving as only a partial, sometimes unfaithful interface (see "Where does LLM reasoning actually happen during generation?"). If the spoken rationale is a narration laid over a separate latent process, divergence between the two is exactly what you'd expect.
The corpus also has the cleanest framing of the fix. The knowing-doing gap is an old distinction, declarative knowledge (knowing that) versus procedural knowledge (knowing how), and one note shows reinforcement learning can bridge them: when an LLM generates a language-guided policy that environmental feedback refines, it develops procedural competence while staying explainable, unifying the two kinds of knowledge instead of leaving them split (see "Can language modeling close the knowing-doing gap in AI?"). The greedy-agent note reaches a compatible conclusion from the failure side: RL narrows the gap, though greediness and frequency bias persist across model scales (see "Why do language models fail to act on their own reasoning?").
The less expected point is that this divergence has a useful side as well as a broken one. Fine-tuned on psychology-experiment data, LLMs predict human decisions better than purpose-built cognitive theories: they model the gap between what people say and what they do better than our explicit theories of it (see "Can language models learn to model human decision making?"). The same architecture that fails to act on its own reasoning turns out to be unusually good at capturing the same inconsistency in us.
Sources (8 notes)
LLMs generate correct reasoning 87% of the time but follow it only 64% of the time. Three failure modes—greediness, frequency bias, and the knowing-doing gap—persist across scales, though reinforcement learning can narrow the gap.
Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals that this is not a knowledge deficit but a structural disconnect.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.
Mechanistic interpretability reveals conceptual understanding (features as directions), state-of-world understanding (factual connections), and principled understanding (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.
Evidence from CoT faithfulness tests, feature steering, and layer analysis suggests latent-state dynamics drive reasoning, while surface chain-of-thought serves as a partial interface. Hidden reasoning processes should be the default focus of study.
Think-In Games demonstrates that when LLMs generate language-guided policies refined by environmental feedback, they develop procedural competence while retaining explainability. The approach dramatically reduces data demands and makes agent reasoning transparent at every step.
LLMs fine-tuned on psychology-experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer what they learn across tasks without task-specific design.