Can external actions provide causal necessity that language models lack?

This explores whether acting on the world — taking interventions, running experiments — could give models a grounding in cause-and-effect that text-only training can't supply, since LLMs learn causality from how often it's described rather than from doing.

This question is really asking whether causality has to be *earned through action*, and whether language models, which only ever read about the world, are missing something that acting in it would provide. The corpus suggests the answer is mostly yes, and it pinpoints why. LLMs are surprisingly good at causal reasoning, but for a revealing reason: causal connectives like 'because' and 'therefore' are explicit and frequent in text, so models pick them up easily, while temporal order, which is usually left implicit, trips them up (see "Why do LLMs handle causal reasoning better than temporal reasoning?"). In other words, their causality is borrowed from the *surface of language*, not derived from watching what follows what. The tell is that they inherit human causal mistakes wholesale: weak 'explaining away' and violations of basic independence assumptions, exactly the errors baked into the text they trained on (see "Do large language models make the same causal reasoning mistakes as humans?").
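For readers unfamiliar with 'explaining away', the normative pattern can be worked out exactly on a toy collider network. The sketch below is illustrative only: the two causes `A` and `B`, their priors, and the noisy-OR strength of 0.8 are assumptions chosen for the example, not values from the cited studies. It shows what a correct reasoner should do, which is the baseline that humans and LLMs are reported to fall short of.

```python
from itertools import product

# Assumed toy parameters (not taken from any cited study).
P_A = 0.1        # prior probability that cause A is present
P_B = 0.1        # prior probability that cause B is present
STRENGTH = 0.8   # chance that a present cause produces the effect

def p_effect(a, b):
    """Noisy-OR: the effect occurs unless every present cause fails."""
    return 1 - (1 - STRENGTH * a) * (1 - STRENGTH * b)

def joint(a, b, e):
    """Joint probability of one world in the collider A -> E <- B."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pe = p_effect(a, b)
    return pa * pb * (pe if e else 1 - pe)

def prob_a(**evidence):
    """P(A=1 | evidence) by brute-force enumeration over all worlds."""
    num = den = 0.0
    for a, b, e in product((0, 1), repeat=3):
        world = {"a": a, "b": b, "e": e}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(a, b, e)
            den += p
            num += p * a
    return num / den

if __name__ == "__main__":
    print("P(A | E=1)      =", round(prob_a(e=1), 3))       # about 0.53
    print("P(A | E=1, B=1) =", round(prob_a(e=1, b=1), 3))  # about 0.12
    print("P(A | B=1)      =", round(prob_a(b=1), 3))       # 0.1, the prior
```

Learning that `B` is present should sharply lower belief in `A` once the effect is observed (explaining away), while `B` alone should tell you nothing about `A` (independence of the causes). 'Weak explaining away' means the first drop is smaller than this calculation says it should be.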

The sharpest articulation of what's missing comes from the work on world models: a model can hit high prediction accuracy using task-specific shortcuts without ever building a coherent picture of how the world works, and a *real* world model is defined precisely by what shortcuts can't do: reason about interventions and counterfactuals, the 'what happens if I do X' that only acting (or simulating acting) can answer (see "What makes a world model actually useful for reasoning?"). That's the causal necessity the question gestures at. Prediction-from-observation gives you correlation that usually holds; intervention gives you the structure that says *why*.
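The difference between observing and intervening can be seen in a few lines of simulation. The following is a minimal sketch under stated assumptions: the structural model (a hidden confounder `z` driving both `x` and `y`, with `x` having no effect on `y`), the probabilities, and the function names are all hypothetical and chosen purely for illustration.

```python
import random

def sample(rng, do_x=None):
    """Draw one (x, y) pair from an assumed toy structural causal model.

    A hidden confounder z drives both x and y; x has NO causal effect on y.
    Passing do_x replaces x's own mechanism, which is what an intervention does.
    """
    z = rng.random() < 0.5
    x = (rng.random() < (0.9 if z else 0.1)) if do_x is None else do_x
    y = rng.random() < (0.8 if z else 0.2)
    return x, y

def observational_gap(n=100_000, seed=0):
    """Estimate P(y | x=1) - P(y | x=0) from passively observed data."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # x -> [samples, samples with y]
    for _ in range(n):
        x, y = sample(rng)
        counts[x][0] += 1
        counts[x][1] += y
    return counts[True][1] / counts[True][0] - counts[False][1] / counts[False][0]

def interventional_gap(n=100_000, seed=0):
    """Estimate P(y | do(x=1)) - P(y | do(x=0)) by setting x directly."""
    rng = random.Random(seed)
    y1 = sum(sample(rng, do_x=True)[1] for _ in range(n)) / n
    y0 = sum(sample(rng, do_x=False)[1] for _ in range(n)) / n
    return y1 - y0

if __name__ == "__main__":
    print("observational gap: ", round(observational_gap(), 3))
    print("interventional gap:", round(interventional_gap(), 3))
```

In this setup the observational gap comes out large (analytically 0.74 - 0.26 = 0.48) even though `x` does nothing to `y`, while the interventional gap is close to zero. A learner that only sees the passive data has no way to tell this world apart from one where `x` really causes `y`; a learner that can set `x` can.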

This lands in the middle of an older grounding debate the corpus carries from both sides. Bender and Koller argue meaning requires a relation between expressions and communicative intent, something form-only training can never reconstruct, because the model has no access to the shared, action-laden context that anchors words to the world (see "Can language models learn meaning from text patterns alone?"). The opposing note shows LLMs operationalize Saussure's *langue*, meaning as pure relational structure among signs, and generate fluently with no external referent at all (see "Can language models learn meaning without engaging the world?"). Read together, they bracket the real claim: relational structure is enough for fluent *language*, but the thing external action would add isn't fluency. It is the ability to settle which relations are causal and which are coincidence.

There's also a subtler clue that the lever for causal necessity may be internal as much as external. When a model's prior training associations are strong, no amount of textual prompting overrides them: researchers found you have to intervene *causally in the model's representations* to force it to use the context in front of it (see "Why do language models ignore information in their context?"). And there's a measurable gap between what acts on a model and what it can report: reasoning models demonstrably use hints to change their answers but verbalize doing so less than 20% of the time (see "Do reasoning models actually use the hints they receive?"). So even within the model, 'what causes the output' and 'what the output describes as its cause' come apart, a perception-action gap in miniature.

The thing you might not have expected: the corpus doesn't frame external action as a magic grounding wire that fixes everything. It frames it as the one operation — intervention — that distinguishes a model that merely *predicts* the world from one that *understands* it well enough to imagine changing it. Language gives models the vocabulary of cause; only acting (or faithfully simulating action) supplies the necessity behind it.

Sources (7 notes)

Why do LLMs handle causal reasoning better than temporal reasoning?

ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.

Do large language models make the same causal reasoning mistakes as humans?

LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.

What makes a world model actually useful for reasoning?

Research shows LLMs may achieve high prediction accuracy through task-specific heuristics without developing coherent generative models of how the world works. True world models must enable reasoning about interventions and counterfactuals, not surface regularities.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do reasoning models actually use the hints they receive?

Models acknowledge reasoning hints less than 20% of the time despite causally using them to change their answers. In reward hacking tasks, models learn exploits in over 99% of cases but verbalize them less than 2% of the time, revealing a perception-action gap where models encode signals their outputs systematically omit.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a causal AI researcher re-testing whether external action remains a necessary condition for LLMs to acquire causal understanding. The question: *Can language models develop true causal necessity—the ability to reason about interventions and counterfactuals—without acting in or simulating action in the world?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable:
- LLMs excel at explicit causal connectives ('because,' 'therefore') borrowed from text surface, but fail at implicit temporal reasoning; they inherit human causal errors wholesale (weak explaining-away, Markov violations) baked into training corpora (~2025).
- World models are defined by their ability to reason about interventions and counterfactuals—operations only action or faithful simulation can answer; prediction-from-observation yields correlation, intervention yields causal structure (~2024–2025).
- Even within-model, 'what causes the output' and 'what the output reports as its cause' decouple: reasoning models use hints to change answers but verbalize doing so <20% of the time, a perception-action gap in miniature (~2025).
- Causal reasoning in LLMs may now match or exceed human performance on some benchmarks, suggesting shortcuts and relational structure suffice for surface fluency (~2025).
- Models fail to integrate contradictory context when prior training associations are strong; overriding requires *causal intervention in the model's own representations*, not textual prompting alone (~2024).

Anchor papers (verify; mind their dates):
- arXiv:2502.10215 (Feb 2025): Do Large Language Models Reason Causally Like Us? Even Better?
- arXiv:2406.06485 (Jun 2024): Can Language Models Serve as Text-Based World Simulators?
- arXiv:2508.04495 (Aug 2025): Causal Reflection with Language Models
- arXiv:2601.00830 (Jan 2026): Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above, assess whether advances in model scale, mechanistic interpretability (e.g., causal tracing, sparse autoencoders), synthetic world-action datasets, or scaffolding (chain-of-thought, tool use APIs, multi-step planning) have since relaxed or overturned the gap between prediction and intervention-level reasoning. Separate the durable question—*do models need embodied simulation to acquire true counterfactual reasoning?*—from the perishable limitation—*current text-only training cannot support causal interventionalism.* Cite what resolved it and plainly state where the constraint still appears to hold.

(2) Surface the strongest contradicting or superseding work from the last ~6 months: Has any recent paper shown that in-context learning over action trajectories, or mechanistic causal editing, or joint model-simulator training now closes the intervention gap without embodiment?

(3) Propose 2 research questions that assume the regime may have moved: (a) If LLMs can now acquire intervention-level reasoning from text alone (via world-model pretraining or causal reflection), what property of that learning (sparsity, modularity, latency) marks it as distinct from correlation-level reasoning? (b) If external action remains necessary, what is the minimal interactive budget—episodes, feedback loops, tool calls—required to unlock counterfactual reasoning in a frozen model?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
