Can language models learn internal world models without explicit environment specifications?
This explores whether LLMs can build genuine internal models of how the world works purely from text, without ever being handed the rules of an environment or touching it directly.
The corpus says yes, but with an asterisk worth understanding. The cleanest answer is that LLMs form what one line of work calls *indirect causal grounding*: they extract structured regularities from text that was itself produced by causally grounded humans, so the world model arrives secondhand, mediated through language rather than built from direct contact (Can large language models develop genuine world models without direct environmental contact?). The grounding is real, but the chain has gaps. The model can't verify or update its picture against the world in real time, and that is exactly where text-only world models get brittle.
The deeper question is whether what's learned is a *world model* at all, or just a very good predictor wearing one as a costume. Here the corpus draws a sharp line: high prediction accuracy can come from task-specific heuristics that never cohere into a generative model of how things work. A true world model has to support reasoning about interventions and counterfactuals (what *would* happen if you changed something), not merely forecast the next observation (What makes a world model actually useful for reasoning?). So 'learning a world model without specifications' is possible, but passing the prediction test doesn't prove you've done it.
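To make the prediction-versus-intervention distinction concrete, here is a minimal sketch built on a hypothetical toy structural causal model (not an example from the cited note): a hidden "hot day" variable drives both ice-cream sales and sunburns. The surface rule "sunburn iff high sales" is perfectly accurate on observational data, yet it answers the interventional question do(X=1) wrongly, because it never modeled where the correlation comes from. All variable names and probabilities are illustrative assumptions.

```python
# Hypothetical toy structural causal model (SCM): a hidden "hot day" variable Z
# drives both ice-cream sales X and sunburns Y. X has no causal effect on Y.
P_Z = {0: 0.5, 1: 0.5}


def mechanism_x(z):
    return z  # hot day -> high ice-cream sales


def mechanism_y(z):
    return z  # hot day -> sunburn; Y never reads X


def joint(do_x=None):
    """Exact joint distribution over (x, y), optionally under do(X = do_x)."""
    dist = {}
    for z, p in P_Z.items():
        # An intervention overrides X's mechanism, cutting the Z -> X edge.
        x = mechanism_x(z) if do_x is None else do_x
        y = mechanism_y(z)
        dist[(x, y)] = dist.get((x, y), 0.0) + p
    return dist


def prob_y1_given_x(dist, x):
    """P(Y = 1 | X = x) computed from a joint distribution."""
    num = sum(p for (xx, y), p in dist.items() if xx == x and y == 1)
    den = sum(p for (xx, _), p in dist.items() if xx == x)
    return num / den


def heuristic(x):
    return x  # surface rule learned from observations: "Y equals X"


observational = joint()
accuracy = sum(p for (x, y), p in observational.items() if heuristic(x) == y)
p_obs = prob_y1_given_x(observational, 1)  # P(Y=1 | X=1)
p_do = prob_y1_given_x(joint(do_x=1), 1)   # P(Y=1 | do(X=1))

print(f"heuristic accuracy on observations: {accuracy:.2f}")
print(f"P(Y=1 | X=1) = {p_obs:.2f}, P(Y=1 | do(X=1)) = {p_do:.2f}")
```

The heuristic is 100% accurate on observations and implies sunburn is certain once sales are forced high, while the SCM says the probability stays at 0.5: a perfect predictor that is still not a world model.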
Why does this work at all without an environment spec? One striking line of work argues that LLMs operationalize Saussure's *langue*: meaning emerges from the relational structure among words, compressed from text, with no external referents required (Can language models learn meaning without engaging the world?). That's the mechanism behind text-only world models: structure in, structure out. But relational compression is double-edged. The process that captures how the world works also internalizes how the text skews: low-resource cultures get represented through high-resource proxies as a structural feature of the internal states, not just a surface slip (Do LLMs represent low-resource cultures through dominant cultural proxies?). A world model learned from text inherits the text's distortions in its internal representations, not just in its outputs.
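The "structure in, structure out" idea can be illustrated with a much cruder relative of what an LLM learns: a count-based distributional model. This is a swapped-in technique, and the six-sentence corpus, the sentence-wide context window, and the stopword list are all hypothetical. Each word is represented only by the words it co-occurs with, with no referent anywhere, yet "cat" ends up closer to "dog" than to "car". The same property is the double edge: the vectors are nothing but corpus statistics, so any skew in the corpus is baked into them.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; no word is ever linked to an external referent.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "the car needs fuel",
    "the truck needs fuel",
]
STOPWORDS = {"the"}  # assumption: drop a function word that co-occurs with everything


def cooccurrence_vectors(sentences):
    """Represent each word purely by counts of the other words in its sentences."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(CORPUS)
for a, b in [("cat", "dog"), ("cat", "car"), ("car", "truck")]:
    print(f"sim({a}, {b}) = {cosine(vectors[a], vectors[b]):.2f}")
```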
There's also evidence that the internal model is real enough to be probed and steered. Sparse-autoencoder analyses found that models carry causal machinery for tracking whether they actually know facts about an entity, a kind of internal self-knowledge that drives both hallucination and refusal (Do models know what they don't know?). And limited genuine introspection appears when a causal chain links an internal state to an accurate report (Can language models actually introspect about their own states?). Together these suggest the internal representations aren't mere byproducts of producing text; they're structured enough for the model to detect and act on. Yet the same models routinely fail to integrate what's in front of them when training priors are strong, overriding in-context information with parametric knowledge (Why do language models ignore information in their context?): a world model confident in its own grooves.
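For readers unfamiliar with the probing method, here is a minimal sketch of a standard sparse autoencoder: a ReLU encoder maps an activation vector to non-negative feature activations, a linear decoder reconstructs it, and an L1 penalty keeps the features sparse. Steering is shown as adding a scaled decoder direction back into the activation. All weights and sizes are hypothetical, and treating feature 1 as a "known entity" feature is an illustrative assumption, not the cited study's setup.

```python
# Minimal sparse-autoencoder (SAE) sketch in pure Python.
# Hypothetical sizes: d_model = 3 activation dims, 2 dictionary features.


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode(x, w_enc, b_enc):
    """f = ReLU(W_enc x + b_enc): sparse, non-negative feature activations."""
    return [max(0.0, h + b) for h, b in zip(matvec(w_enc, x), b_enc)]


def decode(f, w_dec, b_dec):
    """x_hat = W_dec f + b_dec: reconstruct the activation from features."""
    return [h + b for h, b in zip(matvec(w_dec, f), b_dec)]


def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff):
    """Squared reconstruction error plus an L1 penalty on feature activations."""
    f = encode(x, w_enc, b_enc)
    x_hat = decode(f, w_dec, b_dec)
    recon_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon_err + l1_coeff * sum(abs(v) for v in f)


def steer(x, w_dec, feature_idx, alpha):
    """Add alpha times one feature's decoder column (its direction) to x."""
    direction = [row[feature_idx] for row in w_dec]
    return [a + alpha * d for a, d in zip(x, direction)]


# Hand-picked hypothetical weights (a trained SAE would learn these).
W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # 2 features x 3 dims
B_ENC = [0.0, 0.0]
W_DEC = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 3 dims x 2 features
B_DEC = [0.0, 0.0, 0.0]
KNOWN_ENTITY = 1  # hypothetical index of a "known entity" feature

x = [0.2, -0.5, 0.1]
print("features before steering:", encode(x, W_ENC, B_ENC))
print("loss:", sae_loss(x, W_ENC, B_ENC, W_DEC, B_DEC, l1_coeff=0.1))
x_steered = steer(x, W_DEC, KNOWN_ENTITY, alpha=2.0)
print("features after steering:", encode(x_steered, W_ENC, B_ENC))
```

The hypothetical "known entity" feature is silent on the original activation and fires after steering, which is the shape of the causal test: intervene on the direction, then watch downstream behavior change.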
The thing you didn't know you wanted to know: the ceiling here may be formal, not just an engineering limit. Self-improvement in LLMs is bounded by a generation-verification gap: every reliable fix requires something external to validate it, and no amount of metacognition escapes that (What stops large language models from improving themselves?). So a model can learn a world model from text without explicit specifications, but it can't fully *correct* that world model from the inside. The absence of an environment spec is what makes text-only world models possible, and also what caps how far they can self-repair.
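A toy analogy for why verification has to come from outside, under loudly hypothetical assumptions: the "model" is a multiplication routine with a systematic bug, it proposes several candidates (one of which is correct), and we compare accepting the first candidate that passes self-verification (re-running the same flawed procedure) against the first that passes an external check (exact arithmetic). This illustrates the idea only; it is not the formal argument in the cited note, and every function name is invented for the example.

```python
def flawed_multiply(a, b):
    """Hypothetical generator with a systematic bug: always 10 too low."""
    return a * b - 10


def propose(a, b, n):
    """Propose n candidates: the generator's favourite answer plus nearby variants."""
    guess = flawed_multiply(a, b)
    return [guess + 10 * k for k in range(n)]


def self_verify(a, b, candidate):
    # The model "checks itself" by re-running the same procedure,
    # so the bug agrees with itself.
    return candidate == flawed_multiply(a, b)


def external_verify(a, b, candidate):
    # Something outside the generator (here, exact arithmetic) validates the answer.
    return candidate == a * b


def first_accepted(a, b, verifier, n=5):
    """Return the first proposed candidate the verifier accepts, or None."""
    for candidate in propose(a, b, n):
        if verifier(a, b, candidate):
            return candidate
    return None


print("self-verified answer:", first_accepted(17, 24, self_verify))
print("externally verified answer:", first_accepted(17, 24, external_verify))
```

The correct answer sits in the candidate pool either way; only the external verifier picks it out, which is the sense in which generation outruns self-verification in this toy.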
Sources (8 notes)
LLMs form structured world representations by extracting regularities from training data produced by causally grounded humans. This constitutes indirect causal grounding mediated through text, though the chain has gaps that limit real-time verification and model updating.
Research shows LLMs may achieve high prediction accuracy through task-specific heuristics without developing coherent generative models of how the world works. True world models must enable reasoning about interventions and counterfactuals, not just capture surface regularities.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
Mechanistic interpretability analysis reveals that low-resource cultures, such as those of Ethiopia and Algeria, are structurally represented through high-resource cultural proxies in internal model states, not just in outputs. This architectural bias persists even when models produce correct surface-level answers.
Sparse autoencoders revealed that language models develop causal mechanisms for detecting whether they know facts about entities. These mechanisms actively steer both hallucination and refusal behavior, and persist from base models into finetuned chat versions.
LLM self-reports usually reflect human training distributions rather than actual internal processes. However, when a causal chain connects an internal state to accurate reporting—like inferring low temperature from output consistency—genuine lightweight introspection occurs without requiring consciousness.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.