Why do structured and creative domains exhibit opposite entropy dynamics?
This explores why pushing a model toward lower uncertainty helps in domains with one right answer but harms domains where many answers are valid — the same entropy reduction that sharpens reasoning is what kills creative diversity.
This question is really about a single mechanism — how training reduces a model's output uncertainty — playing out as a virtue in one setting and a defect in another. The corpus suggests the split isn't about creativity versus logic per se; it's about whether the domain has a verifiable target to converge on. Where there is one, low entropy is the goal. Where there isn't, low entropy is collapse.
In structured, checkable domains, entropy reduction is the engine of improvement, up to a point. Reasoning RL works by sharpening the model around a small set of pivotal decisions: only about 20% of tokens carry high entropy, acting as 'forking points' where the reasoning could branch, and training on those tokens alone matches or exceeds full-gradient updates (see: Do high-entropy tokens drive reasoning model improvements?). The model learns to commit harder at the right forks and converge toward the verified answer everywhere else. But the corpus also shows this has a hard ceiling: as policy entropy approaches zero, performance saturates along a predictable law, R = -a·exp(H) + b, because the model has stopped exploring. Interventions like Clip-Cov and KL-Cov exist precisely to slow that collapse (see: Does policy entropy collapse limit reasoning performance in RL?). So even in structured domains, you want entropy falling but not gone.
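To make the 'forking points' idea concrete, here is a minimal sketch of how high-entropy positions might be picked out of a generated sequence. It assumes you already have a next-token probability distribution for each position. The function names, the toy distributions, and the top-fraction selection rule are illustrative assumptions, not the method from the cited note.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def forking_token_mask(step_distributions, top_fraction=0.2):
    """Flag the highest-entropy positions as candidate forking tokens.

    top_fraction=0.2 mirrors the 'about 20%' figure in the text; the
    exact selection rule here is an assumption for illustration.
    """
    entropies = [token_entropy(p) for p in step_distributions]
    k = max(1, round(top_fraction * len(entropies)))
    # Indices of the k positions with the largest entropy.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(entropies))], entropies

# Toy sequence: 5 positions over a 3-token vocabulary (made-up numbers).
toy = [
    [0.98, 0.01, 0.01],    # confident continuation
    [0.34, 0.33, 0.33],    # near-uniform: the reasoning could branch here
    [0.90, 0.05, 0.05],
    [0.99, 0.005, 0.005],
    [0.97, 0.02, 0.01],
]
mask, entropies = forking_token_mask(toy)
for i, (is_fork, h) in enumerate(zip(mask, entropies)):
    print(f"position {i}: entropy={h:.3f} nats, forking={is_fork}")
```

In a training loop, a mask like this could multiply the per-token loss so that only the flagged positions receive gradient. That wiring depends on the RL framework in use and is left out here.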
Creative domains exhibit the opposite dynamic because there is no scalar to converge on: the value *is* the spread. When the same convergence pressure is applied, you get 'diversity collapse' in ideation, where the model funnels toward a few safe outputs. One note suggests this may happen because existing reasoning methods only handle conventional, single-answer problem-solving and never address the combinational, exploratory, and transformational modes that creativity actually requires (see: Can LLMs reason creatively beyond conventional problem-solving?). Entropy here isn't noise to be eliminated; it's the search space, and collapsing it forecloses the recombination that creative work depends on (see: Can identical outputs hide broken internal representations?).
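A small numerical sketch can show what 'collapsing the search space' means in entropy terms. It uses temperature-style sharpening as a stand-in for convergence pressure. This is an assumption chosen for simplicity, not the training mechanism the notes describe, and the five-idea distribution is made up. The quantity exp(H) is read as the effective number of options the model is still spreading mass over.

```python
import math

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def sharpen(probs, temperature):
    """Rescale p_i -> p_i^(1/T) / Z; T < 1 concentrates mass on the mode."""
    powered = [p ** (1.0 / temperature) for p in probs]
    z = sum(powered)
    return [x / z for x in powered]

# Hypothetical distribution over five equally valid ideas,
# with only a mild preference for the first one.
ideas = [0.30, 0.20, 0.20, 0.15, 0.15]

for t in (1.0, 0.5, 0.25):
    q = sharpen(ideas, t)
    h = entropy(q)
    # exp(H) is the perplexity of q: the effective number of live options.
    print(f"T={t}: entropy={h:.3f} nats, effective options={math.exp(h):.2f}")
```

In a domain with one verified answer, shrinking the effective number of options toward one is the desired outcome. In this toy setting all five ideas were acceptable, so the same shrinkage only discards valid outputs.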
The deeper reason the two domains diverge is environmental, not architectural. The same property that makes a domain optimizable (an immediate, well-defined metric to score against) is what makes entropy reduction *safe* there (see: What makes a research domain suitable for autonomous optimization?). Creative domains lack that scalar, so any optimizer's drive to lower uncertainty has nothing legitimate to converge on and instead converges on whatever the training signal happens to reward. There's even a hint the model does this to itself: post-trained models produce 3–4× lower entropy on their own generations, tracking an internal 'this looks familiar' signal that quietly tightens the output distribution without being told to (see: Why do models produce less uncertain outputs on their own text?).
The thing worth carrying away: 'high entropy' and 'low entropy' aren't good or bad in themselves. The same dynamic that researchers fight to *preserve* in creative generation, they fight to *spend down* in verifiable reasoning — and the dividing line is simply whether the domain can tell you when you're right.
Sources (6 notes)
Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.
Empirical law R = -a·exp(H) + b shows performance saturates when policy entropy approaches zero. Interventions like Clip-Cov, KL-Cov, and GPPO preserve exploratory capacity by managing entropy reduction during training.
Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.
Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.
Autonomous research pipelines require immediate scalar metrics, modular architecture, fast iteration cycles, and version control. Domains lacking any property resist autoresearch regardless of LLM capability, because the bottleneck is environmental structure, not model power.
Post-trained models produce 3-4x lower output entropy on their own generations, driven by an internal representation of input surprise that causally modulates confidence. This implicit self-recognition signal appears without being verbalized, encoded directly in the output distribution.
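The empirical law quoted in the sources, R = -a·exp(H) + b, can be worked through numerically to see where the ceiling comes from. The sketch below assumes a > 0, so that performance rises as entropy falls. The coefficient values are hypothetical placeholders, not fitted numbers from any note. At H = 0 the exponential equals 1, so the predicted performance can never exceed b - a.

```python
import math

def predicted_performance(policy_entropy, a, b):
    """R = -a * exp(H) + b, the functional form quoted in the sources."""
    return -a * math.exp(policy_entropy) + b

# Hypothetical coefficients, chosen only to make the shape visible.
a, b = 0.1, 0.6

# With a > 0, the ceiling is reached when H = 0, where exp(0) = 1.
ceiling = predicted_performance(0.0, a, b)

for h in (1.0, 0.5, 0.1, 0.0):
    r = predicted_performance(h, a, b)
    print(f"H={h:.1f}: R={r:.3f}, remaining headroom={ceiling - r:.3f}")
```

Each further drop in entropy buys less performance as H nears zero, which is one way to read why interventions that manage entropy reduction aim to keep some exploratory capacity in reserve.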