Can Kolmogorov complexity alone capture what makes intelligence general?
This explores whether a single information-theoretic yardstick like Kolmogorov complexity — shortest-program length — can explain general intelligence, or whether generality is about something that measure structurally can't see.
This explores whether Kolmogorov complexity, the classic "shortest program that produces the data" measure, is enough to capture what makes intelligence general — and the corpus's answer is a fairly pointed no, for a reason that's more interesting than the usual complaints. The sharpest critique is that Kolmogorov (and Shannon) measures assume an observer with unlimited compute: they ask how compressible something is in principle, by an idealized machine with infinite time. But real learners — humans, trained models — are *bounded*. That gap isn't a footnote; it's the whole story. Classical complexity literally cannot value data the way a bounded learner does, which is why feature engineering helps, why curriculum order matters, and why a trained model can end up knowing more than the process that generated its data Why do Shannon and Kolmogorov measures fail to value data?. Generality, on this view, lives precisely in the territory Kolmogorov complexity assumes away.
The corpus even offers a candidate replacement that makes the point concrete. Epiplexity tries to measure the *structural* information a computationally bounded observer can actually extract — separating learnable regularity from raw entropy you'd need infinite time to crack. And notably, this resource-aware measure correlates with out-of-distribution generalization and predicts which datasets enable broad transfer What can a bounded observer actually learn from data?. That's the crux: generality is about transfer to the unfamiliar, and the measure that tracks it is the one built around limited resources, not the one built around an omniscient compressor.
There's a second angle the corpus surfaces that a complexity-only view would miss entirely: two systems can be equally "complex" by any output-based measure and yet be radically different inside. Models trained by SGD can carry all the linearly decodable features a task needs while their internal organization is fractured and entangled — identical performance, incoherent structure underneath, invisible to standard metrics Can models be smart without organized internal structure? Can AI pass every test while understanding nothing?. A description-length number scores the output, not whether the representation is the kind of clean, reusable structure that generalizes. So a system can be cheap to describe and still brittle, or pass every test and understand nothing.
This connects to why reasoning models actually break. They don't fail at a complexity threshold — long problems aren't inherently the problem. They fail at *novelty*: when an instance is unlike anything in training, because the model fitted instance-level patterns rather than a generalizable algorithm Do language models fail at reasoning due to complexity or novelty?. If generality were a function of complexity, harder-to-describe problems would be the failure points. Instead it's unfamiliarity — exactly what a bounded, transfer-oriented account predicts and a complexity-only account can't.
The through-line worth taking away: a single scalar of compressibility flattens distinctions that matter for generality — what a *bounded* learner can extract, whether internal structure is coherent or fractured, and whether the system transfers to the novel rather than memorizing the familiar. Even the field's favorite trade-offs sometimes turn out to be artifacts of how you measure rather than real constraints Is the exploration-exploitation trade-off actually fundamental? — a useful reminder that picking the wrong yardstick doesn't just miss things, it invents phenomena. Kolmogorov complexity is a beautiful idealization; generality seems to live in everything that idealization throws away.
Sources 6 notes
Both measures assume observers with unlimited compute and miss learnable, useful information. The gap explains why feature engineering helps, curriculum order matters, and trained models exceed their generating process—empirical facts classical theory cannot account for.
Epiplexity formalizes the structural information a computationally bounded observer can extract from data, separating learnable regularity from time-bounded entropy. This task-free measure correlates with out-of-distribution generalization and explains why some datasets enable broader transfer than others.
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
The Fractured Entangled Representation hypothesis shows that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.
LRMs don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.
Hidden-state analysis using Effective Rank metrics shows near-zero correlation between exploration and exploitation, revealing the trade-off emerges only at token level. VERL demonstrates simultaneous enhancement achieving 21.4% accuracy gains on Gaokao 2024.