Can deterministic computation actually create new information in data?

This note explores whether a fixed, repeatable computation can generate genuinely new information, or whether it only rearranges what was already present in the inputs, the model, or the observer.

The question is read here as a deep one: not 'can a program produce new output' (obviously yes) but 'can deterministic computation create information that wasn't already latent in its inputs?' The corpus leans toward no, but in a way that's more interesting than a flat denial, because it relocates where 'new information' actually comes from.

The sharpest anchor is the argument that computation presupposes an experiencing mapmaker (see "Can computation arise without a conscious mapmaker?"). Its claim is that no amount of added algorithmic complexity can manufacture meaning; a conscious agent who carves continuous physics into discrete symbols has to logically precede the computation. On that view, deterministic computation never originates information; it only transforms symbols whose significance was assigned beforehand. A complementary, more formal version of this shows up in epiplexity (see "What can a bounded observer actually learn from data?"), which defines 'information' as the structure a computationally bounded observer can actually extract from data. Information here isn't a property of the bits alone; it is relative to who's reading them. That reframes the question: deterministic computation can surface structure an observer couldn't otherwise reach, even if it adds no entropy.
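The observer-relative point can be made concrete with a loose toy, which is not the epiplexity measure itself. In this sketch, a hypothetical generator `expand_seed` deterministically stretches a short seed into a long byte string, and two observers are compared: a generic compressor (`zlib`) that knows nothing about the generator, and an observer who knows the generator and only needs the seed and a block count. The generator, the seed, and the sizes are all assumptions chosen for illustration.

```python
import hashlib
import zlib


def expand_seed(seed: bytes, n_blocks: int) -> bytes:
    """Deterministically stretch a short seed into a long byte string
    by hashing seed || counter with SHA-256 (hypothetical toy generator)."""
    blocks = []
    for counter in range(n_blocks):
        blocks.append(hashlib.sha256(seed + counter.to_bytes(4, "big")).digest())
    return b"".join(blocks)


seed = b"short seed"
data = expand_seed(seed, 1024)  # 32 KiB derived from a 10-byte seed

# Observer A: a generic compressor with no knowledge of the generator.
zlib_size = len(zlib.compress(data, 9))

# Observer B: knows the generator, so the seed plus a 4-byte block count
# is enough to reproduce the data exactly.
description_size = len(seed) + 4

print(len(data), zlib_size, description_size)
```

To the compressor the output looks essentially structureless, while the second observer can describe it in a handful of bytes. The bits are identical in both cases; what differs is the structure each observer can extract.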

The determinism-specific evidence is blunt. Zero-temperature, fixed-seed LLM runs (see "Does setting temperature to zero actually make LLM outputs reliable?") reproduce the same output every time, but that output is still just one draw from the model's probability distribution. Determinism buys you repeatability, not new content; consistency is not the same as having learned or generated anything additional. And when researchers add randomness to recursive reasoners, naive stochasticity yields no improvement either (see "Does adding randomness to recursive models actually help reasoning?"). The gains from stochastic latent reasoning (see "Can stochastic latent reasoning help models explore multiple solutions?") come not from injecting noise but from amortized variational inference, which learns where to branch. So neither pure determinism nor pure randomness is the source; structure is.
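The gap between repeatability and content can be sketched with a toy sampler. The token list and probabilities below are hypothetical stand-ins for a single decoding step, not taken from any real model, and Python's `random` module stands in for an LLM sampler.

```python
import random
from collections import Counter

# Hypothetical next-token distribution for a single decoding step.
tokens = ["yes", "no", "maybe"]
probs = [0.5, 0.3, 0.2]


def sample_with_seed(seed: int) -> str:
    rng = random.Random(seed)  # fresh generator so every call is repeatable
    return rng.choices(tokens, weights=probs, k=1)[0]


def greedy() -> str:
    # Zero-temperature limit: always take the highest-probability token.
    return max(zip(probs, tokens))[1]


# Same seed, same answer, every time.
repeated = {sample_with_seed(42) for _ in range(100)}

# Different seeds expose the spread that the fixed seed was hiding.
spread = Counter(sample_with_seed(s) for s in range(1000))

print(repeated, greedy(), spread)
```

Fixing the seed collapses one hundred runs to a single answer, yet the underlying distribution is unchanged: varying the seed shows the other outcomes were available all along. Repeating a run reveals nothing about that spread.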

The most provocative counterpoint comes from systems that look like they create information from nothing. Asymmetric self-play (see "Can language models improve themselves without any external training data?") lets a model improve with no external data at all, with a proposer and a solver bootstrapping a curriculum between them. Seedless and instance-seed synthetic data methods (see "Can we generate synthetic data without any seed examples?" and "Can synthetic data replace seed examples in task generation?") generate training data without full seed examples. These feel like information creation, but read against the corpus, they're better understood as redistribution: the model's existing latent knowledge is recombined, decomposed taxonomically, or pressure-tested into explicit form. No new entropy enters; what changes is which structure becomes extractable, exactly the epiplexity move. The same theme runs through prompting being Turing-complete (see "Can a single transformer become universally programmable through prompts?"): a fixed transformer can compute any computable function, but the function's information was always specifiable. The prompt selects it; it doesn't invent it.
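The "no new entropy enters" claim rests on a standard information-theoretic fact: for any deterministic function f, H(f(X)) ≤ H(X), with equality when f is one-to-one on the support of X. A minimal numerical check, assuming a toy uniform source and two illustrative maps (all chosen here for the example, not drawn from the cited work):

```python
import math
from collections import Counter


def shannon_entropy(dist: dict) -> float:
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def push_forward(dist: dict, f) -> dict:
    """Distribution of f(X) when X follows dist and f is deterministic."""
    out = Counter()
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)


# X is uniform over 0..7, so H(X) = 3 bits.
x_dist = {x: 1 / 8 for x in range(8)}

h_x = shannon_entropy(x_dist)
# Many-to-one map: outcomes merge, so entropy drops.
h_merge = shannon_entropy(push_forward(x_dist, lambda x: x % 2))
# Bijection on 0..7 (3 is coprime to 8): entropy is preserved, never raised.
h_perm = shannon_entropy(push_forward(x_dist, lambda x: (3 * x + 1) % 8))

print(h_x, h_merge, h_perm)
```

A deterministic map can merge outcomes or relabel them, but it cannot split one outcome into several, which is why the output entropy never exceeds the input entropy. Whatever self-play or seedless generation gains must therefore come from elsewhere, such as making existing structure easier to extract.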

So the synthesis: deterministic computation doesn't create information in the strong sense. It doesn't add entropy or originate meaning. What it does, and what most of these systems exploit, is make latent structure newly legible to a bounded observer. The interesting frontier isn't 'creation from nothing' but reuse and reconstruction; see memory-amortized inference ("Can cognition work by reusing memory instead of recomputing?"), which casts intelligence itself as navigating and reusing prior inference rather than recomputing. The less obvious takeaway: the deepest accounts here don't locate novelty in the computation at all, but in the observer who can finally read what was always there.

Sources 10 notes

Can computation arise without a conscious mapmaker?

Computational systems depend on a conscious mapmaker who alphabetizes continuous physics into discrete symbols. No increase in algorithmic complexity can generate this agent; it must logically precede the computation it makes possible.

What can a bounded observer actually learn from data?

Epiplexity formalizes the structural information a computationally bounded observer can extract from data, separating learnable regularity from time-bounded entropy. This task-free measure correlates with out-of-distribution generalization and explains why some datasets enable broader transfer than others.

Does setting temperature to zero actually make LLM outputs reliable?

Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.

Does adding randomness to recursive models actually help reasoning?

GRAM's ablations show naive stochasticity added to existing recursive models yields no improvement. Gains come specifically from amortized variational inference, which couples sampling to a principled generative objective and learns where to branch rather than injecting undirected noise.

Can stochastic latent reasoning help models explore multiple solutions?

GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.

Can language models improve themselves without any external training data?

SQLM uses a proposer-solver framework where the proposer generates calibrated problems and the solver learns via majority-vote verification. Both agents improve through RL alone, creating an automatic curriculum that scales without human labels or ground-truth answers.

Can we generate synthetic data without any seed examples?

Simula separates global coverage from local diversity, using taxonomy construction for coverage and agentic refinement for complexity. This architecture makes all three desiderata—quality, diversity, complexity—controllable simultaneously without requiring seed data.

Can synthetic data replace seed examples in task generation?

TarGEN generates synthetic data using atomic task elements (instance seeds) instead of full input-output examples, achieving 1-3 point improvements on SuperGLUE tasks. The approach works by constraining label generation after seeding inputs, enabling data creation for domains with no prior examples.

Can a single transformer become universally programmable through prompts?

Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.

Can cognition work by reusing memory instead of recomputing?

Memory-Amortized Inference proposes intelligence arises from structured reuse of prior inference paths over topological memory, inverting RL's reward-forward logic into cause-backward reconstruction. This duality explains energy efficiency and suggests memory trajectories form the substrate of adaptive thought.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an analyst re-testing claims about whether deterministic computation can create new information in data. The question remains open: does computation originate information, or only redistribute latent structure?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026. A curated library synthesized:
• Deterministic systems (fixed-seed LLM runs, zero-temperature sampling) reproduce identical outputs but add no entropy; repeatability ≠ new content (~2024).
• Information is observer-relative, not intrinsic to bits alone; computation surfaces structure a bounded agent couldn't otherwise extract, without creating entropy (epiplexity framework, ~2026).
• 'Creation from nothing' systems (asymmetric self-play, seedless synthetic data) actually redistribute existing latent knowledge; no new entropy enters (~2025–2026).
• Prompting is Turing-complete: a fixed transformer can compute any computable function, but the function's information was always specifiable — the prompt selects, not invents (~2024).
• Intelligence itself may be navigation and reuse of prior inference rather than novel computation (memory-amortized inference, ~2025).

Anchor papers (verify; mind their dates):
• arXiv:2411.01992 (2024-11): Turing completeness of prompting
• arXiv:2601.03220 (2026-01): Epiplexity and computationally bounded information
• arXiv:2508.14143 (2025-08): Memory-amortized inference as foundation for cognition
• arXiv:2605.19376 (2026-05): Generative recursive reasoning

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 'no entropy creation' claim, 'observer-relativity' of information, and 'selection vs. invention' in prompting: does post-2026 work on emergent reasoning, in-context learning, or test-time scaling undermine or sharpen these limits? Cite what resolves or re-anchors each constraint.
(2) Surface the strongest contradicting work: does recent research on implicit model learning, latent inference amortization, or cross-domain transfer claim to originate information rather than redistribute it? Name the paper and its mechanism.
(3) Propose 2 research questions that assume the regime may have shifted—e.g., 'Can test-time compute (vs. parameters) genuinely introduce new structure?' or 'Is observer-relativity itself computable?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
