Do models cache intentions about response topics before generating the first token?

This explores whether a model has already 'decided' where its answer is heading — its topic or intent — inside its hidden activations before the first visible token appears, rather than figuring it out word-by-word as it writes.

The corpus doesn't have a paper that literally measures 'cached intent,' but several notes circle the same territory from different angles, and together they suggest the answer is a qualified yes, with an important caveat about what kind of 'decision' it really is.

The strongest evidence that something is computed ahead of output comes from work showing that models do real reasoning in their early layers and only later convert it to surface tokens. Logit-lens analysis finds that models trained with hidden chain-of-thought compute the correct answer in layers 1–3, then actively suppress it in the final layers to emit format-compliant filler; the reasoning is present internally, and recoverable from lower-ranked token predictions, before any meaningful token is produced (Do transformers hide reasoning before producing filler tokens?). In the same spirit, latent-reasoning architectures scale 'thinking' through hidden-state iteration without verbalizing it, which suggests that verbalization is a training artifact layered on top of computation that has already happened (Can models reason without generating visible thinking tokens?). Diffusion LLMs make the timing directly visible: answer confidence converges early while the surrounding reasoning is still being refined, which is close to a literal demonstration of 'destination locked before the work is shown' (Can reasoning and answers be generated separately in language models?).
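To make the logit-lens idea concrete, here is a minimal sketch in plain Python. It assumes you already have one hidden-state vector per layer and the model's unembedding matrix; the vocabulary, matrix, and hidden states below are hypothetical toy values, not data from the cited note. Real logit-lens code usually also applies the final layer norm before unembedding, which this sketch skips.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, unembed, vocab):
    """Decode every layer's hidden state with the final unembedding matrix.

    hidden_states: one vector (length d) per layer
    unembed: one row (length d) per vocabulary entry
    Returns a list of (layer_index, top_token, top_probability).
    """
    readout = []
    for layer, h in enumerate(hidden_states):
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        readout.append((layer, vocab[best], probs[best]))
    return readout

# Hypothetical toy setup: 3-token vocabulary, 2-dimensional hidden state, 4 layers.
VOCAB = ["42", ".", "the"]
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
HIDDEN = [
    [0.5, 0.1],  # layer 0: weak lean toward the answer token
    [2.0, 0.2],  # layer 1: answer direction dominates
    [1.0, 1.5],  # layer 2: filler direction starts to win
    [0.1, 3.0],  # layer 3: filler token is what would be emitted
]

for layer, token, prob in logit_lens(HIDDEN, UNEMBED, VOCAB):
    print(layer, token, round(prob, 2))
```

In this toy data the early layers decode to "42" and the later layers to ".", which mirrors the pattern the note describes: an answer that is readable mid-network and then overwritten by filler at the output.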

But here's the twist that reframes the whole question. A 'cached intention' implies a single committed plan, and another line of work says that's not what's sitting in the hidden state. Shanahan's 20-questions regeneration test shows that models hold a *superposition* of possible characters or answers and sample from that distribution at generation time: regenerate the same response and you get different outputs, each consistent with the prior context, which indicates that no fixed commitment exists (Do large language models actually commit to a single character?). So what's pre-loaded may be less a chosen topic than a probability landscape over topics, collapsed into one path only as tokens are sampled.
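The regeneration test can be imitated with a toy simulation. The sketch below is not Shanahan's setup; it assumes a hypothetical table of candidate objects and treats "the model" as something that samples its final reveal from whichever candidates still fit the answers given so far.

```python
import random

# Hypothetical candidate objects and their yes/no attributes (toy stand-ins).
CANDIDATES = {
    "cat":    {"animal": True,  "bigger_than_breadbox": False},
    "horse":  {"animal": True,  "bigger_than_breadbox": True},
    "whale":  {"animal": True,  "bigger_than_breadbox": True},
    "teapot": {"animal": False, "bigger_than_breadbox": False},
}

def consistent_candidates(answers):
    """Return every candidate that agrees with all answers given so far."""
    return sorted(
        name for name, attrs in CANDIDATES.items()
        if all(attrs[question] == answer for question, answer in answers.items())
    )

def reveal(answers, seed):
    """Sample one object from the consistent set, as one regeneration would."""
    rng = random.Random(seed)
    return rng.choice(consistent_candidates(answers))

answers = {"animal": True, "bigger_than_breadbox": True}
reveals = {reveal(answers, seed) for seed in range(50)}
print(reveals)
```

Different seeds can name different objects, but every reveal fits the answers already given. That is the signature the test looks for: consistency with the context without a single object having been fixed in advance.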

This fits the deeper picture of what a transformer's hidden state even is. The residual stream transmits knowledge as continuous *flow*, not retrievable *storage*: knowledge exists in the performance, not in an archive you could call up and inspect (Do transformer models store knowledge or generate it continuously?). Generation itself is a smooth probabilistic drift toward the training distribution rather than an exploration of alternatives (Does LLM generation explore competing claims while producing text?). That framing makes 'caching' the wrong metaphor: there isn't a stored intent so much as a directional momentum that the first tokens reveal rather than create.

Worth knowing if you want to go further: the pivot points where that momentum actually gets set appear to be sparse. A small minority (roughly 20%) of high-entropy 'forking' tokens carry most of the steering signal (Do high-entropy tokens drive reasoning model improvements?), and specific reflection tokens like 'Wait' and 'Therefore' spike in mutual information with the correct answer (Do reflection tokens carry more information about correct answers?). So the model's 'intent' may be less a thing fixed before token one and more a thing repeatedly re-committed at a handful of decisive moments along the way.
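One way to picture 'forking' tokens is to score each position by the entropy of its next-token distribution and keep the top slice. The sketch below assumes you have those distributions as plain lists of probabilities; the example values are hypothetical, and the 0.2 default simply mirrors the roughly 20% figure from the notes rather than any threshold the paper prescribes.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def forking_positions(distributions, fraction=0.2):
    """Return the indices of the highest-entropy positions (top `fraction`)."""
    scores = [entropy(d) for d in distributions]
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical next-token distributions over a 3-token vocabulary at 5 positions.
DISTS = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],  # near-uniform: the model could go several ways here
    [0.90, 0.05, 0.05],
    [0.97, 0.02, 0.01],
    [0.80, 0.10, 0.10],
]

print(forking_positions(DISTS))
```

With five positions and a 20% cutoff, only the near-uniform position is selected. Most positions are close to deterministic, and the few uncertain ones are where the direction of the output actually gets decided.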

Sources (8 notes)

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can reasoning and answers be generated separately in language models?

ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.

Do reflection tokens carry more information about correct answers?

Specific tokens like "Wait" and "Therefore" show sharp spikes in mutual information with correct answers. Suppressing them harms reasoning while suppressing equal random tokens does not, and representation recycling improves accuracy 20%.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether transformer LLMs pre-compute response intentions before generating the first token. This question remains open despite recent work.

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026; treat as perishable constraints:

• Hidden chain-of-thought models compute correct answers in layers 1–3 before emitting format-compliant filler; reasoning is fully present internally before meaningful tokens are produced (2024-12).
• Latent-reasoning architectures scale thinking entirely through hidden-state iteration without verbalizing, suggesting verbalization is a training artifact layered on prior computation (2025-02).
• Diffusion LLMs show answer confidence converges early while surrounding reasoning is still refined, a direct signal of 'destination locked before work is shown' (2025-08).
• But the 20-questions regeneration test falsifies any committed-plan view: regenerating the same prompt yields different, each-internally-consistent outputs, proving no fixed commitment exists; instead, a superposition of possible answers collapses at sample time (2025-06).
• High-entropy minority tokens carry most steering signal; specific reflection tokens ('Wait', 'Therefore') spike in mutual information with correct answers, suggesting intent is repeatedly re-committed at sparse decisive moments, not fixed before token one (2025-06, 2025-06).

Anchor papers (verify; mind their dates):
- arXiv:2412.04537 (Understanding Hidden Computations in Chain-of-Thought Reasoning, 2024-12)
- arXiv:2502.05171 (Scaling up Test-Time Compute with Latent Reasoning, 2025-02)
- arXiv:2506.01939 (Beyond the 80/20 Rule: High-Entropy Minority Tokens, 2025-06)
- arXiv:2508.10736 (Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs, 2025-08)

Your task:
(1) RE-TEST THE SUPERPOSITION vs. COMMITMENT TENSION. The library's strongest finding is that hidden state holds a probability landscape *over* topics, not a locked intent. Has newer work on model internals, decoding strategies (beam search, nucleus sampling refinements), or controlled generation (prefix tuning, LoRA steering) since shown ways to either collapse superposition *earlier* (sharpening commitment pre-token-1) or prove superposition persists even after token 1? Cite what resolves or confirms this constraint.
(2) Surface the strongest recent work (last 6 months) that contradicts the 'superposition' finding — any paper arguing for a single committed hidden-state intent, or any empirical result showing hard commitment before generation starts.
(3) Propose 2 research questions that assume the regime may have moved: (a) If high-entropy minority tokens truly carry steering, can we engineer synthetic 'intent tokens' to pre-sharpen the superposition before generation, and does this improve consistency without hallucination cost? (b) Do scaling laws or architectural changes (e.g., Mamba, State-Space Models) alter when intent crystallizes, and does the sparse-token hypothesis still hold?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
