Can goal information injected at inference time replace goal-conditioned training?

This explores whether handing a model its goal in the prompt at run time (context, instructions, future hints) can substitute for baking goal-direction into the weights through training — and the corpus suggests the answer is 'partly, and only under specific conditions.'

The corpus draws a fairly sharp line between activating something the model already has and installing something it doesn't. The recurring lesson is that injected information is a switch, not a teacher: it can turn on latent capacity, but it can't manufacture capacity that isn't there. Prompt optimization, for instance, works entirely inside the model's existing training distribution and hits a hard ceiling when foundational knowledge is missing; it reorganizes what exists rather than adding anything new (see: Can prompt optimization teach models knowledge they lack?). So if 'goal information at inference time' means surfacing a goal the model already knows how to pursue, injection can work; if it means teaching a new way to plan toward goals, it likely won't.

There's also a quieter failure mode that undercuts naive injection: models frequently ignore the goal you give them. When parametric priors from training are strong, in-context instructions get overridden, and textual prompting alone can't force the model to honor the new information; only intervening in the model's internal representations does (see: Why do language models ignore information in their context?). That's a direct strike against the 'just inject it' hypothesis: the channel you're injecting through is the same one that loses to training-time associations. It's a reason goal-conditioned training tends to be more reliable: the goal is encoded where it can't be drowned out.

The most interesting comparison is TRELAWNEY, which sits exactly on the seam between the two approaches. Instead of changing architecture or adding inference-time machinery, it bakes goal/future information into the training *data* via lookahead tokens, so the model learns goal-conditioned generation through standard infrastructure (see: Can embedding future information in training data improve planning?). That's a vote for training-side conditioning, though a cheap, data-only version of it, which implies the dichotomy in the question is softer than it looks. You don't necessarily choose between 'retrain the whole objective' and 'prompt at runtime'; there's a middle path of lightweight training-data conditioning.
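
To make the data-side idea concrete, here is a minimal sketch of lookahead augmentation, assuming it amounts to copying a span from later in a training sequence and placing it earlier between delimiter tokens. The delimiter strings, the function name `add_lookahead`, and the fixed insertion positions are all hypothetical choices for illustration, not the method's actual specification.

```python
# Hypothetical delimiter tokens; the real special tokens may differ.
LOOKAHEAD_OPEN = "<T>"
LOOKAHEAD_CLOSE = "</T>"


def add_lookahead(tokens, insert_at, future_start, future_len):
    """Return a copy of `tokens` with a later span previewed at `insert_at`.

    The previewed span is copied, not moved, so the original sequence is
    still present in full after the inserted block.
    """
    if not 0 <= insert_at <= future_start:
        raise ValueError("lookahead must be inserted before the span it previews")
    future = tokens[future_start:future_start + future_len]
    preview = [LOOKAHEAD_OPEN, *future, LOOKAHEAD_CLOSE]
    return tokens[:insert_at] + preview + tokens[insert_at:]


story = "the knight left home crossed the river and found the dragon".split()
augmented = add_lookahead(story, insert_at=2, future_start=8, future_len=3)
print(" ".join(augmented))
```

A model trained with ordinary next-token prediction on sequences like this sees the goal ("found the dragon") before it generates the steps that lead there, which is why no change to architecture or training procedure is needed.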

Where inference-time methods genuinely substitute for training is in behavior-shaping rather than capability-building. Proxy-tuning shifts a model's distribution at decoding time and closes most of the alignment gap while leaving base weights untouched, and it actually preserves knowledge better than direct fine-tuning, which corrupts lower-layer storage (see: Can decoding-time tuning preserve knowledge better than weight fine-tuning?). Reflexion goes further, letting agents improve across episodes by storing verbal self-critiques in episodic memory with no parameter updates at all (see: Can agents learn from failure without updating their weights?). Both show that steering and even iterative goal-pursuit can live outside the weights, provided the underlying ability is already present.
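
The decoding-time shift can be sketched as logit arithmetic. This is a minimal illustration, assuming the shift is computed as the difference between a small tuned model and its untuned counterpart and then added to the large base model's logits at each step. The three-token vocabulary, the numbers, and the function names are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_probs(base_logits, tuned_small_logits, untuned_small_logits):
    """Shift the base model's next-token logits by (tuned - untuned).

    No weights are modified; the adjustment happens per decoding step.
    """
    shifted = [
        b + (t - u)
        for b, t, u in zip(base_logits, tuned_small_logits, untuned_small_logits)
    ]
    return softmax(shifted)


# Toy next-token logits over a three-token vocabulary (illustrative values).
base = [2.0, 1.0, 0.0]
tuned = [0.0, 1.5, 0.0]
untuned = [0.0, 0.0, 0.0]

probs = proxy_tuned_probs(base, tuned, untuned)
print(probs)
```

In this toy case the base model alone prefers token 0, while the shifted distribution prefers token 1. The shift only reweights tokens the base model can already score, which matches the point that this kind of steering shapes behavior rather than adding capability.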

Which points to the real answer hiding under the question: it depends on whether goals require *capability* or just *elicitation*. Several lines of evidence find that base models already contain latent reasoning that minimal training merely selects rather than creates (see: Do base models already contain hidden reasoning ability?). But when the gap is structural, meaning the model lacks a reasoning protocol that makes extra tokens productive, no amount of inference-time budget closes it; reasoning models beat non-reasoning ones regardless of how much you spend at runtime (see: Can non-reasoning models catch up with more compute?). So inference-time goal injection can replace goal-conditioned training precisely to the degree the goal-pursuit machinery is already trained in. It's a great activation key and a poor locksmith.

Sources 7 notes

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can embedding future information in training data improve planning?

TRELAWNEY augments training data with special tokens encapsulating future information, allowing models to learn goal-conditioned generation using standard infrastructure. Results show improved planning, algorithmic reasoning, and story generation without modifying architecture or training procedures.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.
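
The loop described in this note can be sketched as follows. It is a toy under stated assumptions: the `attempt`, `evaluate`, and `reflect` callables stand in for an LLM actor, an environment check, and an LLM self-critique, and the guessing task is invented purely so the example runs without a model.

```python
def run_episodes(attempt, evaluate, reflect, max_episodes=5):
    """Retry a task, carrying verbal reflections forward instead of gradients."""
    memory = []  # uncompressed reflections, kept across episodes
    for episode in range(1, max_episodes + 1):
        action = attempt(memory)
        if evaluate(action):  # binary success/failure signal
            return episode, memory
        memory.append(reflect(action))
    return None, memory


# Hypothetical toy task: find a secret among a few candidates.
SECRET = "c"
CANDIDATES = ["a", "b", "c", "d"]


def attempt(memory):
    # Read earlier reflections and skip actions they say have failed.
    failed = {note.split("'")[1] for note in memory}
    return next(c for c in CANDIDATES if c not in failed)


def evaluate(action):
    return action == SECRET


def reflect(action):
    return f"Tried '{action}' and failed; do not repeat it."


solved_at, notes = run_episodes(attempt, evaluate, reflect)
print(solved_at, notes)
```

All improvement across episodes comes from the growing `memory` list. The actor itself never changes, which is the sense in which learning happens without parameter updates.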

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst testing whether inference-time goal injection can replace goal-conditioned training in LLMs. The question remains open despite recent progress.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025; treat these as perishable constraints:
• Prompt optimization activates latent capacity but cannot manufacture new knowledge; it reorganizes what exists within the model's training distribution (2025).
• Models frequently ignore injected goal information when parametric priors from training are strong; textual prompting alone cannot override those priors, and only intervention in internal representations succeeds (2024–2025).
• TRELAWNEY shows a middle path: baking goal/future information into training data via lookahead tokens enables goal-conditioned generation without architectural change, suggesting the dichotomy is softer than stated (2024).
• Proxy-tuning shifts model distribution at decoding time, preserves knowledge better than direct fine-tuning, and closes alignment gaps without parameter updates (2025).
• Base models already contain latent reasoning capability; structural gaps (e.g., reasoning protocols that make extra tokens productive) cannot be closed by inference-time budget alone—reasoning models outperform non-reasoning ones regardless of test-time compute (2025).

Anchor papers (verify; mind their dates):
• arXiv:2502.10708 (Domain knowledge injection survey, 2025)
• arXiv:2504.09858 (Reasoning without thinking, 2025)
• arXiv:2510.01265 (RL as pretraining, 2025)
• arXiv:2409.15360 (Reward-robust RLHF, 2024)

Your task:
(1) RE-TEST EACH CONSTRAINT. For latent-activation claims: has scaling, in-context learning, or multi-modal grounding since relaxed the "no new knowledge" ceiling? For goal-ignore failure modes: have steering methods (constitutional AI, representational editing, adapter-style interventions) overcome training-prior dominance? For reasoning gaps: has test-time scaling or new architectural bridging closed the structural gap? Separate durable (likely: training-time structure is cheaper and more robust) from perishable (likely: inference-time steerability margins).
(2) Surface the strongest contradicting work from the last 6 months—especially any claiming inference-time goal injection *does* rival or exceed training-time conditioning on realistic benchmarks.
(3) Propose 2 research questions assuming the regime has shifted: (a) Can memory-augmented or multi-agent orchestration (episodic, semantic, or retrieval-coupled buffers) substitute for goal-conditioned weights? (b) Do emergent in-context learners (few-shot, chain-of-thought, tool-use) narrow the elicitation–capability gap enough to flip the dichotomy?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
