What makes some contexts learnable as rules versus requiring model retraining?

This note explores the dividing line between knowledge a model can absorb as natural-language rules at inference time (no weight changes) and knowledge that only sticks if you retrain the weights.

This is really a question about where the boundary sits between rules you can hand a frozen model and knowledge you have to bake into its weights, and the corpus draws that line surprisingly sharply. The cleanest statement of the ceiling comes from work showing that prompting only reorganizes what a model already knows: it can retrieve and activate latent knowledge, but it cannot inject anything foundational that was absent from training (Can prompt optimization teach models knowledge they lack?). So the first answer is almost tautological but load-bearing: a context is learnable as a rule when the underlying capability is already in the model and the rule just tells it which way to point.

When that condition holds, the gains can be real without touching a single weight. Extracting explicit natural-language rules out of messy context into reusable 'skills' lifts a frozen model's reasoning measurably (GPT-4.1 goes from 11.1% to 16.5% on CL-bench), and those skills even transfer across model backbones (Can frozen models learn better by extracting context into skills?). Agents can do something similar by writing verbal self-diagnoses into episodic memory after a clean success/failure signal, improving across attempts with no parameter updates (Can agents learn from failure without updating their weights?). And for sequential decision-making, in-context learning works only when the context carries the right structure: full or partial trajectories from the same setting, not isolated examples (Why do trajectories matter more than individual examples for in-context learning?). The pattern: rules are learnable in-context when the knowledge exists and the context supplies the right shape to surface it.
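The episodic-memory loop described above can be sketched in a few lines. This is a minimal illustration, not Reflexion's actual implementation: `run_with_reflection` and the three `toy_*` functions are hypothetical names, and the "model" is a hard-coded stand-in so the example runs without any LLM. The only thing that changes between attempts is the list of reflections passed back in as context.

```python
from typing import Callable, List, Tuple


def run_with_reflection(
    attempt: Callable[[List[str]], str],
    check: Callable[[str], bool],
    diagnose: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[bool, List[str]]:
    """Retry a task, feeding earlier self-diagnoses back in as context."""
    memory: List[str] = []  # episodic memory, kept uncompressed
    for _ in range(max_attempts):
        answer = attempt(memory)  # frozen model call; only the context changes
        if check(answer):  # binary success/failure signal
            return True, memory
        memory.append(diagnose(answer))  # verbal reflection on the failure
    return False, memory


# Toy stand-ins so the sketch runs without any model (all hypothetical).
def toy_attempt(memory: List[str]) -> str:
    # Pretend the model only drops the units once a reflection tells it to.
    return "42" if any("units" in note for note in memory) else "42 apples"


def toy_check(answer: str) -> bool:
    return answer == "42"


def toy_diagnose(answer: str) -> str:
    return f"Failed with {answer!r}: drop the units and return the bare number."


solved, notes = run_with_reflection(toy_attempt, toy_check, toy_diagnose)
print(solved, len(notes))
```

The toy task succeeds on the second attempt because the capability ("return a bare number") was always available and the reflection merely pointed at it, which is the condition the paragraph above names.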

The interesting part is the failure side: the conditions that quietly push you toward retraining. Even a perfectly correct rule sitting in the context window gets ignored when the model's training priors are strong enough to override it; fixing this needs intervention in the representations, not better wording (Why do language models ignore information in their context?). And when reasoning is decoupled from familiar semantics, models fall back on token associations and the rules in context stop working at all, because the models reason semantically, not symbolically (Do large language models reason symbolically or semantically?). So a context resists being learned as a rule precisely when it contradicts the model's parametric instincts or asks for genuinely symbolic manipulation.

There's a third zone the corpus flags: contexts that can't be resolved inside the system at all. Autonomous test-time learning breaks down on contradictory rules because choosing correctly depends on information outside the model's reach, which is why one system routes those conflicts to a human rather than guessing (Can LLMs learn reliably at test time without human oversight?). Where feedback is missing entirely, self-play can manufacture it and co-evolve skills as natural-language edits, but only with a guardrail against collapse (Can language models learn skills without human supervision?).
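To make the "route conflicts to a human" idea concrete, here is a small sketch of a timestamped rule store that escalates instead of guessing. It is an assumption-laden illustration rather than ARIA's design: `Rule`, `RuleStore`, and `toy_human` are hypothetical names, and "conflict" is simplified to "same topic, different text".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    topic: str
    text: str
    timestamp: float


class RuleStore:
    """Keeps one rule per topic and escalates contradictions."""

    def __init__(self, ask_human: Callable[[Rule, Rule], Rule]) -> None:
        self._rules: Dict[str, Rule] = {}
        self._ask_human = ask_human

    def add(self, new: Rule) -> Rule:
        old = self._rules.get(new.topic)
        if old is not None and old.text != new.text:
            # Assumption: differing text on the same topic counts as a conflict.
            new = self._ask_human(old, new)  # escalate rather than guess
        self._rules[new.topic] = new
        return new


escalations: List[Tuple[Rule, Rule]] = []


def toy_human(old: Rule, new: Rule) -> Rule:
    escalations.append((old, new))
    return old  # this reviewer happens to keep the older rule


store = RuleStore(toy_human)
store.add(Rule("refunds", "allowed within 30 days", timestamp=1.0))
kept = store.add(Rule("refunds", "allowed within 14 days", timestamp=2.0))
print(kept.text, len(escalations))
```

A newest-wins policy would have silently kept the 14-day rule. The reviewer here keeps the older one, which is the kind of decision that depends on information the store itself does not hold.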

When the knowledge genuinely isn't there, you're into retraining, and the corpus is blunt that this is a different kind of cost. Reinforcement learning that rewards explanation quality internalizes coherent knowledge structures better than plain supervised fine-tuning (Can reinforcement learning embed domain knowledge more effectively than supervised fine-tuning?). But every adaptation method has a domain-specific sweet spot and hidden side effects: visible performance gains often come paired with degraded reasoning faithfulness or lost format flexibility (How do domain training techniques actually reshape model behavior?). The less obvious takeaway: the rule-versus-retrain choice isn't only about whether in-context learning *works*. Retraining buys you durability at the price of brittleness elsewhere, so the cheapest method that clears the knowledge bar is usually the right one.
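The zones sketched across this answer can be read as a rough triage, cheapest option first. The function below is one possible reading, not a validated procedure: `Context`, `Route`, and `choose_route` are hypothetical names, and the four boolean fields are judgment calls a person would have to make, not quantities any cited note measures.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    IN_CONTEXT_RULE = "in-context rule"
    ESCALATE = "escalate to a human"
    BEYOND_PROMPTING = "representation intervention or retraining"
    RETRAIN = "retrain"


@dataclass
class Context:
    knowledge_in_model: bool
    conflicts_with_priors: bool = False
    needs_symbolic_reasoning: bool = False
    resolvable_inside_system: bool = True


def choose_route(ctx: Context) -> Route:
    """Pick the cheapest route that plausibly clears the knowledge bar."""
    if not ctx.resolvable_inside_system:
        return Route.ESCALATE  # the answer depends on outside information
    if not ctx.knowledge_in_model:
        return Route.RETRAIN  # prompting cannot inject missing foundations
    if ctx.conflicts_with_priors or ctx.needs_symbolic_reasoning:
        return Route.BEYOND_PROMPTING  # a correct rule in context gets ignored
    return Route.IN_CONTEXT_RULE  # knowledge exists; the rule just points at it


print(choose_route(Context(knowledge_in_model=True)).value)
```

The ordering encodes the closing point of the paragraph above: retraining is the last resort, not the default.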

Sources (10 notes)

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Can frozen models learn better by extracting context into skills?

Extracting natural-language rules from context into reusable skills improves frozen model reasoning without weight updates. On CL-bench, this lifts GPT-4.1 from 11.1% to 16.5%, with skills transferable across model backbones.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Why do trajectories matter more than individual examples for in-context learning?

In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can LLMs learn reliably at test time without human oversight?

ARIA demonstrates that LLMs can adapt during inference through three integrated components: structured self-dialogue for uncertainty assessment, timestamped knowledge bases for conflict detection, and human-mediated resolution queries. Autonomous systems fail at reconciling contradictory rules because the correct choice depends on context outside the system.

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.

Can reinforcement learning embed domain knowledge more effectively than supervised fine-tuning?

RLAG rewards both answer accuracy and explanation rationality by cycling between augmented and unaugmented generation, progressively internalizing coherent knowledge structures. This outperforms SFT because it prioritizes reasoning quality over token-level correctness.

How do domain training techniques actually reshape model behavior?

Research shows every adaptation method—from parameter-efficient tuning to knowledge graph curricula—has optimal conditions tied to specific domains. The key finding: visible benefits like performance gains often come with hidden degradation in reasoning faithfulness, capability transfer, and format flexibility.
