How tight should a textual learning rate be before it prevents skill escape?
This explores a question the corpus answers obliquely: when you optimize a skill in *text* rather than weights — editing a document the way you'd nudge parameters — how cautious must each edit be to keep the model from drifting away from skills it already had?
This reads 'textual learning rate' as the aggressiveness of edits when you optimize skills in language rather than in weights, and 'skill escape' as the drift or collapse in which chasing a new gain quietly erodes what the model could already do. The corpus doesn't use these exact words, but it answers the question fairly directly in two places. The clearest is SkillOpt, which treats a skill document like a set of weights and runs a separate optimizer that proposes edits, but accepts an edit *only* when it strictly improves a held-out validation score (Can skill documents be optimized like neural network weights?). That's the answer to 'how tight': the validation gate *is* the learning rate. Edits can be as ambitious as the optimizer likes, but the acceptance test is strict, so the effective step size is whatever survives a held-out check. Skill escape is prevented not by taking small steps but by rejecting steps that don't generalize.
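The acceptance rule described above can be sketched in a few lines. This is a toy illustration, not SkillOpt's implementation: the names `gated_optimize`, `held_out_score`, `HELD_OUT_TERMS`, and `PROPOSALS` are hypothetical, and the keyword-counting score merely stands in for a real held-out evaluation.

```python
from typing import Callable, List, Tuple

def gated_optimize(
    skill: str,
    propose_edit: Callable[[str], str],
    score: Callable[[str], float],
    steps: int,
) -> Tuple[str, List[bool]]:
    """Keep a proposed skill document only if it strictly beats the
    current held-out score; otherwise discard it and keep the old one."""
    best_score = score(skill)
    accepted: List[bool] = []
    for _ in range(steps):
        candidate = propose_edit(skill)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict: ties are rejected
            skill, best_score = candidate, candidate_score
            accepted.append(True)
        else:
            accepted.append(False)
    return skill, accepted

# Hypothetical stand-in for a held-out validation set: the fraction of
# required behaviours that the document mentions.
HELD_OUT_TERMS = ["validate", "retry", "log"]

def held_out_score(doc: str) -> float:
    return sum(term in doc for term in HELD_OUT_TERMS) / len(HELD_OUT_TERMS)

# Hypothetical proposals, ranging from small edits to a full rewrite.
PROPOSALS = [
    "base. validate inputs.",                    # small edit, improves
    "rewritten from scratch.",                   # large edit, loses ground
    "base. validate inputs. retry on failure.",  # small edit, improves
    "base. validate inputs. retry on failure!",  # cosmetic edit, ties
]

proposal_stream = iter(PROPOSALS)
final_doc, accepted = gated_optimize(
    "base.", lambda _current: next(proposal_stream), held_out_score, steps=4
)
print(final_doc)
print(accepted)
```

Nothing in the loop limits how different a candidate may be from the current document. The full rewrite is rejected because it scores worse on the held-out check, not because it is large, and the cosmetic edit is rejected because a tie is not a strict improvement.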
The self-play side of the corpus shows what happens when you remove that brake. Ctx2Skill co-evolves skills through natural-language edits with no human supervision, and its authors are explicit that the whole loop only works when adversarial pressure is balanced against a 'generalization safeguard'; without it, the system collapses (Can language models learn skills without human supervision?). So both text-space methods independently converge on the same shape: the danger isn't the size of any single edit, it's edits that optimize the training signal while quietly abandoning the broader skill. The tightness you need is exactly enough to catch that.
What makes this interesting is that the weight-space literature reaches the *same* conclusion through a completely different door, which suggests it's a property of learning, not of the medium. Staying close to the base model (low KL drift) preserves plasticity, the ability to keep learning later tasks; parameter-only RL that drifts hard stalls when the domain shifts (Does staying close to the base model preserve learning ability?). 'KL drift from base' is the weight-space twin of your 'textual learning rate': how far you let yourself move from where you started. The skill-escape failure mode there is a loss of plasticity dressed as progress: the model that drifts furthest looks fine on the current task and then stalls on the next one.
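To make 'KL drift from base' concrete, here is a minimal sketch that measures how far an updated output distribution has moved from a base one. The three-outcome distributions and the choice of direction, KL(updated || base), are illustrative assumptions; the notes do not say how drift is estimated in practice.

```python
import math
from typing import Sequence

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a zero-probability outcome contributes nothing
        if q_i == 0.0:
            raise ValueError("q must be positive wherever p is positive")
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical next-token distributions over three outcomes.
base = [0.50, 0.30, 0.20]
gentle_update = [0.55, 0.27, 0.18]      # stays close to base
aggressive_update = [0.95, 0.04, 0.01]  # collapses onto one outcome

gentle_drift = kl_divergence(gentle_update, base)
aggressive_drift = kl_divergence(aggressive_update, base)
print(f"gentle drift:     {gentle_drift:.4f} nats")
print(f"aggressive drift: {aggressive_drift:.4f} nats")
```

The aggressive update has much larger drift because it concentrates almost all of its probability on a single outcome. In this toy picture, that is the kind of movement away from the base model that the paragraph above associates with stalling when the domain shifts.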
Then the corpus delivers the genuinely counterintuitive part: a tight brake costs less than you'd expect, because the *floor* on a useful step is remarkably low. In RLVR, a single training example lifts math accuracy from 36% to 73.6%, and test accuracy keeps climbing for 1,400 steps after training accuracy has already hit 100% (Can a single training example unlock mathematical reasoning?). The lesson for textual optimization: a good edit doesn't *teach* a skill so much as *activate* a latent one, which means you can afford a very tight learning rate and still get large gains. You don't need aggressive edits to make progress, which is precisely why you can afford the strict validation gate that prevents escape.
So the synthesis: there's no single tightness number, because the corpus reframes the question. The right brake isn't a smaller step, it's a held-out acceptance test (Can skill documents be optimized like neural network weights?) plus a generalization safeguard (Can language models learn skills without human supervision?), the text-space versions of staying near base (Does staying close to the base model preserve learning ability?), and you can keep that brake tight precisely because activation, not aggression, is what makes edits pay off (Can a single training example unlock mathematical reasoning?). If you want one more thread to pull, the context-integration work shows the failure from the other side: strong prior associations can simply override a new instruction, so a 'too-gentle' textual edit may not move the model at all (Why do language models ignore information in their context?).
Sources (5 notes)
SkillOpt demonstrates that skill documents can be systematically improved through a separate optimizer that proposes edits, accepting only changes that strictly improve held-out validation scores. This approach outperforms baselines across 52 experimental cells and produces skills that transfer between models.
Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.
FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.
A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.