Does self-generated training data reduce a model's capability diversity?
This explores whether feeding a model its own outputs as training data narrows the range of things it can do — collapsing the spread of formats, solutions, and styles it once had — even when that self-generated data improves accuracy.
The tension here is between two things the corpus treats as separate but that turn out to be linked: self-generated training data can make a model *better* while quietly making it *narrower*. The cleanest evidence for the upside is SEAL, where models learned knowledge more effectively from data they generated themselves than from data produced by a stronger external model: QA accuracy rose from 33.5% to 47.0% (see: Does self-generated training data improve model learning?). The intuition is that a model restructures information into a shape that fits its own representations, which a teacher can't do for it (see: Does teacher-refined data always improve student model performance?). So self-generated data isn't a degraded substitute; it's sometimes the better fuel.
But 'better at the target' and 'diverse' aren't the same axis, and that's where the corpus gets interesting. Training a model on a distribution it already favors tends to amplify that favorite and suppress the alternatives. RL post-training, for example, locks onto a single dominant format inherited from pretraining within the first epoch and collapses the others, and the winning format depends on model scale, not necessarily on which one performs best (see: Does RL training collapse format diversity in pretrained models?). Post-training also closes a feedback loop in which the model starts treating its own outputs as its next inputs, which shows up as 3–4x lower on-policy output entropy (see: Do models recognize their own outputs as actions shaping future inputs?). Lower entropy is what 'reduced capability diversity' looks like from the outside: the model keeps reaching for the same moves.
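To make the entropy claim concrete, here is a minimal sketch that amplifies the favored option in a toy distribution over output formats and measures the resulting drop in Shannon entropy. The four-format distribution, the `sharpen` helper, and the exponent are illustrative assumptions, not values or methods from the cited notes.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sharpen(probs, power):
    """Raise each probability to `power` and renormalize.

    With power > 1 this amplifies the already-favored option and
    suppresses the alternatives (a hypothetical stand-in for training
    on a distribution the model already prefers).
    """
    weights = [p ** power for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical distribution over four output formats before training.
before = [0.4, 0.3, 0.2, 0.1]
# The same distribution after the favorite has been amplified.
after = sharpen(before, 3.0)

h_before = shannon_entropy(before)
h_after = shannon_entropy(after)

print(f"entropy before: {h_before:.3f} bits")
print(f"entropy after:  {h_after:.3f} bits")
```

The favored format gains probability mass and every other format loses it, so the entropy after sharpening is lower even though no option has been removed outright.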
The sharpest version of the worry is the 'Artificial Hivemind.' Across 70+ models and 26K open-ended prompts, different models independently converged on strikingly similar, sometimes identical, responses, because they share overlapping training data and alignment recipes (see: Do different AI models actually produce diverse outputs?). If self-generated data becomes a bigger share of what models train on, this is the failure mode that compounds: a model recycling its own most-probable outputs has no outside source of variety to pull it back toward the tails.
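The compounding can be illustrated with a toy simulation, offered as a sketch under stated assumptions rather than a model of any real training run. A "model" here is just a categorical distribution that is repeatedly refit on a finite sample of its own outputs. The long-tailed starting distribution, the sample size, and the generation count are all hypothetical.

```python
import random
from collections import Counter

def refit_on_own_samples(probs, n_samples, rng):
    """Sample outputs from `probs`, then refit as empirical frequencies."""
    categories = list(range(len(probs)))
    draws = rng.choices(categories, weights=probs, k=n_samples)
    counts = Counter(draws)
    return [counts[c] / n_samples for c in categories]

def support_size(probs):
    """Number of outputs that still have nonzero probability."""
    return sum(1 for p in probs if p > 0)

def simulate(probs, generations, n_samples, seed=0):
    """Repeat the sample-then-refit loop and track the surviving support."""
    rng = random.Random(seed)
    history = [support_size(probs)]
    for _ in range(generations):
        probs = refit_on_own_samples(probs, n_samples, rng)
        history.append(support_size(probs))
    return probs, history

# Hypothetical long-tailed start: one common output and ten rare ones.
start = [0.5] + [0.05] * 10
final, history = simulate(start, generations=30, n_samples=40)

print("support size per generation:", history)
```

Once a rare output fails to appear in a sample, its refit probability is zero and it can never be drawn again, so the support can only shrink or stay the same. Nothing inside the loop can restore a lost tail; that would take data from outside it.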
The corpus doesn't treat this as inevitable, though, and that's the part worth knowing. Whether self-training narrows a model depends on what the training rewards and whether something interrupts the convergence. Preference tuning *reduces* lexical-syntactic diversity in code (where the reward pushes toward correct solutions) but *increases* it in creative writing (where distinctiveness is rewarded), so the direction flips by domain (see: Does preference tuning always reduce diversity the same way?). Adding a step-level critique inside the self-training loop actively counteracts 'tail narrowing' and preserves solution diversity across iterations; the authors argue this is a more fundamental win than the test-time accuracy bump (see: Do critique models improve diversity during training itself?). Training order matters too: doing structured tasks before open-ended ones prevents entropy collapse from spilling over and damaging creative capability (see: Does training order reshape how models handle different task types?).
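For readers who want a handle on what "lexical diversity" can mean in practice, the sketch below computes distinct-n, a common proxy defined as the fraction of n-grams in a set of outputs that are unique. This is an assumption chosen for illustration: the cited note may use a different metric, and whitespace tokenization is a deliberate simplification.

```python
def distinct_n(texts, n=2):
    """Fraction of n-grams across `texts` that are unique (distinct-n).

    Values near 1.0 mean the outputs rarely repeat phrasing; values
    near 0.0 mean they reuse the same n-grams over and over.
    Tokenization is plain whitespace splitting (a simplification).
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# Hypothetical samples: one set converges on a phrasing, one does not.
converged = ["return the sorted list", "return the sorted list"]
varied = ["return the sorted list", "yield items in ascending order"]

print(distinct_n(converged, n=2))
print(distinct_n(varied, n=2))
```

Comparing the score on samples drawn before and after tuning, separately for each domain, is one way to check whether diversity moved in the direction the note describes.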
The deeper reason all of this stays bounded: on the corpus's account, self-generated data can't add capability the model didn't already have. Post-training *selects* from latent ability rather than creating it (see: Do base models already contain hidden reasoning ability?), and self-improvement is formally capped by the generation–verification gap, so every reliable gain needs something external to validate and enforce it (see: What stops large language models from improving themselves?). So the honest answer is: self-generated data tends to reduce diversity by default, because it concentrates probability mass on what the model already prefers. But that's a tendency you can fight with critique loops, domain-aware rewards, and scheduling, not a law you're stuck with.
Sources (10 notes)
SEAL demonstrates that models learn better from synthetic data they generate themselves than from data created by stronger external models. Self-generated data improved QA performance from 33.5% to 47.0%, suggesting that model-specific restructuring aligns with the learner's representational needs.
Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.
Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.
Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.
INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.
RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.
Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.
Omni-Thinker shows structured domains decrease output entropy while creative domains increase it. BWT-guided scheduling—training structured tasks first—yields 6.2% gains over joint training by preventing entropy collapse from damaging open-ended capabilities.
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.