Why do preference-tuned models produce different diversity patterns in code versus creative writing?

This explores why preference tuning (RLHF and related) makes code outputs more uniform but creative writing outputs more varied — and what in each domain's reward structure drives the split.

The cleanest answer in the corpus is that the reward signal points in a different direction in each domain: code generation rewards convergence toward a correct, canonical solution, so tuning narrows lexical and syntactic variety; creative writing rewards stylistic distinctiveness, so tuning widens it (see: Does preference tuning always reduce diversity the same way?). Diversity isn't something RLHF uniformly destroys or preserves; it follows whatever each domain incentivizes.
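To make "lexical variety" concrete, here is a minimal sketch of one common proxy, distinct-n: the share of n-grams across a batch of samples that are unique. The source notes do not say which metric was used, so treat this as an illustrative stand-in. The function name, the whitespace tokenizer, and the sample strings are all made up for the example.

```python
def distinct_n(texts, n=2):
    """Fraction of n-grams across all samples that are unique (distinct-n).

    Assumption: naive lowercase whitespace tokenization, which is crude for
    code and only good enough for a toy comparison.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


# Hypothetical samples, invented for illustration (not from any study).
code_samples = [
    "def add(a, b): return a + b",
    "def add(a, b): return a + b",
    "def add(x, y): return x + y",
]
story_samples = [
    "the lighthouse kept its secrets behind salt-stained glass",
    "nobody told the river it was supposed to stop",
    "she filed her grief alphabetically, under weather",
]

print("code distinct-2: ", distinct_n(code_samples))
print("story distinct-2:", distinct_n(story_samples))
```

A lower score means the samples repeat each other more. Comparing the score for the same prompts before and after tuning, separately per domain, is one way the opposite-direction claim could be checked.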

The deeper mechanism shows up when you look at what RL post-training does to a model's output distribution. RL tends to amplify a single dominant format inherited from pretraining while suppressing the alternatives, often within the first epoch, and the winning format is selected by model scale, not necessarily by which one performs best (see: Does RL training collapse format diversity in pretrained models?). In code, where there's a sharp notion of "correct," that collapse onto one format reads as helpful convergence. In open-ended writing, the same collapse would be a loss, which is part of why the domains diverge under identical training procedures.
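One way to see a format collapse like this in practice is to label each sampled output with its format and track the Shannon entropy of the label distribution across checkpoints. The sketch below assumes you already have such labels; the format names and counts are hypothetical, and this is a measurement aid, not the method used in the cited experiments.

```python
import math
from collections import Counter


def format_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution over format labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical format labels for 10 sampled outputs at two checkpoints.
before_rl = ["prose"] * 4 + ["code"] * 3 + ["latex"] * 3
after_rl = ["prose"] * 9 + ["code"] * 1

print("entropy before RL:", format_entropy(before_rl))
print("entropy after RL: ", format_entropy(after_rl))
```

Entropy near zero means almost every output uses one format. A sharp drop between the pretrained checkpoint and the end of the first epoch would be the signature described above.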

There's a twist, though: more diversity in creative writing isn't automatically a good thing. Newer models (the note cites ChatGPT-4.5 and o4-mini) diverge further from human lexical patterns even though human judges cannot reliably tell their output from human text, apparently because RLHF optimizes for quality ratings rather than human-like writing (see: Why do newer AI models diverge further from human writing patterns?). And when you compare models against each other rather than within one, the picture inverts again: across more than 70 models, independently trained systems converge on strikingly similar or identical responses to open-ended prompts, an "Artificial Hivemind" driven by overlapping training data and shared alignment recipes (see: Do different AI models actually produce diverse outputs?). So creative-writing diversity can rise within a model while collapsing across models.
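The within-model versus across-model distinction can be sketched as two averages over pairwise similarity: one over pairs of samples from the same model, one over pairs drawn from different models. The example below uses Jaccard overlap of word sets as a deliberately simple similarity measure. That choice, the function names, and the toy responses are assumptions for illustration, not the INFINITY-CHAT methodology.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def within_and_across(samples_by_model):
    """Return ({model: mean within-model similarity}, mean across-model similarity)."""
    within = {
        model: mean(jaccard(a, b) for a, b in combinations(samples, 2))
        for model, samples in samples_by_model.items()
    }
    across = mean(
        jaccard(a, b)
        for (_, s1), (_, s2) in combinations(samples_by_model.items(), 2)
        for a in s1
        for b in s2
    )
    return within, across


# Hypothetical responses to one open-ended prompt, invented for illustration.
samples = {
    "model_a": ["time is a river", "time is a weaver"],
    "model_b": ["time is a river", "time is a thief"],
}

within, across = within_and_across(samples)
print("within:", within)
print("across:", across)
```

When the across-model average is as high as the within-model averages, sampling from a second model adds no more variety than sampling again from the first, which is the sense in which an ensemble loses its diversity benefit.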

The entanglement runs deeper than style. In writing assistance, the very preference optimization that produces polish also produces persona distortion: writers prefer the AI rewrite 63% of the time yet object to how it warps their voice, because polish and distortion are coupled at the model level and can't be cleanly separated (see: Can user preference guide AI writing tool alignment?). That's the creative-writing analogue of the code problem: optimizing toward what raters click pulls the distribution somewhere the user didn't actually ask for.

If you want to chase the diversity-collapse thread further, two adjacent angles help. First, critique-in-the-loop training preserves solution diversity by preventing premature convergence during self-training (see: Do critique models improve diversity during training itself?). Second, there's a case that creative output needs reasoning modes (combinational, exploratory, transformational) that conventional methods ignore entirely, which may be why ideation diversity collapses where code diversity merely tightens (see: Can LLMs reason creatively beyond conventional problem-solving?).

Sources (7 notes)

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Why do newer AI models diverge further from human writing patterns?

ChatGPT-4.5 and o4-mini show greater lexical diversity differences from human text than earlier models, yet human judges cannot reliably distinguish them. Training objectives like RLHF appear to optimize for quality ratings rather than human-like writing patterns.

Do different AI models actually produce diverse outputs?

INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.

Can user preference guide AI writing tool alignment?

Writers prefer AI rewrites 63% of the time but object to systematic persona distortions those same rewrites introduce. Mitigation studies show polish and distortion are entangled at the model level—preference optimization produces both simultaneously.

Do critique models improve diversity during training itself?

Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.

Can LLMs reason creatively beyond conventional problem-solving?

Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about why preference-tuned LLMs exhibit opposite diversity patterns in code versus creative writing. The question remains open: is the divergence a fundamental property of reward alignment, or has recent progress in training methods, evaluation, or model architecture since dissolved or inverted the constraint?

What a curated library found — and when (dated claims, not current truth):
Findings span 2021–2026; treat each as a snapshot, not current state.

• RLHF narrows lexical and syntactic diversity in code (canonical-solution reward) but widens it in creative writing (style distinctiveness reward) — the same tuning procedure produces opposite outcomes because reward signals point different ways (~2025).

• RL post-training converges onto a single dominant pretraining format within the first epoch; model scale, not task performance, selects the winner — in code this reads as helpful, in open-ended writing it's a loss (~2025).

• Newer models diverge further from human lexical patterns while becoming harder to distinguish from human text, because RLHF optimizes quality ratings, not human-likeness (~2025).

• Dozens of independently trained models converge on near-identical responses to open-ended prompts ("Artificial Hivemind") driven by overlapping training data and shared alignment recipes — creative diversity rises *within* a model but collapses *across* models (~2025).

• Polish and persona distortion in writing assistance are coupled at the model level; users prefer rewrites 63% of the time yet object to voice warping because the optimization cannot cleanly separate them (~2026).

Anchor papers (verify; mind their dates):
• arXiv:2504.07912 "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" (2025-04)
• arXiv:2510.22954 "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)" (2025-10)
• arXiv:2511.20471 "Universe of Thoughts: Enabling Creative Reasoning with Large Language Models" (2025-11)
• arXiv:2604.22503 "Measuring and Mitigating Persona Distortions from AI Writing Assistance" (2026-04)

Your task:
(1) RE-TEST EACH CONSTRAINT. For the code-vs-creative divergence, judge whether newer scaling, constitutional AI variants, multi-objective training (e.g., reward hacking defenses), critique-in-the-loop, or test-time reasoning methods have since RELAXED the convergence or OVERTURNED the persona coupling. Separate: the durable question (do reward structures genuinely pull code and writing different ways?) from the perishable limitation (is the format collapse in RL now preventable?). Cite what resolved it; flag what still holds.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for papers claiming uniform diversity trends across domains, or showing that reward design can preserve both correctness *and* variety in code, or that creative reasoning modes (arXiv:2511.20471 hints at this) dissolve the polish-distortion coupling.

(3) Propose 2 research questions that ASSUME the regime may have moved: (a) Does test-time compute (chain-of-thought, multi-path search) in code generation re-enable diversity without sacrificing correctness? (b) Can multi-agent or ensemble methods escape the Hivemind by design, or does convergence re-emerge at evaluation time?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
