What makes creative writing diversity different from code diversity fundamentally?
This explores whether diversity means the same thing in open-ended creative writing as it does in code (and code-like reasoning) — and the corpus suggests it doesn't: in one, diversity is the destination; in the other, it's a route to a single correct answer.
This reads the question as asking whether "diversity" is the same property in two very different output spaces: creative writing, where there is no right answer, and code or math, where there usually is. The corpus has no note that addresses "code diversity" by name, but it repeatedly studies diversity in both creative and verifiable tasks, and the contrast that emerges is sharp. In code and math, multiple distinct solutions are valuable because they are different paths to one correct, checkable endpoint; diversity is instrumental, a way to explore the solution space and avoid getting stuck. In creative writing, divergence *is* the endpoint: there is no convergence target, and rarity itself is treated as a signal of quality. The note "Can diversity optimization improve quality during language model training?" makes this concrete. DARLING rewards semantic diversity during training and improves results on *both* creative and mathematical tasks, but for different reasons: in math, diversity catalyzes exploration toward the verifiable answer; in creative work, it is part of the quality being measured.
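To make the idea of rewarding quality and semantic diversity together more tangible, here is a minimal sketch. It is not DARLING's actual formulation: the multiplicative combination, the function `diversity_weighted_rewards`, and the `same_meaning` callback (a toy stand-in for a learned semantic classifier) are all assumptions made for illustration.

```python
from typing import Callable, List


def diversity_weighted_rewards(
    responses: List[str],
    quality: Callable[[str], float],
    same_meaning: Callable[[str, str], bool],
) -> List[float]:
    """Scale each response's quality score by how semantically distinct it is.

    `same_meaning` is a hypothetical stand-in for a learned semantic
    classifier; multiplying quality by distinctness is an assumption too.
    """
    rewards = []
    for i, resp in enumerate(responses):
        others = [r for j, r in enumerate(responses) if j != i]
        if not others:
            distinct_fraction = 1.0  # a lone response has nothing to duplicate
        else:
            distinct = sum(1 for o in others if not same_meaning(resp, o))
            distinct_fraction = distinct / len(others)
        rewards.append(quality(resp) * distinct_fraction)
    return rewards


# Toy stand-ins: quality is constant, "meaning" is the lowercased first word.
def toy_quality(text: str) -> float:
    return 1.0


def toy_same_meaning(a: str, b: str) -> bool:
    return a.split()[0].lower() == b.split()[0].lower()


batch = ["Dragons guard the vault", "dragons hoard the gold", "Clockmakers stop time"]
rewards = diversity_weighted_rewards(batch, toy_quality, toy_same_meaning)
print(rewards)
```

In this toy batch the two dragon responses share a reward penalty because each duplicates the other, while the third response keeps its full quality score. The same mechanism serves both task types: in math it keeps distinct solution paths alive, and in creative work the distinctness is itself part of what is valued.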
The deepest clue to the fundamental difference comes from how originality gets defined in the first place. The note "Can statistical rarity measure whether stories are truly original?" operationalizes originality as *statistical rarity*: human stories occupy rarer regions of narrative feature space, while AI outputs cluster tightly. That definition would make no sense for code: you don't want your sorting function to be statistically rare, you want it correct. So creative diversity is judged against the whole population of possible outputs (how unusual is this?), whereas code diversity is judged against a fixed criterion (does this pass?). The reference frame is different at the root.
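The two reference frames can be contrasted in a short sketch. It assumes rarity is measured as surprisal (negative log relative frequency) over a small population of hypothetical narrative feature tuples; this is an illustrative proxy, not StoryScope's actual metric. The names `rarity_scores` and `passes` and the test cases are invented for the example.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def rarity_scores(population: Sequence[Hashable]) -> List[float]:
    """Surprisal (-log relative frequency) of each item within its population."""
    counts = Counter(population)
    total = len(population)
    return [-math.log(counts[item] / total) for item in population]


def passes(sort_fn: Callable[[List[int]], List[int]]) -> bool:
    """Fixed criterion: the output must equal the sorted input."""
    cases = [[3, 1, 2], [], [5, 5, 1], [9, -4, 0, 7]]
    return all(sort_fn(list(c)) == sorted(c) for c in cases)


def insertion_sort(xs: List[int]) -> List[int]:
    out: List[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


# Hypothetical narrative decisions per story: (narrator, ending).
stories = [("first-person", "reconciliation")] * 4 + [("second-person", "unresolved")]
scores = rarity_scores(stories)

print(scores[0] < scores[-1])                   # the unusual story scores as rarer
print(passes(sorted), passes(insertion_sort))   # both implementations pass
```

A story's rarity score changes whenever the surrounding population changes, whereas a sorting function's pass/fail result depends only on the function and the test cases. Two very different implementations both pass, and neither gains anything from being unusual.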
This is also why AI struggles more visibly with the creative kind. The note "Do different AI models actually produce diverse outputs?" documents an "Artificial Hivemind": 70+ models independently produce strikingly similar or identical responses to open-ended prompts because their training data and alignment procedures overlap. "Does AI generate diverse claims or diverse perspectives?" sharpens the point: models scale the *volume* of claims without scaling the *perspectives* behind them, so a thousand AI articles can amount to roughly one viewpoint. And "Does AI writing make all writers sound the same?" shows that the homogenizing pressure acts on humans too, narrowing writers toward more confident, positive, and articulate personas. For code, this convergence is mostly harmless or even good, since one canonical idiom is fine. For writing, convergence is the failure.
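One rough way to put a number on this kind of convergence is the mean pairwise similarity across a set of responses. The sketch below uses Jaccard overlap of word sets, which is a crude lexical proxy chosen for simplicity; it is an assumption of this example, not the measurement used in the notes above, and the sample responses are invented.

```python
from itertools import combinations
from typing import List


def jaccard(a: str, b: str) -> float:
    """Overlap of lowercase word sets: 1.0 means identical vocabularies."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_similarity(responses: List[str]) -> float:
    """Average Jaccard similarity over every unordered pair of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented responses to an open-ended prompt about time.
homogeneous = [
    "time is a river that flows",
    "time is a river that carries us",
    "time is a river that never stops",
]
varied = [
    "time is a river that flows",
    "my grandmother kept clocks in the freezer",
    "noon arrives like an unpaid bill",
]

print(mean_pairwise_similarity(homogeneous) > mean_pairwise_similarity(varied))
```

A lexical measure like this misses paraphrases that share meaning but not words, so it would understate the perspective-level sameness the notes describe. An embedding-based similarity would be the natural next step.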
There's a structural reason the models behave this way, and it cuts across both domains. The note "Why aren't bigger models better for generating diverse outputs?" finds that bigger models concentrate probability mass on preferred outputs and so generate *fewer* unique samples within a fixed budget; alignment and scale both push toward the high-probability center. That same centering helps code (you want the likely-correct idiom) and hurts creativity (you want the tail). So the "fundamental difference" is partly a mismatch between what the models are optimized for and what each task rewards.
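The link between concentrated probability mass and fewer unique samples can be seen in a toy simulation. The two distributions below are invented for illustration (one uniform, one with 90% of its mass on a single output) and do not model any real language model; `unique_samples` is a hypothetical helper.

```python
import random
from typing import List


def unique_samples(weights: List[float], budget: int, seed: int = 0) -> int:
    """Draw `budget` samples from a categorical distribution; count distinct outcomes."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(weights)), weights=weights, k=budget)
    return len(set(draws))


num_outputs = 1000

# Mass spread evenly over all possible outputs.
flat = [1.0] * num_outputs

# 90% of the mass on one preferred output, the rest spread thinly over the tail.
peaked = [0.9] + [0.1 / (num_outputs - 1)] * (num_outputs - 1)

flat_unique = unique_samples(flat, budget=200)
peaked_unique = unique_samples(peaked, budget=200)

print(flat_unique, peaked_unique)
```

With the same sampling budget, the peaked distribution yields far fewer distinct outputs, because most draws land on the preferred one. For code that preferred output is often the answer you wanted; for creative writing it is the output everyone else already has.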
The most surprising thread is that creative diversity may not be a single thing at all. The note "Can LLMs reason creatively beyond conventional problem-solving?" argues that creativity splits into combinational, exploratory, and transformational modes. Current reasoning methods, tuned for convergent problem-solving, don't address these paradigms, which may be *why* ideation collapses to sameness. Meanwhile, on the code-and-math side, diversity has a cleaner mechanical handle: "Do critique models improve diversity during training itself?" shows that step-level critique preserves solution diversity and prevents premature convergence during self-training. The takeaway a curious reader might not expect: code diversity is a tractable exploration problem with at least one demonstrated fix (keep the search from narrowing), while creative diversity is a definitional and possibly multi-dimensional problem, where what is being measured is still under debate.
Sources (8 notes)
DARLING jointly optimizes for quality and semantic diversity using a learned classifier, finding that diversity rewards catalyze exploration and produce higher-quality outputs than quality-only baselines across both creative and mathematical tasks.
StoryScope operationalizes originality as statistical rarity in discourse-level narrative decisions. Human stories are measurably rarer in this space than AI outputs, which cluster tightly, offering a quantifiable proxy for the human conception copyright law requires.
INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.
Large language models generate numerous well-formed claims by following probabilistic patterns in training data, not by exploring competing argumentative positions. This produces volume without perspectival diversity—a thousand AI articles often represent approximately one viewpoint.
AI-assisted text shows significantly reduced variation in perceived author traits across 22 of 29 dimensions, with writers converging on more confident, positive, and articulate personas. This second-order homogenization erodes readers' ability to distinguish among writers by their distinct voices.
Research shows that for synthetic data generation, models around 500M parameters outperform larger ones in output diversity per sample. Larger models concentrate probability mass on preferred outputs, reducing the variety of distinct samples generated within a fixed budget.
Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.
Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.