SYNTHESIS NOTE

Why do newer AI models diverge further from human writing patterns?

As language models improve, they seem to generate text that is measurably less human-like in lexical patterns, yet humans struggle to detect this difference. What drives this divergence, and what does it reveal about how models optimize for quality?

Synthesis note · 2026-02-21 · sourced from Discourses

The lexical diversity study compared ChatGPT-3.5, 4, o4-mini, and 4.5. The key finding: the newer models (o4-mini and 4.5) differ most from human-written text on lexical diversity measures. By these metrics they are the least human-like of the four.

At the same time, human judges consistently fail to detect AI-generated text regardless of model version. More capable models don't become easier to detect; the failure of human judgment is stable across model generations.

ChatGPT-4.5 produces higher lexical diversity than older models despite generating fewer tokens: it is more lexically dense, but the density pattern is still non-human. The implication is that newer models aren't converging on human-like writing by getting better at mimicking human lexical patterns. They are getting better at generating high-quality text that is nonetheless systematically different from human text.
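To make "lexical diversity despite fewer tokens" concrete, here is a minimal sketch of two common diversity measures. The note does not say which measures the study used, so the choice of type-token ratio (TTR) and moving-average TTR (MATTR), the tokenizer, and the window size are all assumptions for illustration. Raw TTR falls as a text gets longer, so comparing outputs of different lengths generally calls for a length-robust measure such as MATTR.

```python
import re


def tokenize(text):
    """Lowercase word tokens. A deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Unique tokens divided by total tokens. Sensitive to text length."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def moving_average_ttr(tokens, window=50):
    """Mean TTR over every sliding window of a fixed size (MATTR).

    Falls back to plain TTR when the text is no longer than one window.
    The default window of 50 is an illustrative choice, not a standard.
    """
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = tokenize("the cat sat on the mat")
    print(type_token_ratio(sample))
    print(moving_average_ttr(sample, window=3))
```

Because every window has the same size, MATTR scores from a short model output and a longer human text can be compared directly, which plain TTR does not allow.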

This suggests that the training objective (RLHF, quality preference) is pushing models toward a different optimum than "human-like lexical diversity." The optimum that models converge on is rated as higher quality by human raters, yet it is measurably more distinct from how humans naturally write.

The widening gap between what is measurable and what is perceptible has an important practical consequence: as models improve, naive human-based detection becomes less viable, not more. Detection requires moving to statistical or computational analysis of the kind that humans don't spontaneously perform.
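As a rough illustration of what such statistical analysis could look like, the sketch below compares one text's diversity score against a reference sample of human-written texts using a z-score. The function name `diversity_z_score`, the sample values, and the idea of scoring on a single feature are hypothetical simplifications. This is not the study's method, and a usable detector would need many features and validated thresholds.

```python
import statistics


def diversity_z_score(score, human_scores):
    """Standard deviations between `score` and the human reference mean.

    `human_scores` is a hypothetical sample of lexical diversity scores
    measured on human-written texts with the same metric as `score`.
    """
    if len(human_scores) < 2:
        raise ValueError("need at least two human reference scores")
    mean = statistics.mean(human_scores)
    spread = statistics.stdev(human_scores)  # sample standard deviation
    if spread == 0:
        raise ValueError("human reference scores have zero variance")
    return (score - mean) / spread


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    human_reference = [0.70, 0.72, 0.74]
    print(diversity_z_score(0.78, human_reference))
```

A large absolute z-score only says that the text is unusual relative to the chosen reference sample on this one measure. It is a signal a reader would not notice unaided, which is the sense in which the difference is measurable but not perceptible.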

Original note title

newer llm generations diverge further from human lexical patterns while becoming harder to detect
