SYNTHESIS NOTE

Can humans detect AI text if machines can measure it?

AI-generated text shows measurable differences from human writing across multiple linguistic dimensions, yet human judges consistently fail to identify it. Why does the gap between what is measurable and what is perceptible exist?

Synthesis note · 2026-02-21 · sourced from Discourses

Post angle for Medium / LinkedIn

The AI detection debate assumes the problem is detecting something that looks human. The lexical diversity research reframes the problem: the differences are real and measurable, but they are the wrong kind for human perception to catch.

Six dimensions of lexical diversity — volume, abundance, variety-repetition, evenness, disparity, dispersion — all differ significantly between LLM-generated and human-written text. This is not a borderline finding; it holds under MANOVA across multiple ChatGPT versions. The differences are there.
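To make the idea of a measurable dimension concrete, here is a minimal Python sketch of four of the six dimensions. The mapping of each name to a formula (token count for volume, type count for abundance, type-token ratio for variety, normalized Shannon entropy for evenness) is an illustrative assumption, not the operationalization used in the research this note draws on. Disparity and dispersion are left out because they need semantic and positional information that a short sketch cannot do justice to. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a deliberately crude tokenizer for illustration.
    return re.findall(r"[a-z']+", text.lower())


def lexical_profile(text):
    """Return rough proxies for four lexical diversity dimensions (assumed mapping)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    if n_tokens == 0:
        return {"volume": 0, "abundance": 0, "variety": 0.0, "evenness": 0.0}
    # Shannon entropy of the word-frequency distribution, in nats.
    entropy = -sum((c / n_tokens) * math.log(c / n_tokens) for c in counts.values())
    # Evenness: entropy divided by its maximum possible value, log(n_types).
    evenness = entropy / math.log(n_types) if n_types > 1 else 0.0
    return {
        "volume": n_tokens,              # how much text there is
        "abundance": n_types,            # how many distinct words appear
        "variety": n_types / n_tokens,   # type-token ratio
        "evenness": evenness,            # how evenly word use is spread over types
    }
```

Calling `lexical_profile` on each document in a human-written set and an LLM-generated set gives one feature vector per document, which is the kind of input a multivariate test such as MANOVA would compare.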

But human judges, including applied linguists trained to analyze text, cannot reliably identify which samples are AI-generated. Multiple independent studies confirm this across genres: in poetry, academic abstracts, physics essays, and narrative writing, humans fail to detect AI authorship.

The twist in the newer data: more capable models (ChatGPT-4.5, o4-mini) diverge more from human lexical patterns than older models. The gap is widening, not closing. We might expect AI writing to converge on human-like text as models improve. Instead, the training objective (quality, helpfulness, coherence) appears to be pushing models toward an optimum that is distinctly non-human in its lexical patterns — and those patterns happen to be invisible to casual human inspection.

What's happening: human detection of AI text relies on surface pattern recognition. It catches stylistic tells, tonal flatness, and certain phrase patterns. What it does not catch is the statistical distribution of vocabulary across a document, which requires computational analysis. The tools that would identify AI text are not available to someone reading naturally.
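One example of a document-level statistic that no reader computes while reading is the moving-average type-token ratio (MATTR): the share of distinct words inside a sliding window, averaged over every window position. The sketch below is one plausible way to compute it in Python. The window size of 50 and the crude tokenizer are assumptions chosen for illustration, and nothing here is taken from the studies the note summarizes.

```python
import re


def mattr(text, window=50):
    """Moving-average type-token ratio over a sliding window of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Text shorter than one window: fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    # Distinct-word share in each window, one window per starting position.
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

Because the statistic averages over every window position in the document, it reflects how vocabulary is distributed across the whole text rather than any single phrase a reader might notice.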

A complementary finding from authorship representation learning confirms that these stylistic differences are separable from content. When content words are masked during training, authorship prediction models still learn discriminative features, suggesting that the stylistic patterns LLMs acquire are not mere content artifacts but genuine structural properties of text generation. Paraphrasing (preserving meaning while modifying expression) further confirms this: style survives content transformation. This means the measurable non-humanness of LLM text is a property of how it writes, not what it writes about.
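Content masking can be sketched in a few lines of Python. The version below keeps punctuation and a small set of function words and replaces every other word with a placeholder, so that only the structural skeleton of the sentence remains. The word list, the `<MASK>` token, and the function name are hypothetical choices for illustration. The masking scheme actually used in the authorship work is not described in this note, and a real pipeline would more likely rely on a part-of-speech tagger than on a fixed list.

```python
import re

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "is", "are",
    "was", "it", "that", "this", "not", "with", "for", "as", "by",
}


def mask_content_words(text, mask="<MASK>"):
    """Replace every word outside FUNCTION_WORDS with a mask token."""
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    masked = [
        mask if tok.isalpha() and tok.lower() not in FUNCTION_WORDS else tok
        for tok in tokens
    ]
    return " ".join(masked)


print(mask_content_words("The model writes, and the reader cannot tell."))
# prints: The <MASK> <MASK> , and the <MASK> <MASK> <MASK> .
```

What survives masking is sentence length, punctuation habits, and function-word choice, which is the kind of signal a style model can still learn from once the topic has been removed.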

Implication: AI detection policy cannot be built on human judgment. It requires the same kind of distributional analysis that found these differences in the first place.

Original note title

llm text is measurably non-human but imperceptible to human judges