Can knowledge density explain why LLM writing feels coherent but fatiguing?
This explores whether LLM prose feels smooth-but-tiring because it's actually low in new information per sentence — coherent on the surface, but thin in the density of genuine perspective, specificity, or argumentative friction underneath.
The corpus offers a sharper diagnosis than "knowledge density" alone. The coherence and the fatigue turn out to come from the *same* source: the way the text is generated. Token prediction trains a model to continue toward its training distribution, not to explore competing positions. Generation therefore flows like a smooth probabilistic current rather than turning, doubling back, or weighing a counter-claim (Does LLM generation explore competing claims while producing text?). That smoothness reads as coherence. But the same note makes the fatigue concrete: smooth claims "multiply without generating new perspectives." You get sentence after sentence, each locally well-formed, none of them surprising the one before it. The reading effort stays constant while the informational payoff stays flat.
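One crude way to make "new information per sentence" measurable is to track what fraction of each sentence's content words have not appeared earlier in the text. The sketch below is a hypothetical lexical proxy, not a metric from the notes. The stopword list, tokenizer, and function names are assumptions, and the measure cannot tell a genuine new perspective from a synonym swap.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "each", "with", "for", "on", "as", "but"}

def content_words(sentence: str) -> set[str]:
    """Unique lowercase alphabetic tokens in the sentence, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}

def novelty_per_sentence(text: str) -> list[float]:
    """For each sentence, the fraction of its content words not seen in earlier sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    seen: set[str] = set()
    scores: list[float] = []
    for s in sentences:
        words = content_words(s)
        if not words:
            scores.append(0.0)  # sentence made only of stopwords adds nothing new
            continue
        scores.append(len(words - seen) / len(words))
        seen |= words
    return scores

text = "Models predict tokens. Models predict tokens smoothly. Smooth tokens predict smooth models."
print(novelty_per_sentence(text))  # → [1.0, 0.25, 0.25]
```

A score sequence that drops fast and stays low is the lexical signature of claims that "multiply without generating new perspectives." A high score only shows new vocabulary, though, not new content.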
A second mechanism compounds it: LLMs drift systematically toward abstraction. General words (hypernyms) appear more often in training data than specific ones (hyponyms), and models inherit a frequency bias, so preferring the common paraphrase quietly erases expert-level specificity (Does word frequency correlate with semantic abstraction?). Abstract prose is exactly the kind that feels comprehensive while saying little: the reader keeps waiting for the concrete instance that never arrives. So the "density" problem is really two things: redundancy (claims that don't advance) plus abstraction (claims that don't land).
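A toy illustration of the frequency argument: if a chooser always prefers the most frequent of several acceptable words, and general words are more frequent than specific ones, it lands on the general word every time. The counts below are invented for illustration, and "always take the most frequent candidate" is a deliberate caricature of frequency bias, not a description of how a model actually decodes.

```python
# Invented counts for illustration only; not measured from any corpus.
TOY_FREQUENCY = {
    "tool": 9000,        # general term (hypernym): frequent
    "instrument": 4000,
    "scalpel": 300,      # specific term (hyponym): rarer
    "curette": 20,       # even more specialized
}

def pick_by_frequency(candidates: list[str], freq: dict[str, int]) -> str:
    """Return the candidate a purely frequency-biased chooser would prefer."""
    return max(candidates, key=lambda w: freq.get(w, 0))

print(pick_by_frequency(["scalpel", "tool", "instrument"], TOY_FREQUENCY))  # → tool
```

The specific word only wins when no general alternative is on offer, which is the sense in which the common paraphrase "quietly erases" specificity.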
There's also a missing-friction angle that explains the specific *quality* of the fatigue. Human writing and conversation are full of grounding work: checks, acknowledgments, places where the writer registers that you might not be following. LLMs produce about 77.5% fewer of these acts, partly because preference optimization actively trims them, since raters reward confident, complete-sounding answers (Why do language models sound fluent without grounding?). The result is prose with no give in it, uniformly assertive and never modulating its confidence, which is tiring in the way a voice that never pauses is tiring. The fluency is real; the communicative texture is absent.
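The 77.5% figure is a relative reduction in grounding acts. The sketch below shows how such a comparison is expressed, using a hypothetical list of surface phrases as a stand-in for grounding acts. The notes here do not describe how the acts were identified, and a real study would annotate dialogue acts rather than match strings; the marker list and function names are assumptions.

```python
import re

# Hypothetical surface markers standing in for grounding acts
# (understanding checks, acknowledgments, clarification requests).
GROUNDING_MARKERS = [
    r"\bdoes that make sense\b",
    r"\bdo you mean\b",
    r"\bgot it\b",
    r"\bjust to check\b",
    r"\bright\?",
]

def grounding_rate(turns: list[str]) -> float:
    """Average number of marker matches per turn."""
    if not turns:
        return 0.0
    hits = sum(len(re.findall(p, t.lower())) for t in turns for p in GROUNDING_MARKERS)
    return hits / len(turns)

def relative_reduction(model_rate: float, human_rate: float) -> float:
    """How much lower model_rate is than human_rate, as a fraction (0.775 = 77.5% fewer)."""
    if human_rate == 0:
        raise ValueError("human_rate must be nonzero")
    return 1 - model_rate / human_rate

human = ["Got it, so you mean the second file?",
         "Just to check: the deadline is Friday, right?"]
model = ["Here is a complete answer covering every case.",
         "The deadline is Friday."]
print(grounding_rate(human), grounding_rate(model))  # → 1.5 0.0
```

On this framing, "77.5% fewer" means the model's rate is 22.5% of the human rate, i.e. `relative_reduction(0.225, 1.0)` is 0.775.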
What you might not expect is how *register* locks this in. The same weights produce a sycophantic chat voice and a falsely objective "published prose" voice depending on the prompt, each inheriting the failure modes of its training slice (Why do LLMs produce such different writing in chat versus posts?). The essay-style register is optimized to *sound* like authoritative published writing, which is precisely the register where smooth, abstract, friction-free multiplication of claims is most rewarded and least questioned. The model isn't failing to be coherent; it's succeeding at a coherence that has been decoupled from informativeness.
If you want to push on *why* the underlying thought is thin rather than just the surface, there are two doorways. Reasoning models tend to wander rather than search systematically, so they accumulate text without converging (Why do reasoning LLMs fail at deeper problem solving?). And "Potemkin understanding" shows that explanation and application can run on disconnected pathways, so a passage can be articulate about a concept it can't actually use (Can LLMs understand concepts they cannot apply?). Both suggest the fatigue isn't a stylistic tic to be prompted away; it is downstream of how the text is produced.
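The note behind the reasoning-model doorway says success probability drops exponentially with problem depth. A minimal way to see why that shape is so punishing, assuming (as a simplification not taken from the note) that a solution needs d steps that each succeed independently with the same probability p:

```latex
P_{\text{success}}(d) = \prod_{i=1}^{d} p = p^{d},
\qquad p = 0.9:\quad P_{\text{success}}(5) = 0.9^{5} \approx 0.59,
\qquad P_{\text{success}}(20) = 0.9^{20} \approx 0.12
```

Even a 90%-reliable step leaves a 20-step problem solved only about one time in eight, which is the gap between medium problems staying tractable and deep ones collapsing.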
Sources (6 notes)
Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.
WordNet analysis shows hypernyms (general concepts) occur more frequently than hyponyms (specific ones). Combined with LLMs' frequency bias, this means preferring common paraphrases systematically drifts toward abstraction, erasing expert-level specificity.
LLMs generate 77.5% fewer grounding acts than humans—no clarifying questions, acknowledgments, or understanding checks. Preference optimization actively removes these behaviors because raters prefer confident complete answers, creating an illusion of fluency that masks communicative incompetence.
The same model produces sycophantic chat (shaped by RLHF on conversational data) and falsely objective posts (shaped by published prose training). Each register inherits failure modes from its training distribution rather than representing different models or subsystems.
Current reasoning models lack the three properties of systematic exploration: validity, effectiveness, and necessity. This causes success probability to drop exponentially with problem depth, making medium problems solvable but deep problems catastrophically harder.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.