Do anaphoric references fundamentally limit argumentative force in machine-generated writing?

This explores whether the AI habit of pointing backward in text — summarizing what was already said rather than promising what's coming — actually weakens its arguments, or whether that backward-pointing is just a visible symptom of something deeper.

The corpus suggests that anaphora (references that point back to earlier text) is better read as a symptom than a cause of weak machine-generated arguments. The clearest starting point: ChatGPT defaults to anaphoric organization, summarizing ground already covered, while human student writers lean cataphoric, previewing arguments before making them (Does ChatGPT organize text differently than human writers?). Forward-pointing structure creates a small promise, "here's where I'm going", that pulls a reader through the argument. Backward-pointing structure closes loops instead of opening them. So the surface effect is real, but that note tentatively traces the cause to how autoregressive models generate token by token, not to anaphora as an independent flaw.
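The backward-pointing versus forward-pointing contrast can be made concrete with a rough marker count. This is a minimal sketch, not the method of the cited note: the marker lists `ANAPHORIC` and `CATAPHORIC` and the function names are hypothetical, chosen only for illustration, and a real analysis would need a validated inventory of metadiscourse expressions.

```python
import re

# Hypothetical marker lists, for illustration only; the cited note does not
# say which expressions were counted.
ANAPHORIC = ["as mentioned above", "as discussed", "as noted earlier",
             "in summary", "to summarize"]
CATAPHORIC = ["as follows", "in what follows", "below", "the following",
              "i will argue", "this essay will"]


def count_markers(text, markers):
    """Count case-insensitive whole-phrase occurrences of the given markers."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


def pointing_profile(text):
    """Return (backward_count, forward_count) for a piece of text."""
    return count_markers(text, ANAPHORIC), count_markers(text, CATAPHORIC)
```

A text whose first count dominates leans anaphoric under this crude measure. The counts say nothing about why the text is organized that way, which is the question the rest of this note takes up.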

Follow that thread and a more interesting culprit appears. Argumentative force in human writing depends partly on an internal appeal to the reader's attention: writing performs an act of reaching toward an audience. AI text structurally lacks this appeal, and readers register the result as aloofness, a structural absence rather than a stylistic slip (Does AI writing lack the internal appeal to attention that humans use?). Anaphora and that missing appeal are two faces of the same thing: text that organizes itself around what has already been generated rather than around a reader who must be carried forward. The same generative dynamic shows up as smoothness. Token prediction flows toward the training distribution instead of exploring competing claims, so arguments multiply without genuine rhetorical turbulence (Does LLM generation explore competing claims while producing text?).

There's a deeper reason "force" may be the wrong thing to expect at all. An argument's force comes partly from a committed thinker standing behind it, but LLMs hold the shape of whatever argument the user is building rather than defending a position of their own (Do LLMs actually hold stable positions or just mirror user arguments?), and they sample characters rather than committing to one (Do large language models actually commit to a single character?). Force also draws on the authority of the speaker: reputation and standing that LLMs can't access because they process text, not the social world where expertise is built (Can language models distinguish expert arguments from common assumptions?). If there's no defended stance and no earned standing, no amount of cataphoric reorganization manufactures force.

What makes this counterintuitive is that the same autoregressive process that produces flat, backward-looking structure also makes AI writing more persuasive on the surface, not less. Audited models reach for logical and quantitative framing in nearly every exchange, lending an unearned air of objectivity (Do LLMs persuade users more often than humans do?). And the stylistic fingerprint runs deep: LLM counter-arguments converge toward the post they're replying to in style and entities far more than human replies do (Do LLM counter-arguments mirror writing style more than humans?), a signature so reliable that simple interpretable features detect LLM-generated counter-arguments on r/ChangeMyView with 99% accuracy (Can simple linguistic features detect AI-written arguments?). So the honest answer: anaphoric reference doesn't fundamentally limit argumentative force. It is a readable tell of a generation process that produces persuasive-sounding text without the commitment, reader-directedness, or earned authority that real argumentative force rests on.
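The convergence of a reply toward the post it answers is a relational property, measured between two texts rather than on one. A minimal sketch of that idea follows, assuming plain vocabulary overlap as a stand-in: the cited analysis used style, named-entity, and psycholinguistic features, none of which are reproduced here, and the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set for a text (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard_alignment(post, reply):
    """Share of vocabulary the post and reply have in common (0.0 to 1.0).

    Assumption: lexical overlap is only a rough proxy for the stylistic
    convergence described in the text.
    """
    post_words, reply_words = tokens(post), tokens(reply)
    if not post_words and not reply_words:
        return 0.0
    return len(post_words & reply_words) / len(post_words | reply_words)
```

Under this proxy, a reply that borrows more of the original post's vocabulary scores closer to 1.0. Whether such a score separates human from machine replies is an empirical question the sketch does not answer.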

Sources 9 notes

Does ChatGPT organize text differently than human writers?

ChatGPT defaults to summarizing what was already said, while students use more forward-pointing structure that previews upcoming arguments. This reflects different reader models and may stem from how autoregressive generation works token by token.

Does AI writing lack the internal appeal to attention that humans use?

Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the reported aloofness readers perceive — a structural absence, not a stylistic defect.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do LLMs actually hold stable positions or just mirror user arguments?

Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not from any underlying commitment being defended.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Do LLM counter-arguments mirror writing style more than humans?

Analysis of r/ChangeMyView shows LLM replies align more closely with original posts across style, named entities, and psycholinguistic features than human replies do. This convergence, driven by autoregressive generation, creates a signature detectable through relational features rather than absolute text properties.

Can simple linguistic features detect AI-written arguments?

General linguistic features combined with argument-quality measures achieved 99% accuracy detecting LLM-generated counter-arguments on r/ChangeMyView, matching heavyweight neural detectors while remaining computationally cheap and transparent. LLMs produce detectable stylistic signatures: accommodation to prompts and textbook-quality argument markers that humans don't replicate.
