How should authorship and originality law attach to discourse structure versus surface style?

This explores where the law should locate authorship and originality — in the deep architecture of how a text is organized (discourse structure) or in its sentence-level voice and word choice (surface style) — and what the corpus suggests about which layer actually carries the human signature.

The corpus tilts hard toward the structural layer. That is the surprising part, because copyright intuition usually polices surface copying (paraphrase, the turn of a phrase) while treating organization as too abstract to own.

The sharpest evidence comes from work that deliberately strips style away. A detector called StoryScope separated AI fiction from human fiction with 93% accuracy using *only* discourse-level features (character agency, chronological structure), keeping 97% of its performance after eliminating stylistic cues entirely (see "Can AI stories be detected without analyzing writing style?"). The reason matters for law: surface style is cheap to mimic and cheap to 'humanize,' but structural choices resist editing because changing them means rewriting, not retouching. If the human fingerprint survives style removal, then attaching originality to surface style protects the layer that's easiest to launder.

That same line of work also offers a way to *operationalize* originality, which the law has always struggled to define. Instead of treating originality as an unmeasurable spark, it can be cast as statistical rarity in a feature space of discourse-level narrative decisions: human stories occupy measurably rarer regions, while AI outputs cluster tightly together (see "Can statistical rarity measure whether stories are truly original?"). This gives a quantifiable proxy for exactly the 'human conception' copyright doctrine reaches for but can't pin down, and it lives in structure, not phrasing.
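One way to picture "rarity in a feature space" is a nearest-neighbour distance score. The sketch below is only an illustration under stated assumptions: the two-dimensional feature vectors, the names `knn_rarity` and `corpus`, and the choice of mean k-nearest-neighbour distance are hypothetical stand-ins, not StoryScope's actual features or metric.

```python
import math


def knn_rarity(point, reference, k=3):
    """Mean Euclidean distance from `point` to its k nearest neighbours.

    A larger value means the point sits in a sparser (rarer) region of the
    reference set. This scoring rule is an assumption made for illustration.
    """
    if k < 1 or k > len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    dists = sorted(math.dist(point, other) for other in reference)
    return sum(dists[:k]) / k


# Hypothetical feature vectors: (character agency, chronological linearity).
# The numbers are invented purely to show the mechanics.
corpus = [
    (0.50, 0.50), (0.52, 0.49), (0.48, 0.51), (0.51, 0.52), (0.49, 0.48),
]
clustered_story = (0.50, 0.51)  # sits inside the tight cluster
outlier_story = (0.90, 0.10)    # sits far from the cluster

rarity_clustered = knn_rarity(clustered_story, corpus)
rarity_outlier = knn_rarity(outlier_story, corpus)

print(f"clustered story rarity: {rarity_clustered:.3f}")
print(f"outlier story rarity:   {rarity_outlier:.3f}")
```

Under this toy scoring, a story far from the cluster gets a higher rarity score than one inside it. Any legal use of such a score would still need a defensible choice of features, reference corpus, and threshold, none of which this sketch supplies.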

But the corpus also warns against collapsing authorship into structure alone, because what makes human discourse *authored* isn't only its shape. Several notes argue that AI text structurally lacks properties that have nothing to do with style: an embodied, situated author who actually had the experience, a genuine appeal to a reader's attention, and political situatedness (see "Does AI-generated text lose core properties of human writing?" and "Does AI writing lack the internal appeal to attention that humans use?"). There's even a measurable organizational tell: ChatGPT defaults to looking backward (anaphoric, summarizing what was said) while the student writers it was compared with point forward (cataphoric, previewing arguments), possibly a side effect of token-by-token generation (see "Does ChatGPT organize text differently than human writers?"). So 'discourse structure' itself splits into two things the law would treat differently: organizational patterns (mimicable in principle) and the situated authorial stance behind them (not).
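The backward-versus-forward tell can be made concrete with a crude marker count. This is a minimal sketch, assuming that a handful of fixed phrases can stand in for anaphoric and cataphoric signposting. The phrase lists and the name `direction_counts` are hypothetical and are not taken from the study the note summarizes.

```python
# Illustrative marker lists (assumptions, not drawn from the cited work).
BACKWARD_MARKERS = ("in summary", "as discussed", "as mentioned earlier")
FORWARD_MARKERS = ("as we will see", "in what follows", "the next section")


def direction_counts(text):
    """Return (backward, forward) counts of signposting phrases in `text`."""
    lowered = text.lower()
    backward = sum(lowered.count(marker) for marker in BACKWARD_MARKERS)
    forward = sum(lowered.count(marker) for marker in FORWARD_MARKERS)
    return backward, forward


sample = (
    "In summary, the plan failed. As discussed, costs rose. "
    "As we will see, the fix was simple."
)
backward, forward = direction_counts(sample)
print(f"backward-pointing markers: {backward}")
print(f"forward-pointing markers:  {forward}")
```

A phrase count like this captures only the mimicable organizational pattern. It says nothing about the situated stance behind the text, which is the distinction the paragraph above draws.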

The deeper caution is that authorship may not be a property of the text at all. One strand holds that the force of an argument depends on the standing of the thinker (reputation, track record, the social world where expertise is built), which a model can't carry because it processes text, not the social context that authorizes it (see "Can language models distinguish expert arguments from common assumptions?"). If that's right, originality law that attaches to *any* textual feature, structural or stylistic, is measuring a shadow of authorship rather than the thing itself, and the discourse-level metrics are valuable precisely as the best available proxy, not as the genuine article.

Sources (6 notes)

Can AI stories be detected without analyzing writing style?

StoryScope achieved 93.2% accuracy separating AI from human fiction using only discourse-level features like character agency and chronological structure, retaining 97% of performance while eliminating stylistic cues. These structural choices resist humanization because they require rewrites, not surface edits.

Can statistical rarity measure whether stories are truly original?

StoryScope operationalizes originality as statistical rarity in discourse-level narrative decisions. Human stories are measurably rarer in this space than AI outputs, which cluster tightly, offering a quantifiable proxy for the human conception copyright law requires.

Does AI-generated text lose core properties of human writing?

Research shows artificial text disrupts dialogic symmetry, context continuity, embodied authorship, and political situatedness. These are not surface flaws but structural absences—AI hotel reviews show 80%+ detection accuracy due to inherent falsity about personal experience distinct from human deception.

Does AI writing lack the internal appeal to attention that humans use?

Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the reported aloofness readers perceive — a structural absence, not a stylistic defect.

Does ChatGPT organize text differently than human writers?

ChatGPT defaults to summarizing what was already said, while students use more forward-pointing structure that previews upcoming arguments. This reflects different reader models and may stem from how autoregressive generation works token by token.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a legal theorist and AI researcher evaluating whether authorship and originality should attach to discourse structure (narrative organization, argument flow, situated stance) or surface style (word choice, voice, phrasing). The question remains open—law has not settled it, and LLM capabilities evolve faster than doctrine.

What a curated library found — and when (findings span 2024–2026, DATED CLAIMS):
• StoryScope (2026) achieved 93% accuracy distinguishing human from AI fiction using *only* discourse-level features (character agency, chronology), retaining 97% performance after stripping all stylistic cues—suggesting the human fingerprint lives in structure, not surface.
• Originality can be operationalized as statistical rarity in discourse-feature space: human narratives occupy measurably rarer regions than tightly clustered AI outputs (2026), offering a quantifiable proxy for 'human conception' copyright doctrine invokes but cannot define.
• ChatGPT favors anaphoric (backward-looking, summarizing) text organization while human writers prefer cataphoric (forward-previewing); this organizational tell may reflect token-by-token generation, not authorial choice (2026).
• Human discourse exhibits situated authorial stance, embodied experience, and genuine appeal to reader attention—properties AI text structurally *lacks*, independent of stylistic surface (2024–2025).
• The force of argument depends on the thinker's standing (reputation, social context, expertise) which models cannot carry; textual features measure only a proxy for authorship, not authorship itself (2024).

Anchor papers (verify; mind their dates):
• arXiv:2604.03136 (StoryScope, 2026)
• arXiv:2604.22503 (Persona Distortions, 2026)
• arXiv:2407.08790 (Linguistic Agency, 2024)
• arXiv:2511.08394 (Interaction Dynamics, 2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above—especially the 93% discourse-level detection rate and the anaphoric/cataphoric split—judge whether newer training regimes, chain-of-thought prompting, or fine-tuning for authorial stance have since *erased* these tells or driven them deeper. Separate the durable question ('what makes a text authored?') from perishable limitations ('current models cluster on anaphora'). Cite what resolved each constraint, and say plainly where it persists.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months: has any 2026-Q3+ paper shown that discourse-level features no longer distinguish human from AI, or that situated stance can be learned/faked?
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) If discourse-level tells do persist, can law use statistical-rarity thresholds as a proxy for originality without collapsing authorship into mere textual geometry? (b) If social authority (not text) is where authorship lives, what role should discourse structure play in originality doctrine at all?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
