How should authorship and originality law attach to discourse structure versus surface style?
This note explores where authorship and originality should attach: the deep architecture of how a text organizes meaning, or its surface voice and word choice. The corpus tilts hard toward the structural layer. That is the surprising part, because copyright intuition usually polices surface copying (paraphrase, the turn of a phrase) while treating organization as too abstract to own.
The sharpest evidence comes from work that deliberately strips style away. A detector called StoryScope separated AI fiction from human fiction with 93.2% accuracy using *only* discourse-level features such as character agency and chronological structure, retaining 97% of its performance after stylistic cues were eliminated entirely (see: Can AI stories be detected without analyzing writing style?). The reason matters for law: surface style is cheap to mimic and cheap to 'humanize,' but structural choices resist editing because changing them means rewriting, not retouching. If the human fingerprint survives style removal, then attaching originality to surface style protects the layer that is easiest to launder.
That same line of work also offers a way to *operationalize* originality, a concept the law has always struggled to define. Instead of treating originality as an unmeasurable spark, it can be cast as statistical rarity in a feature space of discourse-level narrative decisions: human stories occupy measurably rarer regions, while AI outputs cluster tightly together (see: Can statistical rarity measure whether stories are truly original?). This gives a quantifiable proxy for exactly the 'human conception' that copyright doctrine reaches for but cannot pin down, and it lives in structure, not phrasing.
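The rarity idea can be made concrete with a toy calculation. The sketch below is not StoryScope's actual method. It assumes, purely for illustration, that each story has already been reduced to a small vector of discourse-level features, and it scores rarity as the mean distance from a story to its nearest neighbours in a reference corpus. The feature names, the values, and the `knn_rarity` helper are all hypothetical.

```python
import math


def knn_rarity(point, reference, k=3):
    """Mean Euclidean distance from `point` to its k nearest stories in `reference`.

    A larger value means the story sits in a sparser (rarer) region of the
    feature space. This is an illustrative proxy, not a published metric.
    """
    distances = sorted(math.dist(point, other) for other in reference)
    return sum(distances[:k]) / k


# Hypothetical 2-D features per story:
# (character agency score, share of scenes told out of chronological order)
reference_corpus = [
    (0.50, 0.10), (0.52, 0.12), (0.49, 0.11), (0.51, 0.09), (0.48, 0.10),  # tight cluster
    (0.90, 0.60), (0.15, 0.80),                                            # scattered stories
]

clustered_story = (0.50, 0.11)  # lands inside the tight cluster
unusual_story = (0.85, 0.70)    # lands far from the cluster

rarity_clustered = knn_rarity(clustered_story, reference_corpus)
rarity_unusual = knn_rarity(unusual_story, reference_corpus)

print(f"clustered story rarity: {rarity_clustered:.3f}")
print(f"unusual story rarity:   {rarity_unusual:.3f}")
```

Under these assumptions, a story inside the tight cluster scores low and a story far from it scores high. Which features are measured and which corpus serves as the reference are choices this sketch does not settle, and any legal use of such a score would turn on them.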
But the corpus also warns against collapsing authorship into structure alone, because what makes human discourse *authored* is not only its shape. Several notes argue that AI text structurally lacks properties that have nothing to do with style: an embodied, situated author who actually had the experience, a genuine appeal to a reader's attention, and political situatedness (see: Does AI-generated text lose core properties of human writing?; Does AI writing lack the internal appeal to attention that humans use?). There is even a measurable organizational tell: ChatGPT defaults to looking backward (anaphoric, summarizing what was said) while human writers point forward (cataphoric, previewing arguments), possibly a side effect of token-by-token generation (see: Does ChatGPT organize text differently than human writers?). So 'discourse structure' itself splits into two things the law would treat differently: organizational patterns (mimicable in principle) and the situated authorial stance behind them (not).
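To show what measuring a backward-pointing versus forward-pointing tendency could look like, here is a deliberately crude sketch. It counts a handful of signpost phrases in a text. The phrase lists and the `pointing_profile` helper are hypothetical and chosen only for illustration; they are not the method of the cited study, and a serious analysis would need proper linguistic annotation.

```python
import re

# Hypothetical signpost phrases, for illustration only.
BACKWARD_MARKERS = ["as mentioned", "as noted above", "in summary", "to summarize", "as discussed"]
FORWARD_MARKERS = ["in what follows", "as we will see", "the next section", "i will argue"]


def count_markers(text, markers):
    """Count whole-phrase, case-insensitive occurrences of each marker in `text`."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


def pointing_profile(text):
    """Return counts of backward-pointing and forward-pointing signposts."""
    return {
        "backward": count_markers(text, BACKWARD_MARKERS),
        "forward": count_markers(text, FORWARD_MARKERS),
    }


sample_a = "As mentioned, the policy failed. In summary, the costs outweighed the gains."
sample_b = "I will argue that the policy failed. As we will see, the costs outweighed the gains."

profile_a = pointing_profile(sample_a)
profile_b = pointing_profile(sample_b)

print(profile_a)
print(profile_b)
```

A counter like this captures only the organizational pattern, which is the mimicable half of the split described above. It says nothing about the situated authorial stance behind the text.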
The deeper caution is that authorship may not be a property of the text at all. One strand holds that the force of an argument depends on the standing of the thinker (reputation, track record, the social world where expertise is built), which a model cannot carry because it processes text, not the social context that authorizes it (see: Can language models distinguish expert arguments from common assumptions?). If that is right, originality law that attaches to *any* textual feature, structural or stylistic, is measuring a shadow of authorship rather than the thing itself, and the discourse-level metrics are valuable precisely as the best available proxy, not as the genuine article.
Sources (6 notes)
Can AI stories be detected without analyzing writing style? StoryScope achieved 93.2% accuracy separating AI from human fiction using only discourse-level features like character agency and chronological structure, retaining 97% of performance while eliminating stylistic cues. These structural choices resist humanization because they require rewrites, not surface edits.
Can statistical rarity measure whether stories are truly original? StoryScope operationalizes originality as statistical rarity in discourse-level narrative decisions. Human stories are measurably rarer in this space than AI outputs, which cluster tightly, offering a quantifiable proxy for the human conception copyright law requires.
Does AI-generated text lose core properties of human writing? Research shows artificial text disrupts dialogic symmetry, context continuity, embodied authorship, and political situatedness. These are not surface flaws but structural absences: AI-generated hotel reviews are detected with over 80% accuracy because they are inherently false about personal experience, a falsity distinct from human deception.
Does AI writing lack the internal appeal to attention that humans use? Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the aloofness readers report: a structural absence, not a stylistic defect.
Does ChatGPT organize text differently than human writers? ChatGPT defaults to summarizing what was already said, while students use more forward-pointing structure that previews upcoming arguments. This reflects different reader models and may stem from how autoregressive generation works token by token.
Can language models distinguish expert arguments from common assumptions? LLMs lose the social context that gives expert claims their force (reputation, track record, and standing) because they process only text, not the social world where expertise is built and evaluated.