SYNTHESIS NOTE

Can statistical rarity measure whether stories are truly original?

Can originality be operationalized as statistical rarity in a narrative feature space? The question matters because copyright law conditions protection on human creative control, which invites some way of measuring it. But rarity is relative to a reference corpus, context-dependent, and guarantees neither quality nor authorship.

Synthesis note · 2026-05-28 · sourced from Co Writing Collaboration

As AI seeps into writing, the question of what counts as original work shifts from how a story is written to how it is conceived. StoryScope proposes a concrete operationalization: represent each story as a vector of discourse-level narrative features and treat statistical rarity in that space as a proxy for originality. On this view, less common combinations of narrative decisions reflect the broader notion of originality invoked by creativity research (Torrance) and by copyright law, which requires a minimal degree of originality and, per recent U.S. Copyright Office guidance, sufficient human creative control. The empirical hook: human stories are, on average, rarer in narrative feature space, while the five AI models examined occupy a tight, well-separated cluster.
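The rarity idea can be made concrete with a minimal sketch. This is not StoryScope's actual method: the feature names (`narrator`, `ending`), the function names, and the scoring rule are all assumptions chosen for illustration. Each story is a dictionary of categorical narrative decisions, and rarity is the mean surprisal of its feature values under a reference corpus, with add-one smoothing. Scoring features independently is a simplification; it ignores the rarity of combinations that the note emphasizes.

```python
import math
from collections import Counter

def fit_reference(stories):
    """Count how often each value of each feature appears in the reference corpus."""
    counts = {}
    for story in stories:
        for feature, value in story.items():
            counts.setdefault(feature, Counter())[value] += 1
    return counts, len(stories)

def rarity(story, reference, alpha=1.0):
    """Mean surprisal (in bits) of a story's feature values under the reference corpus."""
    counts, n = reference
    total = 0.0
    for feature, value in story.items():
        seen = counts.get(feature, Counter())
        k = len(seen) + 1  # observed values plus one slot for unseen values
        p = (seen[value] + alpha) / (n + alpha * k)  # smoothed relative frequency
        total += -math.log2(p)
    return total / len(story)

# Hypothetical reference corpus: feature names and values are illustrative only.
reference = fit_reference(
    [{"narrator": "third_person", "ending": "resolved"}] * 3
    + [{"narrator": "first_person", "ending": "resolved"}]
)
common = {"narrator": "third_person", "ending": "resolved"}
uncommon = {"narrator": "second_person", "ending": "open"}

print(rarity(common, reference), rarity(uncommon, reference))
```

Under these assumptions the story built from unseen feature values scores higher than the one that matches the corpus majority, which is all the proxy claims: uncommon choices, not better ones.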

This is appealing because it converts a contested legal-aesthetic concept into something measurable and model-agnostic. Rarity does not depend on surface style (which survives the humanization edit) and aligns with the intuition that originality is about making uncommon choices, not novel word combinations. It also gives the copyright question an operational handle: a work's position in narrative-decision space could index how much distinctive human conception it carries.

Why it stays a question: rarity-as-originality is a proxy with sharp limits. Rarity is defined relative to a reference distribution, so it drifts as both human and AI writing change. Nor is rare the same as good or protectable: an incoherent story can be statistically rare. Conflating "uncommon in feature space" with "originally authored by a human" risks both false positives (idiosyncratic AI output) and false negatives (a human writing in a popular convention). The construct is a useful, falsifiable starting point for measuring conception rather than execution, but whether it should bear legal or evaluative weight is exactly what it leaves open.
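The dependence on a reference distribution can be illustrated with a small sketch. The corpora, feature labels, and the `surprisal` helper below are hypothetical, not data or code from the source. The same story is scored against two reference corpora, one where its combination of narrative decisions is unusual and one where it has become the convention.

```python
import math
from collections import Counter

def surprisal(pattern, corpus, alpha=1.0):
    """Surprisal (in bits) of a whole feature combination under a reference corpus."""
    counts = Counter(corpus)
    k = len(counts) + 1  # observed combinations plus one slot for unseen ones
    p = (counts[pattern] + alpha) / (len(corpus) + alpha * k)
    return -math.log2(p)

# One story, described by a (hypothetical) pair of narrative decisions.
story = ("unreliable_narrator", "open_ending")
conventional = ("reliable_narrator", "resolved_ending")

# Two illustrative reference corpora: the story's pattern is rare in the first
# and dominant in the second.
corpus_then = [conventional] * 9 + [story]
corpus_now = [conventional] * 4 + [story] * 6

rare_then = surprisal(story, corpus_then)
rare_now = surprisal(story, corpus_now)
print(rare_then, rare_now)
```

The story itself is unchanged between the two calls; only the reference corpus differs, and the rarity score falls accordingly. Any threshold on such a score would need to state which corpus it was computed against.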

Original note title

originality can be operationalized as statistical rarity in a feature space of narrative decisions