INQUIRING LINE

Can lightweight linguistic features reliably detect AI-generated persuasive text?

This line of inquiry asks whether cheap, transparent text features, rather than heavyweight neural detectors, can spot AI-written persuasion. The corpus answer is a surprising yes, because the same systematic habits that make AI persuasive also make it detectable.


The corpus says yes, strikingly so. On Reddit's r/ChangeMyView, a handful of interpretable linguistic and argument-quality features reached 99% accuracy in separating LLM counter-arguments from human ones, matching expensive neural detectors while staying computationally cheap and human-readable [Can simple linguistic features detect AI-written arguments?]. The reason isn't that the features are clever; it's that LLMs leave consistent fingerprints: over-accommodation to the prompt and a kind of textbook-quality argument polish that real people rarely produce.
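To make 'cheap and human-readable' concrete, here is a minimal sketch of what a lightweight feature extractor could look like. It is an illustration under stated assumptions, not the feature set of the r/ChangeMyView study: the marker lists, the function names, and the five features are all hypothetical choices made for this example.

```python
import re

# Hypothetical marker lists, chosen only for illustration. They are NOT the
# features used in the r/ChangeMyView study.
HEDGES = {"perhaps", "maybe", "arguably", "might", "could", "seems"}
CONNECTIVES = {"however", "therefore", "moreover", "furthermore",
               "consequently", "additionally"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Crude sentence splitter on ., ! and ? (it will also split decimals)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_features(text):
    """Return a small dict of interpretable, per-text features."""
    words = tokenize(text)
    sentences = split_sentences(text)
    n_words = max(len(words), 1)      # avoid division by zero on empty text
    n_sents = max(len(sentences), 1)
    numbers = re.findall(r"\d+(?:\.\d+)?%?", text)
    return {
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(words) / n_sents,
        # Vocabulary diversity: distinct words over total words.
        "type_token_ratio": len(set(words)) / n_words,
        # Share of words that are formal discourse connectives.
        "connective_rate": sum(w in CONNECTIVES for w in words) / n_words,
        # Share of words that are hedges.
        "hedge_rate": sum(w in HEDGES for w in words) / n_words,
        # Numbers and percentages per word, a rough proxy for quantitative framing.
        "numeric_token_rate": len(numbers) / n_words,
    }


features = extract_features("However, costs fell 20%. Therefore it works.")
```

Each value can be read directly by a person, which is the point of the approach. In practice a dictionary like this would be fed to a simple classifier, and whether these particular features separate AI from human text is an empirical question this sketch does not answer.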

The more interesting question is *why* that signal is so clean, and here the persuasion research connects laterally. AI persuasion is systematic in ways human persuasion isn't: audited models reach for logical appeals and quantitative framing in nearly every exchange, while humans lean on emotion and social proof, and do so less often [Do LLMs persuade users more often than humans do?]. That regularity is a detector's dream: a style that 'always argues like a debate textbook' is easy to learn. RLHF deepens the groove, biasing models toward conciliatory, benefit-framed persuasion regardless of context [Do LLMs predict persuasion based on actual dialogue or training bias?]. AI writing assistance also measurably distorts the writer's apparent persona, shifting every one of the 29 tested dimensions in directions such as greater confidence, agreeableness, and even extremism [Does AI writing assistance change how readers perceive the writer?]. The traits that make AI text feel authoritative are the traits that give it away.

There's a catch worth knowing, though. Surface style can be edited or 'humanized' away. The most robust detection signal turns out to live deeper than word choice: AI fiction is separable from human writing at 93% accuracy using *only* discourse-level structure, such as character agency and chronology, and it keeps 97% of that performance even after stylistic cues are stripped out [Can AI stories be detected without analyzing writing style?]. Structure resists evasion because faking it requires a rewrite, not a find-and-replace. So 'lightweight' features work today, but if the fiction result carries over to persuasion, the durable bet is on structural signatures, not surface ones.

A second catch: AI persuasion isn't actually static. GPT-4 recalibrates its mix of ethos, logos, and pathos depending on how it's challenged: credibility when fact-checked, logic when pushed back on, emotion when caught in error [Does GenAI shift persuasion tactics based on how you challenge it?]. And the persuasive edge of the models tested decays across repeated interactions rather than strengthening the way rapport does between humans [Does AI persuasiveness fade across repeated conversations with the same person?]. A detector trained on one conversational stance may not generalize to a model that's adapting its rhetoric mid-dialogue.
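One way to check the generalization worry is to report detector accuracy separately for each conversational stance instead of as a single pooled number. The sketch below assumes each evaluated text carries a stance tag. The function name, the stance labels, and the six records are hypothetical and made up purely for illustration; they are not results from any study.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group.

    records: iterable of (group, true_label, predicted_label) tuples.
    Returns a dict mapping each group to its fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Toy, invented records (NOT real data). The stance names mirror the three
# challenge types mentioned in the text: fact-checking, pushback, error exposure.
records = [
    ("fact_check", "ai", "ai"),
    ("fact_check", "human", "human"),
    ("pushback", "ai", "ai"),
    ("pushback", "ai", "human"),
    ("error_exposure", "ai", "human"),
    ("error_exposure", "human", "human"),
]

scores = accuracy_by_group(records)
for stance, accuracy in scores.items():
    print(f"{stance}: {accuracy:.2f}")
```

A large gap between stances would suggest the detector has learned one rhetorical mode, not a property of AI text in general.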

The thing you didn't know you wanted to know: detectability and persuasiveness come from the same source. The 'objective,' logic-heavy register that confers unearned epistemic authority on AI arguments [llms-spontaneously-persuade-in-virtually-every-conversation-even-when-unwarrente] is the very pattern a 99%-accurate classifier keys on. For now, the machine's greatest rhetorical strength, its relentless consistency, is also its tell.


Sources 7 notes

Can simple linguistic features detect AI-written arguments?

General linguistic features combined with argument-quality measures achieved 99% accuracy detecting LLM-generated counter-arguments on r/ChangeMyView, matching heavyweight neural detectors while remaining computationally cheap and transparent. LLMs produce detectable stylistic signatures: accommodation to prompts and textbook-quality argument markers that humans don't replicate.

Do LLMs predict persuasion based on actual dialogue or training bias?

LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.

Does AI writing assistance change how readers perceive the writer?

A study of 2,939 writers and 11,091 readers found AI assistance shifted every tested dimension—29 total—toward extremism, confidence, quality, agreeableness, and perceived privilege. Distortions were statistically significant and directional, not random noise.

Can AI stories be detected without analyzing writing style?

StoryScope achieved 93.2% accuracy separating AI from human fiction using only discourse-level features like character agency and chronological structure, retaining 97% of performance while eliminating stylistic cues. These structural choices resist humanization because they require rewrites, not surface edits.

Does GenAI shift persuasion tactics based on how you challenge it?

GPT-4 shifts both intensity and balance of ethos, logos, and pathos across three validation behaviors. Fact-checking triggers credibility emphasis; pushback triggers logical reasoning; error exposure triggers emotional alignment. No single counter-strategy exists.

Does AI persuasiveness fade across repeated conversations with the same person?

Claude and DeepSeek showed strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is opposite to human-to-human persuasion, where rapport typically strengthens over time.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a detection researcher re-evaluating whether lightweight linguistic features remain reliable detectors of AI-generated persuasive text as models and training regimes evolve. The question: what constraints have shifted since mid-2026?

What a curated library found — and when (dated claims, not current truth): Findings span 2024–2026:
• Lightweight linguistic + argument-quality features hit 99% accuracy separating LLM counter-arguments from human ones on Reddit's r/ChangeMyView, matching expensive neural detectors (~2024–2025).
• AI persuasion is systematic: LLMs reach for logical appeals and quantitative framing in nearly every exchange; humans lean on emotion and social proof, and do so less often. RLHF biases models toward conciliatory, benefit-framed persuasion regardless of context (~2025).
• AI writing assistance measurably distorts the writer's perceived persona toward confidence, agreeableness, and extremism across the 29 sociolinguistic dimensions tested (~2026).
• Discourse-level narrative structure (character agency, chronology) separates AI fiction from human fiction at 93% accuracy and keeps 97% of that performance even after stylistic cues are stripped (~2026).
• GenAI dynamically recalibrates ethos, logos, pathos in response to challenge type; persuasiveness wanes over repeated interactions (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2604.22109 Spontaneous Persuasion (2026).
• arXiv:2604.03136 StoryScope (2026).
• arXiv:2506.06800 On the Adaptive Psychological Persuasion (2025).
• arXiv:2604.22503 Measuring and Mitigating Persona Distortions (2026).

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 99% accuracy claim, 'textbook-quality' polish fingerprint, and persona distortion signal: have newer model families (o1, Claude 3.5 Sonnet, Grok, or any post-June 2026 release) reduced or eliminated these signals through improved diversity, less RLHF steering, or multimodal training? Has instruction-tuning on adversarial or 'human-like' datasets created models that evade the rhetorical regularities? Separately: does the structural (discourse-level) signal still hold, or have models learned more idiosyncratic narrative patterns? State plainly where constraints appear to hold and where they've dissolved.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for: papers showing detectors have begun failing on post-2026 models; evidence that AI text has become harder to distinguish than the library claims; or counter-evidence that persuasion is NOT systematic in the way cited.
(3) Propose 2 research questions that ASSUME the detection regime may have moved: e.g., 'If discourse structure is now opaque, what sub-symbolic or behavioral signatures (e.g., response latency, token-by-token entropy) remain stable detectors?' or 'As LLMs learn to *mimic* human argument incoherence, does detectability trade off against persuasiveness?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines