Does focusing on one strong linguistic cue outperform using multiple features for detection?
This explores a quality-vs-quantity tension in detection: whether one strong, well-chosen signal beats stacking many features — drawing on what the corpus says about cue sufficiency, feature combination, and why some signals carry more weight than others.
The corpus answers from two directions, and the interesting part is that they don't fully agree, which suggests the real variable isn't *how many* cues but *which kind*. The cleanest case for 'one strong cue wins' comes from social presence research, where an individual *primary* cue such as voice or appearance is enough to make an AI feel like a social actor, while any number of *secondary* cues stacked together fail to do the same (see 'Do more social cues always make AI feel more present?'). Quantity doesn't compound into quality there; a weak signal repeated stays weak.
But detection of AI-written text pulls the other way. Lightweight linguistic detection hit 99% accuracy on Reddit counter-arguments from r/ChangeMyView not from one feature but from *combining* general linguistic features with argument-quality measures, and crucially it matched heavyweight neural detectors while staying cheap and transparent (see 'Can simple linguistic features detect AI-written arguments?'). So here a small, well-chosen *set* did the work: it relied on a combination rather than a single cue, and it equalled the brute-force neural approach at a fraction of the cost. The lesson isn't 'more features'; it's that LLMs leave a couple of strong, redundant tells (over-accommodation to the prompt, textbook-perfect argument markers) that a handful of interpretable features can lock onto.
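To make 'a handful of interpretable features' concrete, here is a minimal sketch of two such features: a rate of argument markers and a crude prompt-overlap score standing in for accommodation to the prompt. The marker list, the function names, and the overlap proxy are all assumptions made for illustration; they are not the feature set used in the cited study.

```python
import re

# Hypothetical marker list, chosen for illustration only (not from the cited study).
ARGUMENT_MARKERS = ("however", "therefore", "furthermore",
                    "in conclusion", "on the other hand")

def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def marker_rate(text):
    """Argument markers per 100 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    joined = " ".join(tokens)
    hits = sum(len(re.findall(r"\b" + re.escape(m) + r"\b", joined))
               for m in ARGUMENT_MARKERS)
    return 100.0 * hits / len(tokens)

def prompt_overlap(prompt, reply):
    """Share of distinct reply words that also occur in the prompt.

    An assumed, very rough proxy for 'accommodation to the prompt'.
    """
    p, r = set(tokenize(prompt)), set(tokenize(reply))
    return len(p & r) / len(r) if r else 0.0

def features(prompt, reply):
    """Return the interpretable feature vector as a plain dict."""
    return {
        "marker_rate": marker_rate(reply),
        "prompt_overlap": prompt_overlap(prompt, reply),
    }
```

Each value can be read and sanity-checked by hand, which is the transparency advantage the paragraph above attributes to lightweight detectors. A real detector would feed features like these into a simple classifier trained on labelled data.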
Reconciling the two: the winning move is to pick signals that are *individually diagnostic* and then stop, not to maximize the count. Research on alignment dimensions makes the same point from the design side: lexical, emotional, and prosodic alignment each drive distinct outcomes, and conflating them produces category errors (see 'Do different types of alignment serve different conversational goals?'). Throwing every cue into one bucket doesn't strengthen the signal; it muddies which one is doing the work. Strong, separable signals beat large undifferentiated ones.
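One way to operationalize 'individually diagnostic, then stop' is to score each feature on its own and keep only those that separate the classes well. The sketch below uses single-feature AUC for that purpose. The 0.8 threshold, the feature names, and the toy values are made up purely for illustration and are not results from any of the cited notes.

```python
def auc(pos, neg):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. 0.5 means the feature carries no signal.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def diagnostic_features(rows, labels, threshold=0.8):
    """Keep features whose single-feature AUC is far from chance.

    rows: list of dicts mapping feature name to value.
    labels: 1 for AI-written, 0 for human-written.
    threshold: assumed cut-off on direction-agnostic AUC.
    """
    kept = {}
    for name in rows[0]:
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        a = auc(pos, neg)
        # A feature that is reliably *lower* for AI text is just as useful.
        if max(a, 1 - a) >= threshold:
            kept[name] = a
    return kept

# Synthetic toy data: invented values, not measurements.
rows = [
    {"marker_rate": 4.0, "avg_word_len": 4.6},
    {"marker_rate": 3.5, "avg_word_len": 4.4},
    {"marker_rate": 1.0, "avg_word_len": 4.5},
    {"marker_rate": 0.5, "avg_word_len": 4.5},
]
labels = [1, 1, 0, 0]

kept = diagnostic_features(rows, labels)
```

On this toy data only `marker_rate` survives, because `avg_word_len` sits at chance. Screening features one at a time keeps it visible which signal is doing the work, at the cost of missing features that are only useful in combination.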
There's also a reason single linguistic cues can be *more* reliable than you'd expect: LLMs have systematic, predictable blind spots that worsen with structural complexity. They consistently misidentify embedded clauses, verb phrases, and complex nominals, and their performance degrades as syntactic depth increases (see 'Why do large language models fail at complex linguistic tasks?'). A blind spot that fails *predictably* is exactly the kind of high-quality single cue a detector wants, because its reliability doesn't depend on stacking it with anything else.
The thing you didn't know you wanted to know: 'one cue vs many' is the wrong axis. Both the social-presence and detection results converge on the same rule: a few load-bearing, individually diagnostic signals beat both the lone weak cue and the kitchen-sink feature dump. Detection is a search for *separable strong tells*, and once you've found one or two, adding more often just adds noise.
Sources (4 notes)
Research shows individual primary cues like voice or appearance are sufficient to evoke social-actor presence, while multiple secondary cues cannot. Quality of cues matters more than quantity in driving social responses.
General linguistic features combined with argument-quality measures achieved 99% accuracy detecting LLM-generated counter-arguments on r/ChangeMyView, matching heavyweight neural detectors while remaining computationally cheap and transparent. LLMs produce detectable stylistic signatures: accommodation to prompts and textbook-quality argument markers that humans don't replicate.
A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.
Top-tier LLMs like Llama3-70b consistently misidentify embedded clauses, verb phrases, and complex nominals. Performance degrades predictably as syntactic depth increases, revealing that statistical learning captures surface patterns but not deep grammatical rules.