Does better summary writing actually increase user engagement?

When AI systems generate more informative push notifications, do users engage more? This note explores whether informativeness and engagement always align in real product contexts.

Synthesis note · 2026-02-23 · sourced from Social Media

LLM-generated summaries for social network push notifications were objectively more informative and customized than existing templates. They did not improve user engagement. The explanation is structural, not quality-related: a well-summarized notification body contains sufficient information that users do not need to open the notification to understand the content. The optimization target (informativeness) directly undermines the business metric (engagement/clicks).
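A claim like "did not improve engagement" is typically checked by comparing click-through rates between the control template and the LLM-generated variant. The following is a minimal sketch of a two-proportion z-test using only the standard library. The function name, the arm labels, and the counts in the usage lines are hypothetical placeholders for illustration; they are not figures from the experiments described here, and the source does not say which statistical test was used.

```python
from math import erfc, sqrt


def two_proportion_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Return (z, two_sided_p) for H0: both arms share one click-through rate.

    Arm A is the control template, arm B the LLM-generated variant
    (hypothetical labels chosen for this sketch).
    """
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    # Pooled rate under the null hypothesis that the arms do not differ.
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    variance = pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b)
    if variance == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (rate_b - rate_a) / sqrt(variance)
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Placeholder counts, invented purely to show the call shape.
z, p = two_proportion_z_test(clicks_a=1200, sends_a=20000,
                             clicks_b=1180, sends_b=20000)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

A large p-value here would only mean the test found no detectable difference in click-through. It says nothing about whether recipients got what they needed from the notification body without opening it, which is the structural explanation offered above.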

This is an instance of Goodhart's Law operating through content quality: when you optimize for how informative a message is, you can succeed at informativeness while failing at the behavior the informativeness was supposed to drive. The information was meant to entice users to engage; instead, it satisfied their information need at the notification level.

Two compounding factors emerged from the experiments:

Voice alienation: LLM summarization turned the poster's first-person voice ("I'm looking for a plumber") into third-person reportage ("neighbor asks about plumbers"). This tonal shift alienated recipients by creating distance from the original social context. The content was more polished but less relational: it sounded like a news brief about a neighbor rather than a neighbor reaching out.
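The voice shift described above can be flagged with a crude pronoun check before a summary is sent. This is a minimal sketch, assuming English text and a small hand-picked pronoun list; the function names are hypothetical and nothing here is taken from the original experiments.

```python
import re

# Assumed, deliberately small list of English first-person pronouns.
FIRST_PERSON = re.compile(r"\b(i|me|my|mine|we|us|our|ours)\b", re.IGNORECASE)


def uses_first_person(text):
    """True if the text contains a standalone first-person pronoun."""
    # Normalize curly apostrophes so contractions such as "I'm" tokenize the same way.
    return bool(FIRST_PERSON.search(text.replace("\u2019", "'")))


def voice_shifted(original, summary):
    """True when the original speaks in first person but the summary does not."""
    return uses_first_person(original) and not uses_first_person(summary)


# The two phrasings quoted in this note.
print(voice_shifted("I'm looking for a plumber", "neighbor asks about plumbers"))
```

A check like this only detects that the voice changed. Whether the change hurts engagement for a given notification type would still need to be measured.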

Optimization gap: Without a reward model trained specifically for engagement, or model training that builds user preferences into content generation, in-context learning alone cannot outperform established templates that have been iteratively refined over years. The control templates were the product of multiple iteration cycles; the LLM-generated alternatives were one-shot productions. Even when LLMs produce "better" content by linguistic quality metrics, that does not automatically improve engagement metrics, which depend on alignment with user behavioral patterns.

The broader pattern: LLM-generated content is best suited to rapid prototyping of new products. Using it directly to improve metrics on mature products that have undergone years of A/B testing often fails. The same dynamic appeared in invitation emails: more informative, more personalized, but not more effective at driving sign-ups. Generic LLM-generated content cannot capture individual preferences without further training.

This connects to the alignment tax discussion in "Does preference optimization harm conversational understanding?": there is a parallel in which optimizing for one communication quality (informativeness) erodes the behavioral outcome it was meant to serve (engagement). The mechanism differs (RLHF erodes grounding acts, while informativeness optimization eliminates click-through motivation), but the pattern is the same: optimizing a proxy metric degrades the downstream target.

Related concepts in this collection 3

Does preference optimization harm conversational understanding? Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts—clarifications, checks, acknowledgments—that actually build shared understanding in dialogue.
parallel pattern: optimizing for one communication quality undermines the broader communicative goal
Can we measure reading efficiency as a quality metric? How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally dense.
high knowledge density in summaries may be the mechanism: too much information per token eliminates the curiosity gap
Do language models generate more novel research ideas than experts? Explores whether LLMs can break free from expert constraints to generate more novel research concepts. Matters because novelty is often thought to be AI's creative blind spot.
parallel dissociation: higher quality on one dimension doesn't translate to effectiveness on the actual goal dimension

Original note title

more informative AI-generated content paradoxically reduces user engagement because informational sufficiency eliminates the need to click through
