Why do LLM meeting summaries fail to help individuals?

Current LLM summarization treats all meeting participants the same, but organizational contexts require personalized recaps. What barriers prevent systems from learning what matters to each person?

Synthesis note · 2026-02-23 · sourced from Reading Summarizing

LLM-based dialogue summarization shows promise for meeting recaps, but a user study with seven participants evaluating real work meetings reveals three specific failure modes that prevent organizational adoption.

The personal relevance gap. An LLM recap summarizes what was globally important in the meeting, not what was personally relevant to each participant. A designer cares about the design decisions made. A project manager cares about timeline commitments. The same meeting requires different summaries for different participants, and current summarization has no model of what matters to whom. This is the personalization problem applied to collaborative settings: following the related note "Do user outputs outperform inputs for LLM personalization?", the system would need to learn from each participant's interaction history what they care about.

The mis-attribution problem. When the system attributes a statement to the wrong participant, the consequences extend beyond simple factual error. Mis-attributions are detrimental to group dynamics: they can create false impressions about who committed to what, who raised which concern, or who proposed which idea. In organizational settings where credit, accountability, and trust are at stake, getting attribution wrong damages the social fabric the meeting was meant to build. This parallels the related note "Does warmth training make language models less reliable?": errors in social contexts have consequences that accuracy metrics don't capture.

Context-dependent representation. Two distinct recap formats serve different needs: "highlights" (important moments, key decisions) for quick scanning and cognitive efficiency, and "hierarchical minutes" (structured, ordered, detailed) for reference and alignment. The rationale comes from cognitive science: perception and recall operate differently, and one format cannot serve both. In line with the related note "Do generated interfaces outperform text-based chat for most tasks?", the representation should adapt to context rather than defaulting to a single format.

The design implication: AI summarization in collaborative organizational settings must learn from natural interactions what matters to each participant. Pure content summarization — extracting "what happened" — is insufficient when the question is "what happened that matters to me."

Related concepts in this collection 4

Do user outputs outperform inputs for LLM personalization? Does a user's history of outputs (responses, endorsed content) matter more for personalization than their input queries? This explores what actually drives effective personalization in language models.
the mechanism for learning personal relevance
Does warmth training make language models less reliable? Explores whether training models for empathy and warmth creates a hidden trade-off that degrades accuracy on medical, factual, and safety-critical tasks—and whether standard safety tests catch it.
errors in social contexts have invisible consequences
Do generated interfaces outperform text-based chat for most tasks? Explores whether LLMs should create interactive UIs instead of text responses, and under what conditions users prefer dynamic interfaces to traditional conversational chat.
adaptive representation over fixed format
Why do AI agents miss most of what users actually want? UserBench explores why current models align with user intent only 20% of the time, even when users reveal preferences across multiple turns. The question examines whether agents can learn to actively clarify ambiguous or evolving goals.
the intent alignment gap applied to organizational settings

Original note title

llm meeting summaries fail on personal relevance and speaker attribution — mis-attributions harm group dynamics in organizational settings
