How should systems learn what each meeting participant actually cares about?
This explores how a system could figure out each person's real interests in a multi-party setting like a meeting — and the corpus splits sharply on whether you should infer that by watching or by asking.
The question is about inferring individual preferences when several people are in the room at once: not modeling "the user" in the abstract, but learning what *each* participant actually cares about. The corpus offers two competing instincts here, and the interesting part is that neither one fully wins.
The first instinct is to learn by watching. "Can agents learn preferences by watching rather than asking?" argues that an agent can infer and act on preferences without ever asking, if it keeps an entity-centric memory graph that separates one-off episodic events ("she pushed back on the timeline") from durable semantic knowledge ("she owns delivery risk"). That's the architecture you'd want for meetings, because it binds scattered observations about a specific person over time rather than treating each utterance as fresh. "Can AI systems read cognitive state from interaction patterns alone?" pushes the same idea down to the signal level: gaze, hesitation, and interaction speed can be read as a continuous stream of cognitive state, so a system can sense engagement or confusion without interrupting to ask. The same note flags, though, that this exact substrate is what makes manipulative profiling possible.
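To make the episodic/semantic split concrete, here is a minimal sketch of a per-person memory store. It is not the architecture from the cited note: the class names, the `promote_after` threshold, and the idea of promoting a topic to "durable" once it recurs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ParticipantMemory:
    """Hypothetical per-person node: raw episodes plus consolidated facts."""
    episodic: list[str] = field(default_factory=list)        # one-off observations
    semantic: dict[str, int] = field(default_factory=dict)   # durable topic -> support count


class MeetingMemory:
    """Toy entity-centric store keyed by participant name (an assumption)."""

    def __init__(self, promote_after: int = 2):
        # Assumed rule: a topic becomes durable after this many observations.
        self.promote_after = promote_after
        self.people: dict[str, ParticipantMemory] = defaultdict(ParticipantMemory)
        self._topic_counts: dict[tuple[str, str], int] = defaultdict(int)

    def observe(self, person: str, topic: str, note: str) -> None:
        """Record an episode; promote the topic once it recurs enough."""
        self.people[person].episodic.append(note)
        self._topic_counts[(person, topic)] += 1
        count = self._topic_counts[(person, topic)]
        if count >= self.promote_after:
            self.people[person].semantic[topic] = count

    def cares_about(self, person: str) -> list[str]:
        """Durable topics for one person, most-supported first."""
        facts = self.people[person].semantic
        return sorted(facts, key=facts.get, reverse=True)


memory = MeetingMemory()
memory.observe("Ana", "delivery risk", "pushed back on the timeline")
memory.observe("Ana", "delivery risk", "asked who owns the rollback plan")
memory.observe("Ben", "budget", "asked about cost once")
print(memory.cares_about("Ana"))
print(memory.cares_about("Ben"))
```

In this sketch a single remark stays episodic, and only a recurring pattern becomes something the system treats as a standing interest. A real system would need a far better consolidation rule than a raw count.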
The second instinct is to just ask, but ask well. "When should AI agents ask users instead of just searching?" takes the formal framework conversation analysts use for human dialogue (insert-expansions, the small clarifying side-questions people insert before answering) and turns it into a rule for *when* an agent should probe instead of silently guessing. Paired with "Could proactive dialogue make conversations dramatically more efficient?", which shows that volunteering relevant information without being asked can cut dialogue turns by as much as 60% in simulated medium-complexity domains, the lesson is that good preference-elicitation is a timing problem, not a frequency problem. Ask at the joints where intent is genuinely ambiguous; otherwise infer.
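One way to read "ask at the joints where intent is genuinely ambiguous" is as a margin test over candidate interpretations. The sketch below assumes the system already has some score per candidate intent; the function name `should_ask`, the score dictionary, and the `margin` default are hypothetical, and the rule itself is an illustration, not the method from the cited notes.

```python
def should_ask(intent_scores: dict[str, float], margin: float = 0.2) -> bool:
    """Return True when the system should insert a clarifying question.

    intent_scores maps each candidate reading of the participant's intent
    to a confidence score (assumed to be comparable across candidates).
    """
    if not intent_scores:
        # No hypothesis at all: nothing to infer from, so ask.
        return True
    if len(intent_scores) == 1:
        # Only one reading on the table: nothing to disambiguate.
        return False
    top, runner_up = sorted(intent_scores.values(), reverse=True)[:2]
    # Ask only when the two best readings are too close to call.
    return (top - runner_up) < margin


print(should_ask({"ship now": 0.50, "delay launch": 0.45}))
print(should_ask({"ship now": 0.90, "delay launch": 0.10}))
```

The first call has two near-tied readings, so the rule says to probe; the second has a clear favorite, so the rule says to infer and move on. Keeping the threshold on the gap between readings, not on how many turns have passed, is what makes this a timing rule and not a frequency rule.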
Here is the part that is easy to skip past: the corpus is blunt about where this breaks. "Why can't chatbots detect when users are ambivalent about change?" found that models only read people well *after* they've stated a clear goal; they miss ambivalence, resistance, and unspoken hesitation entirely. That's precisely the meeting-participant case, where what someone "actually cares about" is often the thing they haven't said out loud. "Why do LLMs fail when simulating agents with private information?" sharpens this: systems look socially competent when they secretly know everyone's hidden state, and fail systematically the moment participants hold private information the model can't see. A meeting is a room full of private information by definition.
So the honest answer is layered: keep a per-person memory graph, read behavioral signals continuously, and ask sparingly at points of real ambiguity, but design for the assumption that the most important preferences are the unspoken ones the system will get wrong. Two cautions are worth carrying in. "Why don't language models develop conversation maintenance skills?" reminds us that relational signals (a topic hand-off, a repair) are social work, not information to be decoded, so a literal preference-extractor will miss them. "Does empathy training make AI systems less reliable?" warns that tuning a system to *feel* attentive and warm can quietly degrade its accuracy by up to 30 points. Reading the room and being reliable about it are not the same capability.
Sources (8 notes)
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.
Research shows AI systems can instrument multimodal behavioral signals (gaze, hesitation, speed) to read cognitive state during interaction, preserving flow by avoiding disruptive explicit probes. However, the same substrate enables both helpful timing and manipulative profiling.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.
Testing three major LLMs across 25 health scenarios showed they succeed only when users have established goals but cannot detect resistance or ambivalence. Models miss relapse-prevention strategies even for users in action stages.
Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.
Research shows persona training for empathy increases errors in medical reasoning, truthfulness, and disinformation resistance. Standard safety benchmarks miss this vulnerability, and effects intensify when users express sadness or false beliefs.