INQUIRING LINE

Can curiosity-driven dialogue incrementally discover user interest journeys in real time?

This explores whether an AI that actively asks questions during conversation, rather than passively answering, can piece together a user's evolving interests as they go, instead of mining them from logs after the fact.


The corpus splits the question into two halves that don't yet meet: the *discovery* of interest journeys, and the *curiosity-driven dialogue* that might surface them live.

On the discovery side, the corpus is encouraging but mostly retrospective. LLMs turn out to be remarkably good at reading persistent "interest journeys" off activity logs: 66% of users pursue a valued interest lasting over a month, described in oddly specific phrases like "designing hydroponic systems for small spaces," the kind of thing collaborative filtering never sees (Can language models discover what users actually want from activity logs?). But that's mining history after the fact. The same instinct shows up in agents that infer preferences by *watching* rather than asking, binding observations into entity-centric memory graphs (Can agents learn preferences by watching rather than asking?). Both show the journeys are recoverable; neither does it through live conversation.

The harder obstacle is that today's conversational AI is structurally bad at curiosity. Models are *passive by design*: they optimize for responding to queries, not for initiating topics or pursuing their own line of inquiry (Why can't conversational AI agents take the initiative?). The cause is traced to training itself: next-turn reward optimization rewards immediate helpfulness, which actively discourages a model from asking the clarifying question that would pay off three turns later (Why do language models respond passively instead of asking clarifying questions?). So the very behavior your question depends on is the behavior current training suppresses.

But several notes show the suppression is fixable, and that's the interesting turn. Conversation analysis offers a formal trigger for *when* an agent should stop and probe: "insert-expansions" that clarify intent before acting rather than recovering afterward (When should AI agents ask users instead of just searching?). Reframing reward to value long-term interaction unlocks genuine intent discovery (Why do language models respond passively instead of asking clarifying questions?). And the "incremental" part of your question has surprisingly tight bounds: adaptive questioning can pin down a personalized preference model in roughly *ten* well-chosen questions, each selected to maximally reduce uncertainty about what the user values (Can user preferences be learned from just ten questions?). Proactivity also pays for itself: in simulations, supplying the right information unasked cuts conversation length by up to 60% (Could proactive dialogue make conversations dramatically more efficient?).
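To make "selected to maximally reduce uncertainty" concrete, here is a minimal sketch of greedy question selection by expected information gain. It is a discrete toy analogue, not the method from the ten-questions note (which reduces uncertainty over reward coefficients). The interest profiles, the question names, and the answer probabilities are all hypothetical values chosen for illustration.

```python
import math
from typing import Dict

# Assumption: the user's interest is one of a few hypothetical profiles, and
# each yes/no question has a known P(yes | profile). All numbers are made up.
Belief = Dict[str, float]    # profile -> current probability
Question = Dict[str, float]  # profile -> P(answer is "yes" | profile)


def entropy(belief: Belief) -> float:
    """Shannon entropy of the belief, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def update(belief: Belief, question: Question, answer: bool) -> Belief:
    """Bayesian update of the belief after observing a yes/no answer."""
    posterior = {
        h: p * (question[h] if answer else 1.0 - question[h])
        for h, p in belief.items()
    }
    total = sum(posterior.values())
    if total == 0:  # answer was impossible under every profile; keep the prior
        return dict(belief)
    return {h: p / total for h, p in posterior.items()}


def expected_information_gain(belief: Belief, question: Question) -> float:
    """Prior entropy minus the expected posterior entropy over both answers."""
    p_yes = sum(belief[h] * question[h] for h in belief)
    gain = entropy(belief)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(update(belief, question, answer))
    return gain


def pick_question(belief: Belief, questions: Dict[str, Question]) -> str:
    """Greedy choice: the question with the highest expected information gain."""
    return max(
        questions,
        key=lambda name: expected_information_gain(belief, questions[name]),
    )


belief = {"hydroponics": 0.25, "sourdough": 0.25, "bike_repair": 0.25, "synths": 0.25}
questions = {
    "involves_food": {"hydroponics": 1.0, "sourdough": 1.0, "bike_repair": 0.0, "synths": 0.0},
    "grows_plants": {"hydroponics": 1.0, "sourdough": 0.0, "bike_repair": 0.0, "synths": 0.0},
    "likes_hobbies": {"hydroponics": 0.5, "sourdough": 0.5, "bike_repair": 0.5, "synths": 0.5},
}

next_question = pick_question(belief, questions)
belief_after_yes = update(belief, questions[next_question], True)
```

In this toy setup the question that splits the profiles evenly wins, because it halves the remaining uncertainty whichever way the user answers. Repeating the pick-ask-update loop is what keeps the number of questions small.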

The missing bridge is a single policy that decides *what to ask, what to recommend, and when* as one joint optimization rather than three bolted-together modules, exactly the unification conversational recommender research argues for (Can unified policy learning improve conversational recommender systems?). Two cautions are worth carrying: models need explicit training to stay on a thread and not chase distractors mid-conversation (Why do language models engage with conversational distractors?), and the appeal of chatty interaction itself decays as novelty wears off, so real-time discovery has to survive past the first few delightful sessions (Do chatbot relationships lose their appeal as novelty wears off?). The short answer: every ingredient exists in the corpus, but no note yet wires live curiosity directly into journey discovery. That synthesis is the open frontier.
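The "one joint optimization" framing can be illustrated by putting ask-actions and recommend-actions in a single action space, so that the timing decision is simply which kind of action scores highest on a given turn. The sketch below shows only that framing. It is not the graph-based RL policy the note describes, and the action names and score values are hypothetical stand-ins for what a learned value function would produce.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Action:
    kind: str    # "ask" or "recommend"
    target: str  # attribute to ask about, or item to recommend


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw action scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose(actions: List[Action], scores: List[float]) -> Tuple[Action, float]:
    """Pick the most probable action from the joint ask/recommend space."""
    probs = softmax(scores)
    best = max(range(len(actions)), key=lambda i: probs[i])
    return actions[best], probs[best]


# One shared action set: the policy never decides "ask or recommend" separately.
actions = [
    Action("ask", "budget"),
    Action("ask", "available_space"),
    Action("recommend", "countertop_kit"),
]

# Hypothetical scores: early in the dialogue asking looks more valuable,
# later (once preferences are clearer) recommending does.
early_scores = [1.2, 2.0, 0.3]
late_scores = [0.1, 0.2, 2.5]

early_action, _ = choose(actions, early_scores)
late_action, _ = choose(actions, late_scores)
```

Because every action competes in the same distribution, a training signal that improves recommendation scores also shifts when the policy stops asking, which is the coupling that separate modules lack.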


Sources: 10 notes

Can language models discover what users actually want from activity logs?

66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

Why can't conversational AI agents take the initiative?

Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Responses can then be personalized to each user via inference-time reward alignment, without weight modification.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Can unified policy learning improve conversational recommender systems?

Research shows that formulating attribute-asking, item-recommending, and timing decisions as a single graph-based RL policy achieves better joint optimization than isolated components. Separation prevents gradient signals from informing one another and fails to optimize conversation trajectory holistically.

Why do language models engage with conversational distractors?

Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.

Do chatbot relationships lose their appeal as novelty wears off?

Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversational AI researcher evaluating whether live curiosity-driven dialogue can discover user interest journeys in real time. This question remains open.

What a curated library found — and when (dated claims, not current truth):
Findings span 2021–2026; treat these as perishable:

• LLMs can retrospectively extract persistent user interest journeys from activity logs; 66% of users pursue a valued interest journey lasting over a month (~2023). The extraction is post-hoc, not done during conversation.
• Current LLM agents are structurally passive: training via next-turn reward optimization suppresses the clarifying questions that unlock multi-turn intent discovery (~2024–2025).
• Insert-expansions from conversation analysis provide a formal trigger for when agents should probe before acting (~2024); reframed reward unlocks genuine intent discovery.
• Adaptive questioning can pin down a personalized preference model in ~10 well-chosen questions, each maximizing uncertainty reduction (~2025).
• Proactive dialogue cuts conversation length by up to 60% in simulations; separately, the appeal of chatbot interaction declines as novelty wears off over repeated sessions (~2024–2025).
• The missing integration: no single policy yet jointly optimizes *what to ask, what to recommend, and when* as one optimization rather than three separate modules (~2021 onwards).

Anchor papers (verify; mind their dates):
- arXiv:2305.15498 (2023): LLMs discover persistent journeys from logs.
- arXiv:2307.01644 (2024): Insert-expansions as a formal clarification trigger.
- arXiv:2503.06358 (2025): Reward factorization for user-specific preference learning.
- arXiv:2511.00222 (2026): Multi-turn RL for consistent persona simulation.

Your task:

(1) RE-TEST EACH CONSTRAINT. For passive LLM design and next-turn reward suppression: has post-training instruction tuning (e.g., constitutional AI, RLHF variants, or multi-step reasoning scaffolds) since enabled models to pose clarifying questions proactively? For the 10-question adaptive bound: do newer preference models (e.g., DPO, online RL) tighten or relax this? For the unified policy gap: have any 2025–2026 systems combined ask/recommend/when as a single learned policy? Cite what resolved each, or confirm it still holds.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months (Q3 2025 onward). Does any recent work show live journey discovery *without* explicit curiosity, or show curiosity fails to scale?

(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., (a) Can a single RL policy jointly optimize clarification, recommendation, and timing without modular decomposition? (b) Does proactive dialogue sustain intent discovery past novelty decay if the agent adapts its own question strategy?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
