Can curiosity-driven personalization work better than pre-conversation preference elicitation?
This explores whether an AI that learns who you are *during* the conversation — by getting curious and probing — beats one that front-loads a questionnaire or profile before the conversation starts.
This explores whether curiosity-driven personalization — where an AI infers who you are mid-conversation — outperforms eliciting your preferences up front. The corpus suggests the answer leans yes, but with a twist: the real divide isn't "ask now vs. ask later," it's *what kind of knowledge* the system is building and *how cheaply* it builds it.
The most direct evidence is the idea of giving an agent an intrinsic curiosity reward — paying it to reduce its uncertainty about what type of user it's talking to as the conversation unfolds Can conversations themselves personalize without user profiles?. This dissolves the need for a pre-collected profile entirely; the conversation itself becomes the elicitation. But notice it's not free-form curiosity — the agent strategically balances being helpful *now* against gathering information that pays off later. That tension is the whole game. Front-loaded elicitation tries to resolve it before you've said anything; curiosity-driven approaches resolve it as they go.
The interesting wrinkle is that adaptive *asking* and curiosity-driven *inferring* aren't opposites — they sit on a spectrum. One line of work shows you can pin down a user with as few as ten adaptively chosen questions, each picked to maximally shrink the system's uncertainty about your personal reward weights Can user preferences be learned from just ten questions?. That's elicitation, but it's *smart* elicitation that already behaves like curiosity — each question is the one that learns the most. At the far end, agents can skip asking almost entirely and infer preferences by watching your behavior across modalities, binding observations about you into a memory graph Can agents learn preferences by watching rather than asking?. So "curiosity" ranges from asking better questions to not asking at all.
There's a deeper finding lurking here about what gets stored. Personalization works better when the system distills your preferences into abstract summaries rather than hoarding and replaying specific past interactions — semantic memory beats episodic retrieval Does abstract preference knowledge outperform specific interaction recall?. This reframes the original question: a pre-conversation questionnaire is essentially a frozen semantic profile, while curiosity-driven learning builds that abstraction live and keeps updating it. The advantage of doing it in-conversation is recency — the model knows what you want *right now*, not what you wanted when you filled out the form. And conversation analysis gives a formal account of *when* probing actually beats silently guessing: agents drift from your real intent through silent tool-chaining, and well-placed clarifying "insert-expansions" prevent misunderstanding rather than recovering from it When should AI agents ask users instead of just searching?. There's even an argument that the decisions of what to ask, what to recommend, and when to do each should be learned as one unified policy rather than bolted together Can unified policy learning improve conversational recommender systems?.
The catch the corpus wants you to see: the very training that makes models feel helpful actively suppresses curiosity. RLHF rewards confident single-turn answers and punishes clarifying questions, eroding the grounding acts that real understanding requires by over 77% Does preference optimization harm conversational understanding?, Does preference optimization damage conversational grounding in large language models?. So curiosity-driven personalization isn't just a better strategy you can switch on — it's swimming against how these models are optimized. And when personalization does succeed, it carries its own tax: deeper personalization raises trust and anthropomorphism while simultaneously amplifying privacy risk and escalating expectations Does chatbot personalization build trust or expose privacy risks?, and per-user reward models can quietly tip into sycophancy and echo chambers once the averaging effect of a shared model is gone Does personalizing reward models amplify user echo chambers?. The thing you didn't know you wanted to know: the strongest case for curiosity isn't that it's more accurate — it's that asking a good question is the one personalization move that's also honest about what the system doesn't yet know.
Sources 10 notes
Adding an intrinsic motivation reward for reducing uncertainty about user type during conversation enables personalization without pre-collected profiles. Tested in education and fitness domains with 20 user attributes, the approach balances helpfulness with strategic information gathering.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Users can be personalized via inference-time reward alignment without weight modification.
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.
PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
Research shows that formulating attribute-asking, item-recommending, and timing decisions as a single graph-based RL policy achieves better joint optimization than isolated components. Separation prevents gradient signals from informing one another and fails to optimize conversation trajectory holistically.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.
Longitudinal research shows personalization enhances trust and anthropomorphism but also amplifies privacy concerns and escalating user expectations. One-shot studies miss these temporal dynamics—each interaction raises the baseline, making failures more disappointing.
Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.