Can curiosity-driven personalization work better than pre-conversation preference elicitation?

This explores whether an AI that learns who you are *during* the conversation — by getting curious and probing — beats one that front-loads a questionnaire or profile before the conversation starts.

The corpus suggests the answer leans yes, but with a twist: the real divide isn't "ask now vs. ask later," it's *what kind of knowledge* the system is building and *how cheaply* it builds it.

The most direct evidence is the idea of giving an agent an intrinsic curiosity reward: paying it to reduce its uncertainty about what type of user it's talking to as the conversation unfolds (see "Can conversations themselves personalize without user profiles?"). This dissolves the need for a pre-collected profile entirely; the conversation itself becomes the elicitation. But notice it's not free-form curiosity. The agent strategically balances being helpful *now* against gathering information that pays off later. That tension is the whole game. Front-loaded elicitation tries to resolve it before you've said anything; curiosity-driven approaches resolve it as they go.
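
As a rough illustration of the curiosity reward described above, the sketch below treats the agent's knowledge as a discrete belief over user types and pays a bonus equal to the entropy that one user turn removes. Everything here is an assumption made for illustration: the two user types, the likelihood values, the `beta` trade-off weight, and all function names are hypothetical and are not taken from the cited work.

```python
import math

def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief over user types."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def bayes_update(belief, likelihood):
    """Posterior over user types after one observed user turn.

    likelihood[t] is the assumed P(observed turn | user type t).
    """
    unnorm = {t: belief[t] * likelihood[t] for t in belief}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

def curiosity_reward(prior, posterior):
    """Intrinsic bonus: how much the turn reduced uncertainty about the user type."""
    return entropy(prior) - entropy(posterior)

def total_reward(task_reward, prior, posterior, beta=0.5):
    """Helpfulness now plus a weighted bonus for learning about the user.

    beta is a hypothetical knob for the helpful-now vs. learn-for-later tension.
    """
    return task_reward + beta * curiosity_reward(prior, posterior)

# Toy example: the agent starts undecided between two user types.
prior = {"novice": 0.5, "expert": 0.5}
# Assumed likelihoods: the user's reply is far more typical of a novice.
likelihood = {"novice": 0.9, "expert": 0.1}
posterior = bayes_update(prior, likelihood)
bonus = curiosity_reward(prior, posterior)
```

In this toy setup a turn that reveals nothing about the user earns no bonus, so the agent is only paid for questions or responses that actually sharpen its picture of who it is talking to.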

The interesting wrinkle is that adaptive *asking* and curiosity-driven *inferring* aren't opposites; they sit on a spectrum. One line of work shows you can pin down a user with as few as ten adaptively chosen questions, each picked to maximally shrink the system's uncertainty about your personal reward weights (see "Can user preferences be learned from just ten questions?"). That's elicitation, but it's *smart* elicitation that already behaves like curiosity: each question is the one expected to teach the system the most. At the far end, agents can skip asking almost entirely and infer preferences by watching your behavior across modalities, binding observations about you into a memory graph (see "Can agents learn preferences by watching rather than asking?"). So "curiosity" ranges from asking better questions to not asking at all.
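
To make "pick the question that shrinks uncertainty the most" concrete, here is a minimal sketch of question selection by expected information gain. It is a simplified stand-in, not the cited method: it assumes a small discrete set of candidate weight vectors instead of a continuous coefficient posterior, pairwise "A or B?" questions, and a logistic answer model. All names and numbers are hypothetical.

```python
import math

def entropy(belief):
    """Shannon entropy (in nats) of a belief given as a list of probabilities."""
    return -sum(p * math.log(p) for p in belief if p > 0)

def p_prefers_a(weights, a, b):
    """Assumed logistic chance that a user with these weights picks item a over b."""
    diff = sum(w * (x - y) for w, x, y in zip(weights, a, b))
    return 1.0 / (1.0 + math.exp(-diff))

def posterior(belief, hypotheses, question, picked_a):
    """Bayes update of the belief over weight hypotheses after one answer."""
    a, b = question
    like = [p_prefers_a(w, a, b) if picked_a else 1.0 - p_prefers_a(w, a, b)
            for w in hypotheses]
    unnorm = [p * l for p, l in zip(belief, like)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expected_info_gain(belief, hypotheses, question):
    """Current entropy minus the expected entropy after hearing the answer."""
    a, b = question
    p_a = sum(p * p_prefers_a(w, a, b) for p, w in zip(belief, hypotheses))
    gain = entropy(belief)
    for picked_a, p_ans in ((True, p_a), (False, 1.0 - p_a)):
        if p_ans > 0:
            gain -= p_ans * entropy(posterior(belief, hypotheses, question, picked_a))
    return gain

def pick_question(belief, hypotheses, questions):
    """Choose the question with the largest expected information gain."""
    return max(questions, key=lambda q: expected_info_gain(belief, hypotheses, q))

# Two hypothetical users: one cares only about feature 0, the other only about feature 1.
hypotheses = [(3.0, 0.0), (0.0, 3.0)]
belief = [0.5, 0.5]
questions = [
    ((1.0, 0.0), (0.0, 1.0)),  # items differ on the features the users disagree about
    ((1.0, 1.0), (1.0, 1.0)),  # identical items, so the answer carries no information
]
best = pick_question(belief, hypotheses, questions)
```

Repeating the loop (ask the selected question, update the belief with `posterior`, select again) is what lets a small, fixed question budget go a long way in a setup like this.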

There's a deeper finding lurking here about what gets stored. Personalization works better when the system distills your preferences into abstract summaries rather than hoarding and replaying specific past interactions: semantic memory beats episodic retrieval (see "Does abstract preference knowledge outperform specific interaction recall?"). This reframes the original question: a pre-conversation questionnaire is essentially a frozen semantic profile, while curiosity-driven learning builds that abstraction live and keeps updating it. The advantage of doing it in-conversation is recency. The model knows what you want *right now*, not what you wanted when you filled out the form. And conversation analysis gives a formal account of *when* probing actually beats silently guessing: agents drift from your real intent through silent tool-chaining, and well-placed clarifying "insert-expansions" prevent misunderstanding rather than recovering from it (see "When should AI agents ask users instead of just searching?"). There's even an argument that the decisions of what to ask, what to recommend, and when to do each should be learned as one unified policy rather than bolted together (see "Can unified policy learning improve conversational recommender systems?").
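
The contrast between a frozen questionnaire profile and a summary that keeps updating can be shown with a toy example. This is only a sketch under strong assumptions: the "semantic summary" is reduced to one numeric score per topic, recency is modeled with a simple exponential decay, and the class name, topics, and `decay` value are all hypothetical. The cited frameworks work with far richer representations.

```python
class LiveSummary:
    """Running per-topic preference score where newer evidence counts more."""

    def __init__(self, initial=None, decay=0.5):
        self.scores = dict(initial or {})
        self.decay = decay  # hypothetical: weight kept by the old score on each update

    def observe(self, topic, signal):
        """Blend a new in-conversation signal (0.0 to 1.0) into the summary."""
        old = self.scores.get(topic, 0.0)
        self.scores[topic] = self.decay * old + (1.0 - self.decay) * signal

    def top(self):
        """Topic the summary currently believes the user prefers most."""
        return max(self.scores, key=self.scores.get)

# What the user said on the up-front form.
form = {"running": 1.0, "yoga": 0.0}
frozen = dict(form)          # pre-conversation profile, never updated
live = LiveSummary(form)     # same starting point, updated as the user talks

# Over three turns the user keeps steering toward yoga and away from running.
for _ in range(3):
    live.observe("yoga", 1.0)
    live.observe("running", 0.0)

frozen_top = max(frozen, key=frozen.get)
live_top = live.top()
```

Both profiles start from the same form, but only the live one tracks the shift in what the user wants during the conversation, which is the recency advantage the paragraph above describes.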

The catch the corpus wants you to see: the very training that makes models feel helpful actively suppresses curiosity. RLHF rewards confident single-turn answers over clarifying questions, and the resulting models produce 77.5% fewer grounding acts than humans do, even though those acts are what real understanding requires (see "Does preference optimization harm conversational understanding?" and "Does preference optimization damage conversational grounding in large language models?"). So curiosity-driven personalization isn't just a better strategy you can switch on; it's swimming against how these models are optimized. And when personalization does succeed, it carries its own tax: deeper personalization raises trust and anthropomorphism while simultaneously amplifying privacy concerns and escalating expectations (see "Does chatbot personalization build trust or expose privacy risks?"), and per-user reward models can quietly tip into sycophancy and echo chambers once the averaging effect of a shared model is gone (see "Does personalizing reward models amplify user echo chambers?"). The thing you didn't know you wanted to know: the strongest case for curiosity isn't that it's more accurate. It's that asking a good question is the one personalization move that's also honest about what the system doesn't yet know.

Sources (10 notes)

Can conversations themselves personalize without user profiles?

Adding an intrinsic motivation reward for reducing uncertainty about user type during conversation enables personalization without pre-collected profiles. Tested in education and fitness domains with 20 user attributes, the approach balances helpfulness with strategic information gathering.

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Responses can then be personalized for each user via inference-time reward alignment, without modifying model weights.

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Can unified policy learning improve conversational recommender systems?

Research shows that formulating attribute-asking, item-recommending, and timing decisions as a single graph-based RL policy achieves better joint optimization than isolated components. Separation prevents gradient signals from informing one another and fails to optimize conversation trajectory holistically.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment leaves models producing 77.5% fewer grounding acts than humans, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Does preference optimization damage conversational grounding in large language models?

Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.

Does chatbot personalization build trust or expose privacy risks?

Longitudinal research shows personalization enhances trust and anthropomorphism but also amplifies privacy concerns and escalating user expectations. One-shot studies miss these temporal dynamics—each interaction raises the baseline, making failures more disappointing.

Does personalizing reward models amplify user echo chambers?

Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-evaluating whether curiosity-driven mid-conversation personalization outperforms pre-collected preference elicitation in LLM systems. The question remains open; treat the findings below as dated claims to be stress-tested against current capability and training practice.

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2025; most strong evidence emerged in 2024–2025.

• Curiosity reward framing (rewarding agents to reduce uncertainty about user type mid-conversation) dissolves the need for pre-collected profiles; the conversation itself becomes elicitation (~2025).
• Adaptive questioning can pin down user reward weights with ~10 strategically chosen questions, blurring the line between smart elicitation and curiosity inference (~2025).
• Semantic memory abstraction (distilling preferences into abstract summaries) outperforms episodic retrieval for personalization; curiosity-driven learning updates these abstractions in real time, capturing recency (~2025).
• RLHF training suppresses curiosity: preference optimization erodes conversational grounding acts by >77%, actively punishing clarifying questions in favor of confident single-turn answers (~2024–2025).
• Deeper personalization raises trust and anthropomorphism while amplifying privacy risk, sycophancy, and echo-chamber effects when per-user reward models replace shared models (~2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2105.09710 (2021): Unified Conversational Recommendation Policy Learning
• arXiv:2307.01644 (2023): Insert-expansions for Tool-enabled Conversational Agents
• arXiv:2504.03206 (2025): Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
• arXiv:2507.04607 (2025): PRIME — LLM Personalization with Cognitive Memory

Your task:

(1) RE-TEST EACH CONSTRAINT. For the RLHF suppression finding (77% erosion): has instruction-tuning diversity, Constitutional AI, or post-training interventions (e.g., curiosity loss terms, grounding-aware reward signals) since relaxed this in production models (e.g., Claude 3.5, GPT-4o, Llama 3.1)? Separately: do newer multimodal memory architectures (e.g., entity-centric graphs, persistent context windows) now make conversation-time inference cheap enough to outweigh pre-elicitation? Flag where the tension (helpful now vs. learning for later) still appears to hold and where tooling has shifted the cost curve.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for papers showing pre-elicitation or hybrid approaches winning; models that resist personalization via constitutional constraints; or evidence that curiosity without grounding feeds hallucination or privacy leakage.

(3) Propose 2 research questions that ASSUME the regime may have moved: (a) Can curiosity-driven personalization remain honest about uncertainty while still meeting user expectations? (b) Under what budget constraints (latency, token cost, privacy) does the old regime (pre-ask) actually win, and can we predict when?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
