INQUIRING LINE

Can personalized questions improve conversation quality in open-domain chat?

This line of inquiry explores whether having an AI ask personalized questions, rather than only answer them, makes open-domain conversations better. The corpus reframes 'quality' as a tension between gathering information, building rapport, and not annoying the user.


The question is whether personalized questions improve open-domain chat, and the collection's most useful move is to split that into two questions: can questions personalize a conversation, and does asking them actually make the conversation *better*? On the first, the corpus is encouraging. The curiosity-reward work shows a model can personalize in real time simply by being rewarded for reducing its uncertainty about who it's talking to; no pre-built user profile is required, because the questions themselves do the work (see: Can conversations themselves personalize without user profiles?). A complementary line shows you don't need many questions at all: roughly ten well-chosen, maximally informative questions can pin down a user's personalized reward coefficients (see: Can user preferences be learned from just ten questions?). So 'ask to personalize' is not just plausible, it's efficient.
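The two ideas in this paragraph, rewarding uncertainty reduction and picking maximally informative questions, can be illustrated with a small Bayesian toy. This is a minimal sketch under stated assumptions, not the method of either cited work: the user types, the questions, the `LIKELIHOOD` table, and every function name are hypothetical, and uncertainty is measured as Shannon entropy over a discrete belief about the user's type.

```python
import math
from typing import Dict, List

# Hypothetical toy model: probability that each user type answers "yes"
# to each candidate question. All names and numbers are illustrative.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "Do you train outdoors?": {"runner": 0.9, "lifter": 0.2},
    "Do you exercise weekly?": {"runner": 0.8, "lifter": 0.8},
}

Belief = Dict[str, float]


def entropy(belief: Belief) -> float:
    """Shannon entropy (in bits) of a belief over user types."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def update(belief: Belief, question: str, answer_yes: bool) -> Belief:
    """Bayes update of the belief after observing a yes/no answer."""
    posterior = {}
    for user_type, prior in belief.items():
        p_yes = LIKELIHOOD[question][user_type]
        posterior[user_type] = prior * (p_yes if answer_yes else 1.0 - p_yes)
    total = sum(posterior.values())
    return {t: p / total for t, p in posterior.items()}


def curiosity_reward(belief: Belief, question: str, answer_yes: bool) -> float:
    """Intrinsic reward: realized drop in entropy after the user's answer."""
    return entropy(belief) - entropy(update(belief, question, answer_yes))


def expected_info_gain(belief: Belief, question: str) -> float:
    """Expected entropy drop, averaged over the possible answers."""
    p_yes = sum(belief[t] * LIKELIHOOD[question][t] for t in belief)
    gain = 0.0
    for answer_yes, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain += p_answer * curiosity_reward(belief, question, answer_yes)
    return gain


def best_question(belief: Belief, questions: List[str]) -> str:
    """Pick the question with the highest expected information gain."""
    return max(questions, key=lambda q: expected_info_gain(belief, q))


prior = {"runner": 0.5, "lifter": 0.5}
chosen = best_question(prior, list(LIKELIHOOD))
```

With a uniform prior, the sketch selects "Do you train outdoors?" because both user types answer the other question the same way, so that question carries no information about who the user is. The realized reward for a single answer can be negative when a surprising answer increases uncertainty; only the expected gain is guaranteed to be non-negative.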

But the quality of the *questions* turns out to be the whole ballgame. The ALFA work argues that 'ask good questions' is too vague to train on. It decomposes question quality into concrete attributes (clarity, relevance, specificity) and trains on attribute-specific preference pairs, which beats optimizing for a single quality score (see: Can models learn to ask genuinely useful clarifying questions?). In other words, a personalized question only helps if it's also a *good* question; personalization without that decomposition just generates more text.
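To make the contrast with a single quality score concrete, here is a minimal sketch of what attribute-specific preference pairs could look like as data. It is an illustration only and does not reproduce ALFA's actual pipeline: the `PreferencePair` type, the `build_pairs` helper, the `margin` threshold, and the example scores are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List

# The three attributes named in the text.
ATTRIBUTES = ("clarity", "relevance", "specificity")


@dataclass(frozen=True)
class PreferencePair:
    attribute: str  # the attribute this pair teaches
    chosen: str     # question that scores higher on that attribute
    rejected: str   # question that scores lower on that attribute


def build_pairs(
    scored: Dict[str, Dict[str, float]], margin: float = 0.2
) -> List[PreferencePair]:
    """Emit one pair per attribute for every two questions whose scores
    on that attribute differ by at least `margin` (hypothetical threshold)."""
    pairs = []
    for attribute in ATTRIBUTES:
        for q_a, q_b in combinations(scored, 2):
            diff = scored[q_a][attribute] - scored[q_b][attribute]
            if abs(diff) >= margin:
                chosen, rejected = (q_a, q_b) if diff > 0 else (q_b, q_a)
                pairs.append(PreferencePair(attribute, chosen, rejected))
    return pairs


# Illustrative scores; in practice these would come from annotators or a judge.
scored = {
    "Can you tell me more?": {
        "clarity": 0.9, "relevance": 0.5, "specificity": 0.1,
    },
    "How long has the chest pain lasted?": {
        "clarity": 0.8, "relevance": 0.9, "specificity": 0.9,
    },
}
pairs = build_pairs(scored)
```

Here the vague question is marginally clearer, so no clarity pair is emitted, while relevance and specificity each yield a pair that prefers the targeted question. A single averaged score would collapse these distinctions into one number, which is the loss of signal the attribute-level decomposition is meant to avoid.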

Here's the part a reader might not expect: the dominant way we train chat models actively suppresses question-asking. RLHF rewards confident single-turn answers, so models learn to skip clarifying questions and understanding checks. Grounding acts fall 77.5% below human levels, to roughly a fifth of what people do, an 'alignment tax' where the bot looks helpful but quietly loses the thread across turns (see: Does preference optimization harm conversational understanding?). A parallel note frames this as something deeper than a training artifact: smooth conversation runs on implicit social maintenance, such as repair and topic hand-off, that models never learn because their training rewards predicting information, not doing relational work (see: Why don't language models develop conversation maintenance skills?). So personalized questions aren't just a feature to add; they're a corrective to a built-in deficit.

Then the corpus complicates 'quality' itself. Proactive dialogue, meaning volunteering relevant information instead of asking, cut conversation turns by up to 60% in simulations, and fewer turns is one strong definition of a better conversation (see: Could proactive dialogue make conversations dramatically more efficient?). That sits in real tension with question-asking, which *adds* turns. And longitudinally, personalization is double-edged: it builds trust and anthropomorphism over repeated sessions, but it simultaneously raises privacy concerns and user expectations, so each personalized question quietly raises the bar for the next answer (see: Does chatbot personalization build trust or expose privacy risks?). Personalized questions can also be a memory problem rather than a dialogue problem: PRIME finds that abstracted preference summaries personalize better than re-retrieving past interactions, suggesting you may not need to keep asking if you remember well (see: Does abstract preference knowledge outperform specific interaction recall?).

The synthesis, then: yes, personalized questions can improve open-domain chat, but only when the questions are individually well-formed (see: Can models learn to ask genuinely useful clarifying questions?), drive genuine personalization rather than decoration (see: Can conversations themselves personalize without user profiles?), and are balanced against proactivity, privacy, and memory so you ask the few questions that matter instead of interrogating the user (see: Can user preferences be learned from just ten questions?). The deeper lesson the corpus leaves you with is that today's alignment recipe trains *against* this skill, so the gain comes less from bolting questions on than from undoing a tax we didn't know we were paying (see: Does preference optimization harm conversational understanding?).


Sources (8 notes)

Can conversations themselves personalize without user profiles?

Adding an intrinsic motivation reward for reducing uncertainty about user type during conversation enables personalization without pre-collected profiles. Tested in education and fitness domains with 20 user attributes, the approach balances helpfulness with strategic information gathering.

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce uncertainty about a user's coefficients. Responses can then be personalized to each user via inference-time reward alignment, without modifying model weights.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically leaves grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain the relationship rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Does chatbot personalization build trust or expose privacy risks?

Longitudinal research shows personalization enhances trust and anthropomorphism but also amplifies privacy concerns and escalating user expectations. One-shot studies miss these temporal dynamics—each interaction raises the baseline, making failures more disappointing.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversational AI researcher tracking whether personalized questions actually improve open-domain chat quality. The question remains open: does asking users clarifying or preference-eliciting questions make conversations *better*, or does it just add turns and friction?

What a curated library found — and when (2021–2025, dated claims, not current truth):
• Curiosity-reward training can personalize in real time by rewarding uncertainty reduction about the user, without pre-built profiles (~2025).
• Roughly ten well-chosen, information-maximizing questions can pin down user-specific reward coefficients (~2025).
• Question quality requires decomposing into concrete attributes (clarity, relevance, specificity) rather than optimizing a single score; attribute-specific training outperforms single-metric optimization (~2025).
• Current RLHF alignment suppresses clarifying questions; grounding acts (repair, verification) drop to ~20% of human levels—an 'alignment tax' (~2024).
• Proactive dialogue (volunteering information) cuts conversation turns by up to 60%, creating tension with question-asking (~2024).
• Semantic memory abstraction outperforms re-retrieving past interactions for personalization, suggesting better memory may reduce need to re-ask (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2502.14860 (2025-02) – clinical reasoning case study on training models to ask good questions
• arXiv:2504.03206 (2025-04) – curiosity reward for personalized multi-turn dialogue
• arXiv:2311.09144 (2023-11) – grounding gaps in LLM generations
• arXiv:2507.04607 (2025-07) – PRIME, cognitive memory and thought processes for personalization

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 'alignment tax' claim (grounding at ~20% of human levels): has instruction-tuning, chain-of-thought, or explicit dialogue-repair training since 2024 materially lifted this? Does this tax still dominate in GPT-4o, Claude 3.5, or newer open models? Separately, verify whether curiosity-reward methods have moved beyond toy domains into production chat. Is the ten-question threshold still tight, or have larger models learned to personalize from fewer signals?
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Have any papers shown that personalized questions *hurt* conversation quality (privacy backlash, user fatigue, or model drift)? Look for work that argues proactivity should replace inquiry entirely.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) If newer models have higher baseline grounding, do they still need explicit curiosity training, or is it now redundant? (b) Can multi-agent or agentic systems (with memory, tool-use, and guardrails) decouple the personalization–proactivity trade-off entirely?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
