How can agents learn user preferences during conversation without pre-calibration?

This explores how an agent can figure out what a user wants on the fly — mid-conversation, with no profile, survey, or training pass done in advance — and the corpus turns out to disagree productively about whether the move is to ask, to watch, or to remember.

The cleanest answer in the corpus reframes the problem as curiosity: instead of loading a profile, give the agent an intrinsic reward for *reducing its own uncertainty* about who it's talking to. Can conversations themselves personalize without user profiles? shows that this single change lets a model personalize purely through conversation dynamics, balancing being helpful now against gathering information that pays off later. A neighboring approach, Can user preferences be learned from just ten questions?, makes the same bet more surgically: it pre-learns a set of base reward 'directions' from a population, then uses active learning to ask the *most informative* questions (roughly ten) to locate this particular user, all at inference time with no weight changes. Both treat preference-learning as fast uncertainty collapse rather than slow calibration.

But here the corpus opens a real fork: do you *ask*, or do you *watch*? Can agents learn preferences by watching rather than asking? argues you can learn preferences from continuous observation — an entity-centric memory graph that infers and acts on what you like without ever interrupting to ask. The asking camp pushes back with structure. When should AI agents ask users instead of just searching? borrows 'insert-expansions' from human conversation analysis to formalize *when* a well-placed clarifying question prevents misunderstanding rather than merely recovering from it, and Can unified policy learning improve conversational recommender systems? shows that the three sub-decisions — what to ask, what to recommend, and when to do each — work far better learned as one joint policy than as separate modules, because timing and content inform each other.

There's a quieter requirement hiding underneath all of this: the agent has to actually be willing to take initiative, and most aren't. Why can't conversational AI agents take the initiative? points out that LLMs are trained to respond, not to probe: they can't initiate a topic or strategically gather information, and alignment objectives reinforce that passivity while fluent prose masks it. That matters because preference-elicitation *is* an act of initiative. Could proactive dialogue make conversations dramatically more efficient? quantifies the cost of that gap: in simulations, volunteering relevant information and questions cuts dialogue turns by about 60% in medium-complexity domains, yet this behavior is nearly absent from training data and benchmarks.

Once you've learned something mid-conversation, where does it live? Does abstract preference knowledge outperform specific interaction recall? reports a counterintuitive result: a compressed *summary* of your preferences beats replaying your past interactions verbatim, and recency beats similarity-based retrieval — so the agent that abstracts 'this user prefers terse answers' outperforms the one that hoards transcripts. Can agents learn continuously from experience without updating weights? extends this into continual learning: an agent can keep improving from experience entirely through memory operations, never touching its weights — which is exactly the no-pre-calibration ideal, just stretched across many conversations.
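
The two findings above (summaries over transcripts, recency over similarity) can be sketched as a small memory store. This is an illustrative assumption, not the PRIME or AgentFly implementation: the class name, the `inferred` argument, and the example turns are hypothetical, and in a real agent the inferred preferences would come from a model rather than being passed in by hand.

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceMemory:
    summary: dict = field(default_factory=dict)   # semantic memory: preference -> value
    episodes: list = field(default_factory=list)  # episodic memory, oldest first

    def observe(self, turn, inferred=None):
        """Store the raw turn and fold any inferred preference into the summary."""
        self.episodes.append(turn)
        if inferred:
            self.summary.update(inferred)  # a newer preference overrides an older one

    def context(self, k=2):
        """Build prompt context: the full summary first, then the k most recent turns."""
        lines = [f"{key}: {value}" for key, value in sorted(self.summary.items())]
        return lines + self.episodes[-k:]

mem = PreferenceMemory()
mem.observe("User: that was too long", {"answer_length": "terse"})
mem.observe("User: show me the code")
mem.observe("User: use Python please", {"language": "Python"})
ctx = mem.context(k=2)
```

The oldest transcript line drops out of the context, but the preference abstracted from it ("answer_length: terse") persists, and nothing here touches model weights.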

The thing you didn't know you wanted to know: one of the most concrete preference signals is also the most overlooked. Why don't conversational AI systems mirror their users' word choices? notes that humans unconsciously drift toward each other's word choices — a core mechanism of rapport — and current models simply don't do it, even though it can be taught. So 'learning preferences without pre-calibration' may be less about clever questioning and more about something humans do for free: noticing the words the person in front of you already chose, and matching them.
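
One rough way to check whether a reply matches the user's words is a simple overlap score. The sketch below is an assumption for illustration, not the measure used in the note: the stopword list, the function names, and the example sentences are hypothetical, and real lexical-entrainment work tracks referring expressions rather than bare word overlap.

```python
import re

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "it",
             "my", "your", "i", "you", "in", "on", "for"}

def content_words(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def lexical_alignment(user_turn, reply):
    """Share of the reply's content words that the user already used."""
    reply_words = content_words(reply)
    if not reply_words:
        return 0.0
    return len(reply_words & content_words(user_turn)) / len(reply_words)

user = "My laptop keeps freezing"
aligned = "Restart your laptop when freezing starts"
unaligned = "Reboot the notebook when hangs start"

aligned_score = lexical_alignment(user, aligned)
unaligned_score = lexical_alignment(user, unaligned)
```

Both replies say roughly the same thing, but only the first reuses "laptop" and "freezing", so it scores higher. A score like this could serve as a cheap signal for whether an agent is picking up the user's own vocabulary.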

Sources (10 notes)

Can conversations themselves personalize without user profiles?

Adding an intrinsic motivation reward for reducing uncertainty about user type during conversation enables personalization without pre-collected profiles. Tested in education and fitness domains with 20 user attributes, the approach balances helpfulness with strategic information gathering.

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce uncertainty about a user's coefficients over those base functions. The model can then be personalized to each user via inference-time reward alignment, without weight modification.

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Can unified policy learning improve conversational recommender systems?

Research shows that formulating attribute-asking, item-recommending, and timing decisions as a single graph-based RL policy achieves better joint optimization than isolated components. Separation prevents gradient signals from informing one another and fails to optimize conversation trajectory holistically.

Why can't conversational AI agents take the initiative?

Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.
