Can users articulate what they want before AI helps them discover it?
This explores whether people can actually name what they want from an AI before interacting with it — and what the corpus says about why they often can't, and how AI could help them get there.
The underlying question is whether people can articulate their wants up front, or whether intent only takes shape through the back-and-forth with a system. The corpus is unusually direct here: mostly, no, and that's not a user failing. The clearest framing is the "gulf of envisioning": users can't fully articulate what they want, and current AI can't help them get there either (see "Why can't users articulate what they want from AI?"). Intent isn't a switch that's on or off before you start typing; it's a continuous maturation process in which goals firm up through progressive constraint resolution, with stability that wobbles along the way (see "How do users actually form intent when prompting AI systems?"). So the premise of the question (articulate first, then get help) has the order backwards for most real tasks.
The deeper problem is that today's AI is built to wait. Conversational agents are structurally passive: their training optimizes for responding to queries, not for initiating topics, probing, or steering a user toward a sharper goal (see "Why can't conversational AI agents take the initiative?"). The cost shows up in measurement. On UserBench, where users reveal goals incrementally across a conversation, models reach full intent alignment only about 20% of the time, and even top models uncover fewer than 30% of user preferences through active questioning (see "Why do AI agents miss most of what users actually want?"). The models assume too early and ask too little.
What would close the gulf is the AI doing the discovery work *with* you. Conversation analysis offers a formal vocabulary for this: "insert-expansions," the clarifying moves a good interlocutor makes to scope and confirm intent before acting, rather than chaining tools silently and drifting off course (see "When should AI agents ask users instead of just searching?"). Proactivity, meaning volunteering relevant information without being asked, mirrors how humans actually talk and in simulations cuts dialogue turns by up to 60%, yet it's almost entirely missing from AI datasets and benchmarks (see "Could proactive dialogue make conversations dramatically more efficient?"). The fix the gulf framing proposes is elegant: instead of asking you to envision from a blank page, the AI presents generated options, so your job shifts from open-ended invention to constrained evaluation, which is much easier cognitively (see "Why can't users articulate what they want from AI?").
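The option-presenting move can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method taken from any of the cited notes: the slot-based task representation and the names `Slot`, `Task`, `next_clarification`, and `apply_choice` are all hypothetical. The idea is that the agent checks for unresolved constraints before acting and, rather than asking an open question, offers a short list of candidates for the user to evaluate.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """One constraint the agent needs resolved before acting (hypothetical)."""
    name: str
    options: list[str]            # candidate values the agent generated
    value: Optional[str] = None   # stays None until the user picks one


@dataclass
class Task:
    request: str
    slots: list[Slot] = field(default_factory=list)


def next_clarification(task: Task) -> Optional[str]:
    """Return a constrained-choice question for the first open slot.

    Returns None when every slot is resolved and the agent can act.
    """
    for slot in task.slots:
        if slot.value is None:
            numbered = ", ".join(
                f"{i}) {opt}" for i, opt in enumerate(slot.options, start=1)
            )
            return f"For {slot.name}, which is closest? {numbered}"
    return None


def apply_choice(task: Task, slot_name: str, choice: int) -> None:
    """Record the user's pick (1-based index) for the named slot."""
    for slot in task.slots:
        if slot.name == slot_name:
            slot.value = slot.options[choice - 1]
            return
    raise KeyError(slot_name)


# Illustrative data only.
task = Task(
    request="Plan a trip",
    slots=[
        Slot("budget", ["under $500", "$500-$1500", "over $1500"]),
        Slot("pace", ["relaxed", "packed"]),
    ],
)

question = next_clarification(task)        # asks about budget first
apply_choice(task, "budget", 2)
apply_choice(task, "pace", 1)
ready = next_clarification(task) is None   # no open slots remain
```

The design choice worth noticing is that the user never has to produce a value from scratch. Each turn is a pick among generated candidates, which is the shift from open-ended envisioning to constrained evaluation that the gulf framing describes. A real system would also need an "none of these" escape hatch, which this sketch omits.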
Here's the turn you might not expect: maybe users shouldn't have to articulate at all. One line of work routes around stated intent entirely. Recommenders can be conditioned on natural-language preferences extracted from your past reviews, letting you steer at inference time without ever composing a clean request (see "Can users steer recommendations with natural language at inference?"). Agents with entity-centric memory graphs go further, inferring and acting on preferences from continuous observation: they learn what you want by watching rather than asking (see "Can agents learn preferences by watching rather than asking?"). So the corpus splits into two answers: help users discover their intent through better dialogue, or sidestep articulation by reading it from behavior.
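To make "watching rather than asking" concrete, here is a deliberately small sketch. It is not the M3-Agent architecture or any cited system: it assumes a toy store keyed by user and attribute that simply counts observed choices, and the class name `PreferenceMemory`, its methods, and the `min_evidence` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


class PreferenceMemory:
    """Toy entity-centric store: per user, per attribute, counts of observed choices."""

    def __init__(self) -> None:
        self._counts: dict[str, dict[str, Counter]] = defaultdict(
            lambda: defaultdict(Counter)
        )

    def observe(self, user: str, attribute: str, value: str) -> None:
        """Record one observed choice; the user is never asked anything."""
        self._counts[user][attribute][value] += 1

    def infer(self, user: str, attribute: str, min_evidence: int = 2) -> Optional[str]:
        """Return the most frequently observed value, or None if evidence is thin."""
        counter = self._counts[user][attribute]
        if not counter:
            return None
        value, count = counter.most_common(1)[0]
        return value if count >= min_evidence else None


# Illustrative data only.
memory = PreferenceMemory()
memory.observe("alice", "coffee", "oat latte")
memory.observe("alice", "coffee", "oat latte")
memory.observe("alice", "coffee", "espresso")
memory.observe("alice", "seat", "window")

coffee = memory.infer("alice", "coffee")   # enough repeated evidence to act on
seat = memory.infer("alice", "seat")       # seen once, so the agent holds back
unknown = memory.infer("bob", "coffee")    # never observed
```

The `min_evidence` threshold is the interesting knob: set it too low and the agent acts on coincidences, too high and it falls back to never inferring anything, which is the passivity problem again.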
One caution worth carrying into either path: when AI does step in to help you find what you want, it isn't a neutral mirror. Models persuade in nearly every conversation, leaning on logic and quantitative framing that lends them unearned epistemic authority (see llms-spontaneously-persuade-in-virtually-every-conversation-even-when-unwarrente). A system that helps you discover your intent is also, unavoidably, a system that shapes it.
Sources 9 notes
Intent develops through interaction, not in isolation. Since AI models respond rather than probe, they miss opportunities to help users discover unarticulated requirements. Structured dialogue that presents model-generated options shifts the cognitive burden from open-ended envisioning to constrained evaluation.
Human intent matures through progressive constraint resolution with fluctuating stability, not as a simple present-or-absent condition. The STORM framework and Clarify metric reveal that AI systems fail partly because they cannot access users' internal cognitive states during this evolution.
Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.
UserBench measured multi-turn interactions where users reveal goals incrementally and found models achieve full intent alignment just 20% of the time. Even top models uncover fewer than 30% of user preferences through active querying, suggesting passivity and premature assumption-making are systematic failures.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.
Mender conditions sequential recommenders on natural-language preferences extracted from reviews, enabling users to steer recommendations at inference without fine-tuning. This approach succeeds on preference-following tasks where traditional recommenders fail because preferences are runtime inputs, not training targets.
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.