Why can't users articulate what they want from AI?
Explores the cognitive gap between imagining possibilities and expressing them as prompts, and why language interfaces create a harder envisioning task than the affordances of traditional UIs.
Post angle for Medium/LinkedIn
AI can answer any question you can think to ask. The problem is that you often can't think of the right question.
STORM calls this the "gulf of envisioning" — the cognitive difficulty users face in simultaneously imagining what's possible and expressing it as a prompt. Unlike conventional interfaces with predictable affordances (buttons, menus, forms), language interfaces require users to envision possibilities and their expressions at the same time. This is a fundamentally harder cognitive task.
The double gap:
On the USER side: intent is not a thing you HAVE; it's a thing that MATURES through interaction. You start with a vague sense ("I want to plan a trip"), constraints resolve progressively ("somewhere warm, in February, under $3000"), stability fluctuates (new information can unsettle constraints that seemed fixed), and structural signals you're not even aware of (implicit assumptions, cultural markers) carry meaning you can't articulate.
On the AI side: models are trained to respond to what you say, not to help you figure out what to say (see Why can't advanced AI models take initiative in conversation?). They treat your intent as a binary state (present or absent) rather than as a maturation process, so they cannot detect that your expression hasn't yet reached the cognitive readiness needed for system action.
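To make the contrast between the two sides concrete, here is a minimal Python sketch, assuming a hypothetical slot-based model of intent. The `IntentState` class, its slot names, and the 0.8 readiness threshold are illustrative only and are not taken from STORM or any other cited work. A binary check treats any request as actionable, while the graded check waits until constraints are both resolved and stable, and drops back when a revision unsettles them.

```python
from dataclasses import dataclass, field


@dataclass
class IntentState:
    """Hypothetical model of intent as something that matures across turns."""
    # Constraint slots the user may not have resolved yet (None = unresolved).
    slots: dict = field(
        default_factory=lambda: {"climate": None, "month": None, "budget_usd": None}
    )
    revisions: int = 0  # times a previously resolved slot was changed
    turns: int = 0

    def update(self, **resolved) -> None:
        """Record one user turn that resolves or revises some constraints."""
        self.turns += 1
        for name, value in resolved.items():
            previous = self.slots.get(name)
            if previous is not None and previous != value:
                self.revisions += 1  # new information unsettled an earlier constraint
            self.slots[name] = value

    def resolution(self) -> float:
        """Fraction of slots resolved so far, from 0.0 to 1.0."""
        done = sum(v is not None for v in self.slots.values())
        return done / len(self.slots)

    def stability(self) -> float:
        """1.0 when nothing has been revised; lower as revisions accumulate."""
        return max(0.0, 1.0 - self.revisions / max(self.turns, 1))

    def ready_for_action(self, threshold: float = 0.8) -> bool:
        """Graded readiness: act only when intent is both resolved and stable."""
        return self.resolution() * self.stability() >= threshold


def binary_intent_present(utterance: str) -> bool:
    """The 'present or absent' view: any non-empty request counts as intent."""
    return bool(utterance.strip())
```

With the trip example, `binary_intent_present("I want to plan a trip")` is already true, but `ready_for_action()` only becomes true once climate, month, and budget are resolved, and turns false again if the user later changes the month. Multiplying resolution by stability is one simple choice; any graded score would illustrate the same point.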
The convergence of three research programs:
STORM: formalizes intent as a continuous maturation process, with a "Clarify" metric that measures internal cognitive improvement. Users may report satisfaction while remaining internally confused about their own needs.
Insert-expansions from conversation analysis (CA): provides the interaction framework. When AI can't immediately answer, it should probe the user (clarify intent, scope the response) rather than silently chaining tool calls and drifting off target. This is the "user-as-a-tool" paradigm.
Decision-oriented dialogue: formalizes the information asymmetry. The user knows their preferences, the AI has the database, and neither can share everything. Success depends on determining which information is decision-relevant.
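The insert-expansion idea and the decision-relevance requirement combine naturally into a single policy. The Python sketch below is a hypothetical illustration: the hotel inventory, the scoring rule, the `WALKABLE_BONUS` weight, and every function name are assumptions, not the method of any cited paper. The assistant holds data the user cannot see, the user holds a preference the assistant cannot see, and the assistant asks the user (treated as one more callable tool) only when the answer would change its recommendation.

```python
from typing import Callable, Optional

# Hypothetical inventory the assistant can see but the user cannot.
HOTELS = [
    {"name": "Airport Inn", "price": 120, "rating": 3.0, "walkable": False},
    {"name": "Old Town", "price": 240, "rating": 4.0, "walkable": True},
    {"name": "Beachside", "price": 260, "rating": 4.5, "walkable": False},
]

WALKABLE_BONUS = 1.0  # assumed weight of the user's walkability preference


def best_option(max_price: int, wants_walkable: bool) -> Optional[str]:
    """Highest-scoring hotel within budget under a given preference value."""
    fits = [h for h in HOTELS if h["price"] <= max_price]
    if not fits:
        return None

    def score(h: dict) -> float:
        bonus = WALKABLE_BONUS if wants_walkable and h["walkable"] else 0.0
        return h["rating"] + bonus

    return max(fits, key=score)["name"]


def is_decision_relevant(max_price: int) -> bool:
    """An unknown preference is worth asking about only if its answer changes the choice."""
    return best_option(max_price, True) != best_option(max_price, False)


def respond(max_price: int, ask_user: Callable[[str], str]) -> Optional[str]:
    """Insert-expansion policy: probe the user first when, and only when, it matters."""
    if is_decision_relevant(max_price):
        # "User-as-a-tool": asking the user is just another callable the agent invokes.
        reply = ask_user("Is being within walking distance of the center important to you?")
        return best_option(max_price, reply.strip().lower().startswith("y"))
    # Both answers lead to the same hotel, so skip the question and answer directly.
    return best_option(max_price, False)
```

On a $150 budget only one hotel fits, so the question is skipped; on a $300 budget the walkability preference flips the recommendation between "Old Town" and "Beachside", so the agent inserts the question before answering.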
The design implication: This is not a model capability problem to be solved by better models. It's a design problem requiring fundamental changes to how AI interactions are structured. The fix isn't a smarter answer — it's a better conversation about what the question should be.
The hook: AI can answer any question. The problem is that you often can't think of the right question — and AI can't help you get there.
Conversational Prompt Engineering (CPE) demonstrates a partial bridge. It is a three-party system (user, system, model) in which the LLM generates data-driven questions from user-provided unlabeled data, uses the answers to shape an initial instruction, then shares outputs and uses the user's feedback to refine both the instruction and the outputs. The key insight: the model's ability to analyze data and suggest "dimensions of potential output preferences" helps users discover requirements they couldn't initially articulate. However, CPE still requires users to evaluate outputs, so the envisioning gap is narrowed by scaffolded interaction but not eliminated. This is a meaningful design finding: structured dialogue around model-generated proposals shifts the user's cognitive task from open-ended envisioning to constrained evaluation, which is significantly easier. The gulf can be narrowed not by making users better at articulating intent, but by changing what they're asked to do.
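The CPE loop can be sketched as follows. This is a rough Python outline under stated assumptions, not CPE's implementation: the model calls are replaced with plain callables (`generate`, `choose`, `critique`), and the word-count heuristic in `propose_dimensions` is a hypothetical stand-in for the model's data analysis. What it shows is the shape of the interaction: the user never writes an instruction from scratch, and only picks from proposed options and reacts to outputs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    dimension: str
    options: List[str]  # constrained choices: the user evaluates rather than envisions


def propose_dimensions(samples: List[str]) -> List[Question]:
    """Hypothetical stand-in for the model analyzing unlabeled data to suggest dimensions."""
    avg_words = sum(len(s.split()) for s in samples) / len(samples)
    questions = [Question("format", ["bullet points", "one paragraph"])]
    if avg_words > 50:  # long inputs make output length a live question
        questions.append(Question("length", ["one sentence", "three sentences"]))
    return questions


def build_instruction(task: str, answers: dict) -> str:
    """The system turns the user's choices into an initial instruction."""
    prefs = "; ".join(f"{dim}: {choice}" for dim, choice in answers.items())
    return f"{task}. Preferences: {prefs}." if prefs else f"{task}."


def cpe_loop(task: str, samples: List[str],
             choose: Callable[[Question], str],
             critique: Callable[[str], str],
             generate: Callable[[str, str], str],
             max_rounds: int = 3) -> str:
    """Questions, then instruction, then outputs, then feedback, then refined instruction."""
    answers = {q.dimension: choose(q) for q in propose_dimensions(samples)}
    instruction = build_instruction(task, answers)
    for _ in range(max_rounds):
        outputs = [generate(instruction, s) for s in samples]
        # Empty feedback means "looks fine"; dict.fromkeys drops duplicate notes in order.
        notes = list(dict.fromkeys(f for f in (critique(o) for o in outputs) if f))
        if not notes:
            break
        instruction += " Also: " + " ".join(notes)
    return instruction
```

Passing `choose=lambda q: q.options[0]` and a critique that asks for plain language yields an instruction assembled entirely from the user's selections and reactions, which is the envisioning-to-evaluation shift in miniature.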
Key sources:
- Why do users drift away from their original information need? (ASK, the anomalous state of knowledge, is the upstream cognitive cause of the gulf: users know their knowledge is incomplete but cannot specify what is missing, producing the vague intent the gulf describes.)
- How do users actually form intent when prompting AI systems?
- When should AI agents ask users instead of just searching?
- Can AI agents communicate efficiently in joint decision problems?
- Why can't advanced AI models take initiative in conversation?
- Does user satisfaction actually measure cognitive understanding?
Inquiring lines that use this note as a source:
This note is a source for these synthesized inquiries.
- Why can't users and AI articulate shared goals together?
- Can better AI interfaces eliminate the attention cost of prompt composition and evaluation?
- What makes prompt engineering different from the research thinking it replaces?
- Why does embedding evaluation criteria in prompts reduce creative scope?
- How does prompt scaffolding shift invisible labor onto the user?
- What design discipline replaces navigation and layout in AI systems?
- Can designers hide AI context complexity behind a stable user interface?
- How should designers make invisible AI state legible to users?
- Can users articulate what they want before AI helps them discover it?
- How do users fail to articulate what they actually want?
- Can generative interfaces help users articulate what they actually want?
- How does API-first interaction compare to generative interface approaches?
- Can prompt engineering overcome the gulf between user intent and AI interpretation?
- Do people with lower cognitive complexity prefer simpler machine communication goals?
- What novel goals emerge specifically in human-machine interaction beyond social ones?
- What makes complex UI navigation and social interaction harder than task completion?
- What role does language play as a cognitive scaffold versus communication tool?
- How do users develop different interaction scripts specifically for machines versus humans?
- Can users articulate their intent before exploring what an AI system finds?
- What makes evaluation easier than envisioning for users?
- What tasks do users actually want AI to handle versus what can it automate?
- How does rising AI capability change what users expect from their tools?
- What makes a possibility actionable versus merely metaphysically possible?
- Why does context work differently in AI than in conventional software?
- Can interface design scaffold human participation in tools designed for hands-off autonomy?
- How does capability differ from what workers actually want from AI?
- What stops AI from helping users articulate preferences they cannot express?
- How does context engineering bridge human intent and machine understanding?
Related papers in this collection:
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- When to Act, When to Wait: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue
- Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces
- UserBench: An Interactive Gym Environment for User-Centric Agents
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
- Synthetic Dialogue Dataset Generation using LLM Agents
- Large Language Models for User Interest Journeys
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering
Original note title
the gulf of envisioning: users can't articulate what they want and AI can't help them figure it out