INQUIRING LINE

How does the machine agency spectrum explain mismatches between tool design and user behavior?

This explores how placing AI tools on a spectrum from passive (user-driven) to autonomous (agent-driven) explains why so many tools misjudge where users actually sit, and why they break when the tool assumes more or less agency than the user wants.


This reads the question as: tools fail not because they're badly built, but because they're pitched at the wrong point on a machine-agency spectrum, assuming more autonomy (or less) than the user's real behavior calls for. The corpus maps this mismatch from several angles at once. The clearest symptom is intent drift: when a tool grabs the autonomous end and silently chains actions, it loses the user. In UserBench's multi-turn measurements, agents fully align with what users want only about 20% of the time, and even the best models surface fewer than 30% of a user's preferences because they make premature assumptions instead of asking (Why do AI agents miss most of what users actually want?). The fix isn't more autonomy; it's dialing agency back down at the right moments. Conversation analysis names exactly those moments: 'insert-expansions,' the small clarifying probes a tool should make before acting, so that it prevents misunderstanding rather than recovering from it (When should AI agents ask users instead of just searching?).
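The idea of probing before acting can be sketched in a few lines of Python. This is a minimal illustration, not a method from the cited notes: the slot names, the `Request` class, and the `next_step` function are all hypothetical, and a real agent would decide what is "missing" in a far less mechanical way.

```python
from dataclasses import dataclass, field

# Hypothetical slot names for a flight-search request; not taken from the source.
REQUIRED_SLOTS = ("destination", "date", "budget")


@dataclass
class Request:
    """What the user has stated so far, keyed by slot name."""
    slots: dict = field(default_factory=dict)


def next_step(request: Request) -> tuple[str, str]:
    """Return ("ask", question) while a required slot is missing, else ("act", call).

    Asking first plays the role of an insert-expansion: the probe is inserted
    between the user's request and the tool's response.
    """
    for slot in REQUIRED_SLOTS:
        if slot not in request.slots:
            return ("ask", f"Before I search, what is your {slot}?")
    summary = ", ".join(f"{k}={request.slots[k]}" for k in REQUIRED_SLOTS)
    return ("act", f"search_flights({summary})")


req = Request({"destination": "Lisbon"})
print(next_step(req))  # an "ask" step, because the date is still unknown

req.slots.update({"date": "2025-03-01", "budget": 400})
print(next_step(req))  # an "act" step, now that every required slot is filled
```

The design choice worth noticing is that the tool never fills a gap with a guess: each missing preference turns into a question, which is the "dial agency back down" move described above.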

Why do designers misplace tools on the spectrum in the first place? Partly because the substrate they're designing on is invisible. AI runs on context that is mutable, dynamic, and ephemeral (prompt, history, retrieved data, hidden state), unlike the fixed context of conventional software that users can internalize (How does AI context differ from conventional software context?). A user can't form a stable mental model of a tool whose state shifts under them, so they behave as if the tool is more legible (less agentic) than it is, and the tool behaves as if the user is more legible than they are. Both sides misread the other's position on the spectrum.

The engineering literature converges on the same lesson from the build side: reliability comes from pushing agency *out* of the model and into structure. Production teams find that protocol-mediated tool access (like MCP) introduces non-deterministic failures through ambiguous tool selection and parameter inference, and that explicit direct function calls with one tool per agent restore determinism (Why do protocol-based tool integrations fail in production workflows?). That's a deliberate move *down* the agency spectrum: less inference, more constraint. The deeper version of this is externalizing memory, skills, and protocols into a harness layer instead of trusting model scale to figure them out on the fly (Where does agent reliability actually come from?). The pattern is consistent: give the machine *less* discretion at the points where users need predictability.
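A rough sketch of what "explicit direct function calls with one tool per agent" can look like in Python follows. Everything here is an assumption made for illustration: `SingleToolAgent`, `convert_currency`, and the parameter names are hypothetical, and the code does not reproduce any particular team's implementation or any MCP API.

```python
from typing import Callable


def convert_currency(amount: float, rate: float) -> float:
    """Hypothetical tool: a plain function with explicit parameters."""
    return round(amount * rate, 2)


class SingleToolAgent:
    """An agent bound to exactly one tool, so there is no tool selection to infer."""

    def __init__(self, name: str, tool: Callable[..., object], required: tuple[str, ...]):
        self.name = name
        self.tool = tool
        self.required = required

    def run(self, **params):
        # Refuse to guess: a missing parameter raises instead of being inferred.
        missing = [p for p in self.required if p not in params]
        if missing:
            raise ValueError(f"{self.name}: missing parameters {missing}")
        return self.tool(**params)


fx_agent = SingleToolAgent("fx", convert_currency, ("amount", "rate"))
print(fx_agent.run(amount=100.0, rate=0.5))  # the same inputs always take the same path
```

The constraint is the point: the caller, not the model, decides which function runs and with which arguments, so a failure shows up as an explicit error and not as a silently wrong tool choice.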

What's quietly interesting is that the mismatch is baked in before deployment, at training time. Tool-using models learn to call tools from synthetic data built by random tool sampling and single-turn Q&A framing. That produces unrealistic compositions, because unrelated tools can't credibly chain, and the framing ignores the multi-turn back-and-forth that real use actually has (Why does random tool sampling produce unrealistic synthetic training data?). So a tool can arrive already trained to behave at an agency level no real conversation occupies. Read together, the corpus suggests the 'agency spectrum' isn't a UX nicety. It's a design axis you can get wrong at the data layer, the integration layer, and the interaction layer, and every layer produces the same downstream symptom: a tool acting confidently at a point on the spectrum where the user isn't standing.
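The contrast between random sampling and relevance-aware sampling can be made concrete with a toy example. This is a sketch under stated assumptions, not ToolFlow's actual procedure: the `RELEVANCE` graph, the tool names, and both sampling functions are hypothetical, and the graph walk is only one simple way to keep sampled tools related.

```python
import random

# Hypothetical relevance graph: an edge means two tools plausibly serve the same task.
RELEVANCE = {
    "search_flights": ["book_flight", "search_hotels"],
    "book_flight": ["search_flights", "send_email"],
    "search_hotels": ["search_flights", "book_hotel"],
    "book_hotel": ["search_hotels", "send_email"],
    "send_email": ["book_flight", "book_hotel"],
    "resize_image": ["compress_image"],
    "compress_image": ["resize_image"],
}


def sample_random(k, rng):
    """Baseline: pick k tools uniformly, ignoring whether they relate to each other."""
    return rng.sample(sorted(RELEVANCE), k)


def sample_related(k, rng):
    """Grow a tool set along relevance edges so every added tool has a related neighbor."""
    chosen = [rng.choice(sorted(RELEVANCE))]
    while len(chosen) < k:
        frontier = sorted({n for t in chosen for n in RELEVANCE[t]} - set(chosen))
        if not frontier:
            break  # the connected component is exhausted; return fewer than k tools
        chosen.append(rng.choice(frontier))
    return chosen


rng = random.Random(0)
print(sample_random(3, rng))   # may mix travel tools with image tools
print(sample_related(3, rng))  # every tool after the first links to an earlier one
```

The baseline can pair `resize_image` with `book_flight`, a combination no realistic task would chain, while the graph-based sampler can only combine tools that share an edge with something already chosen.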


Sources (6 notes)

Why do AI agents miss most of what users actually want?

UserBench measured multi-turn interactions where users reveal goals incrementally and found models achieve full intent alignment just 20% of the time. Even top models uncover fewer than 30% of user preferences through active querying, suggesting passivity and premature assumption-making are systematic failures.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

How does AI context differ from conventional software context?

AI interactions operate on a substrate of constantly shifting context—prompt, history, retrieved data, hidden state—that users cannot internalize like traditional UIs. This structural mutability demands a new design discipline centered on context engineering rather than interface design.

Why do protocol-based tool integrations fail in production workflows?

MCP integration caused non-deterministic failures through ambiguous tool selection and parameter inference. Replacing it with explicit direct function calls and single-tool-per-agent design restored determinism. A 306-practitioner survey confirms 85% of production teams build custom agents, forgoing frameworks.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Why does random tool sampling produce unrealistic synthetic training data?

Random tool sampling fails because unrelated tools cannot credibly compose, and Q&A framing ignores multi-turn dialogue coherence. ToolFlow shows that sampling tools from relevance graphs and generating with dialogue plans closes this gap.
