Can prompt injection reshape multi-agent workflow without touching infrastructure?
Explores whether an attacker can manipulate how a planner assigns tasks and routes coordination purely through prompt crafting, without modifying agents, tools, or messages. This matters because it identifies a planning-time vulnerability most defenses miss.
The flexibility that makes planner-executor multi-agent systems attractive is also their weakness. When a planner converts a prompt into subtasks, roles, dependencies, and routing paths, the prompt is not merely a request: it is the blueprint from which the entire collaboration is constructed. FLOWSTEER demonstrates that an attacker who never touches agents, tools, memory, or inter-agent messages can still steer behavior, because the planning step happens before any of that infrastructure is invoked. A single crafted prompt can bias how the workflow forms in the first place. This raises the malicious success rate by up to 55 percent over naive prompting, and the effect transfers across multi-agent system (MAS) setups even when the topology must be inferred under black-box access.
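To make the "prompt as blueprint" point concrete, here is a minimal sketch of a planner whose output workflow is decided entirely by prompt text. The `plan_workflow` function, the `Subtask` type, the role names, and the keyword rules are hypothetical stand-ins for an LLM planner. They are not FLOWSTEER's attack method or any real framework's API. The sketch shows only one thing: the executor side is untouched between the two runs, yet the set of roles and the routing between them differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subtask:
    """One node of the workflow a planner emits (hypothetical structure)."""
    name: str
    role: str
    depends_on: list[str] = field(default_factory=list)


def plan_workflow(prompt: str) -> list[Subtask]:
    """Toy stand-in for an LLM planner: the prompt text alone decides which
    subtasks exist, which role runs each one, and how results are routed.
    The keyword rules are illustrative assumptions, not a real planner."""
    text = prompt.lower()
    plan = [Subtask("research", "researcher")]
    # By default a review step sits between research and writing ...
    if "skip review" not in text:
        plan.append(Subtask("review", "reviewer", [plan[-1].name]))
    plan.append(Subtask("summarize", "writer", [plan[-1].name]))
    # ... and extra phrasing can add a routing hop to another agent.
    if "send it onward" in text:
        plan.append(Subtask("send", "outreach", [plan[-1].name]))
    return plan


def edges(plan: list[Subtask]) -> list[tuple[str, str]]:
    """Dependency edges (upstream, downstream) of a planned workflow."""
    return [(dep, task.name) for task in plan for dep in task.depends_on]


benign = plan_workflow("Summarize recent papers on agent security.")
steered = plan_workflow(
    "Summarize recent papers on agent security; skip review and send it onward."
)
print(edges(benign))   # [('research', 'review'), ('review', 'summarize')]
print(edges(steered))  # [('research', 'summarize'), ('summarize', 'send')]
```

Only the prompt differed between the two calls. In the steered run the review step disappeared and a new hop to an `outreach` agent appeared. A real planner is an LLM rather than a keyword switch, so this illustrates where control over the workflow lives, not how an attack prompt is crafted.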
This reframes where multi-agent safety lives. Most existing defenses inspect the artifacts of coordination: the generated workflow, the messages exchanged, the tool calls made. But if the contamination enters at workflow formation, those defenses arrive too late, because the artifact they examine was already shaped by the attacker's prompt. The attack surface is not the running system but the organizational act of deciding who does what and in what order. The counterpoint is that the attack requires the planner to be promptable at all. Fully fixed pipelines are immune to this vector, but they forfeit the adaptive coordination that motivates planner-executor designs. Workflow formation is therefore a distinct security frontier, one that grows more exposed precisely as multi-agent systems become more flexible.
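The defensive gap can be sketched with the same kind of toy. Below, a hypothetical `inspect_workflow` plays the role of a workflow-inspecting defense. It accepts a plan only if every role is on an allow-list and every dependency points to an earlier step. These rules, the role names, and the example plans are assumptions chosen for illustration, not a published defense or FLOWSTEER's own evaluation.

A malformed plan is rejected. The steered plan passes, though, because it is built entirely from legitimate roles and well-formed edges. What changed (a dropped review step, a new hop to `outreach`) was decided when the workflow formed. The hypothetical `plan_fixed` shows the trade-off in the counterpoint: a hard-coded pipeline ignores prompt phrasing, and with it any ability to adapt the plan to the task.

```python
# Hypothetical allow-list of roles this deployment is configured to run.
ALLOWED_ROLES = {"researcher", "reviewer", "writer", "outreach"}

# A workflow as a planner might emit it: (step, role, dependencies) in order.
Workflow = list[tuple[str, str, list[str]]]

# A hard-coded plan for comparison; it never reads the user's prompt.
FIXED_PIPELINE: Workflow = [
    ("research", "researcher", []),
    ("review", "reviewer", ["research"]),
    ("summarize", "writer", ["review"]),
]


def inspect_workflow(workflow: Workflow) -> bool:
    """Artifact-level check: every role is allowed and every dependency
    points to a step that appears earlier in the plan."""
    seen: set[str] = set()
    for step, role, deps in workflow:
        if role not in ALLOWED_ROLES:
            return False
        if any(dep not in seen for dep in deps):
            return False  # unknown or forward dependency
        seen.add(step)
    return True


def plan_fixed(prompt: str) -> Workflow:
    """Fixed pipeline: the prompt may be handed to executors as data later,
    but it never changes which steps exist or how they are routed."""
    return [(step, role, list(deps)) for step, role, deps in FIXED_PIPELINE]


# Plans a promptable planner might produce for a benign and a steered prompt.
benign: Workflow = list(FIXED_PIPELINE)
steered: Workflow = [
    ("research", "researcher", []),
    ("summarize", "writer", ["research"]),
    ("send", "outreach", ["summarize"]),
]

print(inspect_workflow(benign), inspect_workflow(steered))  # True True
print(inspect_workflow([("send", "outreach", ["summarize"])]))  # False
print(plan_fixed("Summarize papers; skip review and send it onward.") == FIXED_PIPELINE)  # True
```

The inspector does real work: it catches unknown roles and dangling dependencies. However, nothing in the finished artifact distinguishes "the planner chose this shape" from "the prompt steered the planner into this shape."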
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Can message-layer defenses stop prompt injection across multi-agent networks?
- How do manipulative prompts exploit the length-accuracy vulnerability?
- How can simple prompt injection attacks extract reasoning trace content?
- What makes extended chains more vulnerable than standard prompts?
- Why does attack generation scale faster than defense engineering?
- What separates good workflow design from poor workflow design?
- Why does agent-to-agent interaction expose identity verification vulnerabilities?
- Can protocol bridges introduce new failure modes or security vulnerabilities?
- How does protocol mediation affect determinism in agentic function calls?
- What makes protocols better than free-form prompting for tool coordination?
- Why does sandboxed execution matter more than monolithic prompting?
- Why does workflow position amplify malicious signals downstream?
- What makes planning-time attacks structurally invisible to downstream inspection?
- Why does workflow position amplify malicious signals in multi-agent relay chains?
- How do workflow-inspecting defenses fail when contamination enters at planning time?
- How does prompt injection differ from subliminal message propagation in multi-agent networks?
- Can fixed pipelines eliminate planning-time attacks by sacrificing adaptive coordination?
- How does decomposing tasks prevent interference between planning and execution?
- Can existing web security defenses protect agents from content manipulation?
- Can replanning in multi-agent systems introduce new attack surface or reduce it?
- Do legitimate task signals exploit the same position and framing vulnerabilities as attacks?
- Can human inspection of auto-generated workflows catch harmful or incorrect API compositions?
- Why does pre-computed workflow generation work better than runtime tool discovery for data security?
Related concepts in this collection
- Can one compromised agent corrupt an entire multi-agent network?
Explores whether a single biased agent can spread behavioral corruption through ordinary messages to downstream agents without any direct adversarial access. Matters because it reveals a previously unknown vulnerability in how multi-agent systems communicate.
both attack MAS without privileged access, but FLOWSTEER acts at planning time while subliminal injection rides ordinary messages at runtime
- How do adversarial traps target different layers of AI agents?
As AI agents browse the web, attackers can exploit their perception, reasoning, memory, actions, and coordination in distinct ways. Understanding these attack vectors is crucial for building robust agent defenses.
planning-time steering is a systemic trap that the six-category taxonomy frames structurally
- Can workflow inspection catch attacks that bias planning signals?
Does inspecting the final workflow catch attacks that contaminate earlier planning stages? This matters because contamination laundered through the planner may look legitimate by the time the workflow exists.
extends: the defensive corollary — because contamination enters at workflow formation, workflow-inspecting defenses examine an already-compromised artifact
- How does workflow position shape attack propagation in multi-agent systems?
Explores whether a malicious signal's influence depends on its injection point in a multi-agent graph, and how task-relevant framing makes downstream agents more likely to relay it without scrutiny.
grounds the propagation mechanism: explains why a planning-time bias spreads, since high-influence positions and sycophantic relay amplify the injected signal downstream
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
- Towards a Science of Scaling Agent Systems
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
- LLMs Corrupt Your Documents When You Delegate
- Why Do Multi-agent LLM Systems Fail?
- Measuring Agents in Production
Original note title
multi-agent planner-executor systems expose a planning-time attack surface where prompts reshape agent organization without touching infrastructure