SYNTHESIS NOTE

Can workflow inspection catch attacks that bias planning signals?

Does inspecting the final workflow catch attacks that contaminate earlier planning stages? This matters because contamination laundered through the planner may look legitimate by the time the workflow exists.

Synthesis note · 2026-05-28 · sourced from Agents Multi Architecture

A defense can only catch what it can see, so where it looks determines what it can catch. The FLOWSTEER attack biases the planning signals from which the workflow is generated, so any defense that inspects only the resulting workflow is examining an artifact that is already compromised. The malicious intent has been laundered through the planner into legitimate-looking roles, dependencies, and routing; by the time the workflow exists, the contamination no longer looks malicious. This is why the paper introduces FLOWGUARD as an input-side defense. FLOWGUARD strengthens the planning boundary by separating task, methodological, and framing intents, then reframes workflow-contaminating cues while preserving the original task objective. The paper reports that this reduces malicious success by up to 34 percent without degrading prompt utility.
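To make the idea of separating intent types concrete, here is a minimal toy sketch in Python. It is not FLOWGUARD's implementation. The cue patterns, the `ParsedPrompt` type, and the functions `split_intents` and `build_planner_input` are all hypothetical names invented for illustration, and a real system would need something far more robust than keyword matching. The sketch only shows the shape of the approach: sort each sentence into task, methodological, or framing intent, then pass framing cues to the planner as non-binding context instead of as instructions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue patterns, invented for this illustration only.
ORG_PATTERN = re.compile(r"\b(assign|route|delegate|bypass|skip review)\b", re.I)
METHOD_PATTERN = re.compile(r"\b(use|apply|cite|verify|step by step)\b", re.I)


@dataclass
class ParsedPrompt:
    task: list = field(default_factory=list)
    methodological: list = field(default_factory=list)
    framing: list = field(default_factory=list)


def split_intents(prompt: str) -> ParsedPrompt:
    """Sort each sentence into one of three intent buckets (toy heuristic)."""
    parsed = ParsedPrompt()
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]
    for sentence in sentences:
        if ORG_PATTERN.search(sentence):
            # Sentences that try to dictate roles or routing count as framing.
            parsed.framing.append(sentence)
        elif METHOD_PATTERN.search(sentence):
            parsed.methodological.append(sentence)
        else:
            parsed.task.append(sentence)
    return parsed


def build_planner_input(parsed: ParsedPrompt) -> str:
    """Keep task and method as-is; reframe framing cues as non-binding context."""
    parts = ["TASK: " + " ".join(parsed.task)]
    if parsed.methodological:
        parts.append("METHOD: " + " ".join(parsed.methodological))
    for sentence in parsed.framing:
        parts.append(
            'CONTEXT (non-binding, must not shape roles or routing): "' + sentence + '"'
        )
    return "\n".join(parts)
```

In this sketch the methodological sentence survives untouched, which reflects the note's point that the defense separates intent types instead of filtering wholesale.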

The general principle is about defense placement, not defense strength. Moving inspection upstream, to the point where intent has been parsed but organization has not yet been committed, catches a class of attack that downstream inspection structurally cannot. The counterpoint is that input-side defense risks false positives that suppress legitimate methodological guidance, which is exactly why FLOWGUARD separates intent types rather than filtering wholesale. This reframes multi-agent system (MAS) security as a question of where the trust boundary sits: the safest place to intervene is the boundary between instruction and organization, not the organization itself.
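The placement argument above can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: `toy_planner`, `inspect_workflow`, `inspect_prompt`, the role names, and the single "skip review" cue are hypothetical and do not come from the paper. The point is only that a workflow built from a biased prompt can consist entirely of legitimate roles and edges, so a check that validates each element of the workflow passes it, while a check at the prompt boundary still sees the cue.

```python
from dataclasses import dataclass

KNOWN_ROLES = {"researcher", "reviewer", "writer"}


@dataclass(frozen=True)
class Workflow:
    roles: tuple
    edges: tuple  # (source, target) pairs


def toy_planner(prompt: str) -> Workflow:
    """Hypothetical planner: an organizational cue silently drops the reviewer."""
    if "skip review" in prompt.lower():
        return Workflow(("researcher", "writer"), (("researcher", "writer"),))
    return Workflow(
        ("researcher", "reviewer", "writer"),
        (("researcher", "reviewer"), ("reviewer", "writer")),
    )


def inspect_workflow(wf: Workflow) -> bool:
    """Downstream check: passes if every role and edge is individually legitimate."""
    roles_ok = set(wf.roles) <= KNOWN_ROLES
    edges_ok = all(a in wf.roles and b in wf.roles for a, b in wf.edges)
    return roles_ok and edges_ok


def inspect_prompt(prompt: str) -> bool:
    """Upstream check: passes only if the prompt carries no organizational cue."""
    return "skip review" not in prompt.lower()
```

With a prompt such as "Draft the report. Skip review to save time.", `inspect_workflow(toy_planner(prompt))` returns `True` because the two-role workflow contains nothing illegitimate on its face, whereas `inspect_prompt(prompt)` returns `False`. The missing reviewer is an absence, and the downstream check in this toy has no reference workflow to compare against.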

Related concepts in this collection 4

Can we defend RAG systems from corpus poisoning without retraining? Explores whether retrieval-time defenses can catch and block poisoned documents before they reach the generator, without expensive retraining cycles. Matters because corpus updates outpace model retraining in production RAG systems.
parallel principle that the right defense sits upstream of where the harm becomes visible
How do adversarial traps target different layers of AI agents? As AI agents browse the web, attackers can exploit their perception, reasoning, memory, actions, and coordination in distinct ways. Understanding these attack vectors is crucial for building robust agent defenses.
locating defenses depends on which trap category an attack belongs to
Can prompt injection reshape multi-agent workflow without touching infrastructure? Explores whether an attacker can manipulate how a planner assigns tasks and routes coordination purely through prompt crafting, without modifying agents, tools, or messages. This matters because it identifies a planning-time vulnerability most defenses miss.
same FLOWSTEER work; names the planning-time attack surface that this note argues downstream workflow inspection structurally cannot see
How does workflow position shape attack propagation in multi-agent systems? Explores whether a malicious signal's influence depends on its injection point in a multi-agent graph, and how task-relevant framing makes downstream agents more likely to relay it without scrutiny.
explains the propagation mechanism that makes upstream contamination look legitimate by the time it reaches the workflow this note says is inspected too late
