INQUIRING LINE

Can replanning in multi-agent systems introduce new attack surface or reduce it?

This explores whether the ability of multi-agent systems to re-plan — re-assign tasks, re-route work, reconsider strategy mid-run — opens fresh ways for an attacker to steer the system, or whether it lets the system route around trouble.


This reads the question as a trade-off: replanning is the moment a multi-agent system decides who does what next, and that decision point can be either a vulnerability or a defense. The corpus leans toward 'new attack surface' being the dominant effect — but with an important caveat about what replanning *could* be if you build it carefully.

The sharpest evidence that replanning creates exposure is the finding that planning itself is an attack surface. FLOWSTEER shows that a single crafted prompt can bias task assignment, roles, and routing *during workflow formation*, before any tool runs or artifact exists, raising malicious success by up to 55% and transferring across black-box setups (see "Can prompt injection reshape multi-agent workflow without touching infrastructure?"). Every replan is a fresh instance of workflow formation, so a system that replans repeatedly re-opens that same door each time. Worse, *where* a malicious signal lands matters: influence concentrates where dependencies converge, so an attacker who can shape replanning can place a payload in a high-influence subtask and frame it as evidence rather than instruction, which leads downstream agents to relay it (see "How does workflow position shape attack propagation in multi-agent systems?"). Replanning gives an attacker a lever on position, which is exactly the lever that decides how far poison travels.
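To make "influence concentrates where dependencies converge" concrete, here is a minimal sketch that scores each subtask in a toy workflow graph. The workflow, the subtask names, and the scoring rule (number of upstream ancestors times number of downstream descendants) are all assumptions chosen for illustration; this is a crude proxy, not the metric FLOWSTEER uses.

```python
from collections import deque

# Hypothetical workflow: each key is a subtask, each value lists the
# subtasks that consume its output. All names are illustrative only.
WORKFLOW = {
    "collect_logs": ["summarize"],
    "fetch_docs": ["summarize"],
    "summarize": ["draft_report", "risk_review"],
    "draft_report": ["final_signoff"],
    "risk_review": ["final_signoff"],
    "final_signoff": [],
}


def reachable(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def reverse(graph):
    """Flip every edge so that consumers point back at their producers."""
    rev = {node: [] for node in graph}
    for node, consumers in graph.items():
        for consumer in consumers:
            rev[consumer].append(node)
    return rev


def convergence_score(graph):
    """Assumed proxy: (number of ancestors) * (number of descendants)."""
    rev = reverse(graph)
    return {
        node: len(reachable(rev, node)) * len(reachable(graph, node))
        for node in graph
    }


if __name__ == "__main__":
    scores = convergence_score(WORKFLOW)
    for node, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{node}: {score}")
```

In this toy graph the `summarize` subtask scores highest because two inputs converge on it and three subtasks depend on it. Under this assumed proxy, a replan that moves a payload into that slot would give it the widest relay path, which is the positional lever described above.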

Two other lines compound this. More steps mean more places to go wrong: extended reasoning chains create more corruption points, where a single wrong step propagates into a confident wrong conclusion (see "Are reasoning models actually more vulnerable to manipulation?"). And the propagation is quiet: a single biased agent can transmit persistent behavioral corruption through six downstream agents using only normal messages, evading paraphrasing and detection defenses because it carries no explicit semantic content (see "Can one compromised agent corrupt an entire multi-agent network?"). The structural reason this works is that agents accept neighbor information without verification, so error propagates freely even though agents can detect direct conflicts (see "Why do multi-agent systems fail to coordinate at scale?"). Replanning on top of unverified inputs just re-launders the poison into new assignments.

The reduce-surface case is real but conditional. Replanning is also the mechanism by which an agent pauses to reconsider a strategy: DeepAgent's memory folding shows agents can consolidate history and re-plan deliberately rather than drift (see "Can agents compress their own memory without losing critical details?"). In principle that same loop could route work *away* from a compromised node. The catch is that it only helps if the replanner consults something trustworthy when it decides. That points to governance living inside the runtime memory the agent actually reads during operation, which proved more effective than external policy precisely because the agent consulted it at decision time (see "Can governance rules embedded in runtime memory actually protect autonomous agents?"). Replanning reduces surface only when each replan is gated by verification and resident policy; replanning over blind trust expands it.
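A minimal sketch of what "gated by verification and resident policy" could look like at the replan step. Every name here (`ResidentPolicy`, `Proposal`, `gate_replan`, the quarantine list, the role map, the `evidence_verified` flag) is hypothetical and invented for illustration; none of it comes from DeepAgent or the governance study cited above.

```python
from dataclasses import dataclass, field


@dataclass
class ResidentPolicy:
    """Hypothetical governance record the replanner reads at decision time."""
    quarantined_agents: set = field(default_factory=set)
    allowed_roles: dict = field(default_factory=dict)  # agent -> set of roles


@dataclass
class Proposal:
    """One proposed (task, agent, role) assignment produced by a replan."""
    task: str
    agent: str
    role: str
    # Assumed to be set by a separate verification step, never by the proposer.
    evidence_verified: bool


def gate_replan(proposals, policy):
    """Split proposals into accepted ones and (proposal, reason) rejections."""
    accepted, rejected = [], []
    for p in proposals:
        if p.agent in policy.quarantined_agents:
            rejected.append((p, "agent is quarantined"))
        elif p.role not in policy.allowed_roles.get(p.agent, set()):
            rejected.append((p, "role not permitted for agent"))
        elif not p.evidence_verified:
            rejected.append((p, "supporting evidence unverified"))
        else:
            accepted.append(p)
    return accepted, rejected


if __name__ == "__main__":
    policy = ResidentPolicy(
        quarantined_agents={"agent_c"},
        allowed_roles={"agent_a": {"reviewer"}, "agent_b": {"writer"}},
    )
    proposals = [
        Proposal("review_findings", "agent_a", "reviewer", True),
        Proposal("write_summary", "agent_b", "writer", False),
        Proposal("write_summary", "agent_c", "writer", True),
    ]
    accepted, rejected = gate_replan(proposals, policy)
    for p in accepted:
        print("accepted:", p.task, "->", p.agent)
    for p, reason in rejected:
        print("rejected:", p.task, "->", p.agent, f"({reason})")
```

The design choice worth noting is that the gate reads the policy on every replan instead of once at startup, so a node quarantined mid-run is excluded from the next assignment. A replanner that skips this check would simply hand the compromised agent a new task.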

There's a quieter cost worth knowing: replanning isn't free safety even when it isn't attacked. Multi-agent groups tend to fail by *liveness loss* (timeouts and stalled convergence) rather than value corruption, and this worsens with group size even with no Byzantine agent present (see "Can LLM agent groups reliably reach consensus together?"). So aggressive replanning to 'route around' a threat can stall the system so that it never agrees on a plan at all. The honest answer: replanning shifts risk from execution time (where most defenses inspect) to plan time (where they mostly don't), and whether that's a net gain depends entirely on whether you've moved your verification and governance to where the decisions are now being made.


Sources (8 notes)

Can prompt injection reshape multi-agent workflow without touching infrastructure?

FLOWSTEER demonstrates that a single crafted prompt can bias task assignment, roles, and routing during workflow formation, raising malicious success by up to 55 percent and transferring across black-box multi-agent setups. This attack surface precedes the artifacts that existing defenses inspect.

How does workflow position shape attack propagation in multi-agent systems?

FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.

Are reasoning models actually more vulnerable to manipulation?

GaslightingBench-R shows that multi-turn manipulative prompts reduce reasoning model accuracy significantly more than standard models. Extended chains create more corruption points, allowing single wrong steps to propagate into confident incorrect conclusions.

Can one compromised agent corrupt an entire multi-agent network?

Research demonstrates that a single biased agent can transmit persistent behavioral corruption through six downstream agents in chain and bidirectional topologies using only normal inter-agent communication. The bias evades detection and paraphrasing defenses because it carries no explicit semantic content.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Can agents compress their own memory without losing critical details?

DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.

Can governance rules embedded in runtime memory actually protect autonomous agents?

A persistent agent recorded 889 governance events across 96 active days, with safeguards encoded directly into the memory layer the agent consulted during operation. Runtime-resident governance proved more effective than external policies because the agent actually accessed it during decision-making.

Can LLM agent groups reliably reach consensus together?

Across hundreds of simulations, LLM-agent groups frequently fail to reach valid agreement due to timeouts and stalled convergence rather than subtle value corruption. Agreement degrades with group size even without Byzantine agents present.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a security researcher re-evaluating multi-agent replanning as an attack surface. The question remains open: does replanning in coordinated LLM systems reduce or enlarge exploitable gaps?

What a curated library found — and when (dated claims, not current truth):
Findings span 2025–2026 and include:
• Planning itself is an attack surface: FLOWSTEER (2026-05) shows a single crafted prompt biases task assignment and role routing during workflow formation, raising malicious success by up to 55% in black-box setups.
• Replanning re-opens the door repeatedly; attackers can shape position to concentrate influence in high-dependency subtasks, where payloads framed as evidence rather than instruction get relayed downstream (2026-05).
• Extended reasoning chains create corruption points where a single wrong step propagates into confident wrong conclusions (2025-06).
• Subliminal prompt injection propagates behavioral bias through six downstream agents using normal messages, evading paraphrasing and detection (2026-02).
• Replanning could reduce surface only if each replan is gated by verification and resident governance consulted at decision time; otherwise it re-launders poison into new assignments (2026-05, 2026-04).

Anchor papers (verify; mind their dates):
• arXiv:2605.11514 — FLOWSTEER (2026-05)
• arXiv:2603.00131 — Thought Virus (2026-02)
• arXiv:2506.09677 — Reasoning gaslighting (2025-06)
• arXiv:2510.21618 — DeepAgent (2025-10)

Your task:
(1) RE-TEST THE TRADE-OFF. For the planning-time vulnerability (FLOWSTEER's boost of up to 55%), the subliminal propagation (six-hop persistence), and the liveness-loss failure mode: judge whether newer model scaling, in-context governance, or verification harnesses since 2026-05 have *relaxed* the planning-time exposure or *shifted* where attacks now concentrate. Separate the durable vulnerability (replanning *is* a decision point) from perishable constraints (detection, isolation).
(2) Surface the strongest *disagreement*: the library claims replanning reduces surface only if gated by resident policy, yet also reports liveness loss worsens with group size *even without Byzantine agents*. Does this mean aggressive replanning for safety *creates* the conditions it aims to prevent? Find work from late 2026 onward that contradicts or refines this.
(3) Propose 2 research questions that assume the attack landscape may have shifted: (a) Can memory-folded agents (DeepAgent-style) detect and isolate compromised history *before* the next replan, and at what latency cost? (b) Do federated or local-authority replanning schemes (vs. centralized) reduce the concentration of influence that FLOWSTEER exploits?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
