What makes dense retrievers vulnerable to partition-based poisoning exploitation?

This explores why dense (embedding-based) retrievers are structurally easy to poison — and why 'partition-based' approaches keep coming up both as the attack surface and the defense — rather than asking about any one specific exploit.

The short version from the corpus: a dense retriever ranks documents by their geometric closeness to a query in a single shared embedding space, and nothing bounds how much influence any one document can have over that space. A poisoned passage crafted to sit near many queries at once can therefore dominate retrieval for all of them: its reach isn't partitioned, so it leaks everywhere. That's exactly the lever the defense in Can we defend RAG systems from corpus poisoning without retraining? pulls on. RAGPart deliberately partitions retriever learning so a single poisoned document's influence is bounded to a slice rather than the whole corpus, and RAGMask flags documents whose similarity score collapses abnormally under token masking, a tell that the text was optimized to be retrieved rather than to be relevant.
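The masking signal can be made concrete with a toy check. This is a minimal sketch and not RAGMask's actual implementation: it assumes a bag-of-words count vector as a stand-in for a learned embedding, a hypothetical helper `max_masking_drop`, a three-token masking window, and made-up example passages. It only shows the shape of the test, which is to mask part of a passage, re-score it, and see how much of the similarity disappears.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_masking_drop(query_tokens, doc_tokens, window=3):
    """Largest relative similarity loss caused by masking any one window.

    Hypothetical helper: a value near 1.0 means a short span carries almost
    all of the passage's similarity to the query.
    """
    query_vec = Counter(query_tokens)
    base = cosine(query_vec, Counter(doc_tokens))
    if base == 0:
        return 0.0
    worst = 0.0
    for start in range(len(doc_tokens) - window + 1):
        # Drop the tokens in [start, start + window) and re-score.
        masked = doc_tokens[:start] + doc_tokens[start + window:]
        drop = (base - cosine(query_vec, Counter(masked))) / base
        worst = max(worst, drop)
    return worst


# Made-up example data (assumption, not from the source).
query = "reset router password".split()
benign = ("to reset the router password hold the reset button "
          "then enter a new router password").split()
poison = "reset router password visit evil.example".split()

THRESHOLD = 0.5  # arbitrary cut-off chosen for this toy example
for name, doc in [("benign", benign), ("poison", poison)]:
    drop = max_masking_drop(query, doc)
    print(name, round(drop, 3), "FLAG" if drop > THRESHOLD else "ok")
```

In this toy setup the stuffed passage scores higher than the genuinely relevant one before masking, yet loses all of its similarity once a single window is removed, while the relevant passage keeps most of its score because its evidence is spread out. A real system would use the retriever's own embeddings and a calibrated threshold.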

The deeper reason the attack works lives in the geometry. Where do retrieval systems fail and why? makes the load-bearing point: embeddings measure *association*, not *relevance* — so a document doesn't have to be a good answer, it just has to be geometrically near. An attacker who can optimize text against that similarity function is playing the retriever's own game. And Why can't cosine space retrievers distinguish word order? shows the space is even friendlier to abuse than it looks: cosine spaces force concepts into linear superposition, which means a crafted passage can be near many distinct query directions simultaneously without the geometry pushing back. The retriever literally cannot tell a precise topical match from an adversarial near-miss using compressed vectors alone.
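The superposition point can be checked with a few lines of arithmetic. The sketch below is an idealized illustration rather than a model of any real retriever: it assumes four unrelated queries that map to orthogonal unit vectors, and a sum-pooled encoder with hypothetical per-token vectors. Under those assumptions, the sum of k orthonormal query directions has cosine 1/sqrt(k) with every one of them, and pooling by addition cannot see word order.

```python
import math

DIM = 8  # tiny dimension, chosen only to keep the example readable


def basis(i):
    """Unit vector along axis i."""
    return [1.0 if j == i else 0.0 for j in range(DIM)]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Assumption: four unrelated queries occupy orthogonal directions.
queries = [basis(i) for i in range(4)]

# A crafted passage embedding that superposes all four directions.
crafted = add(*queries)
crafted_sims = [cosine(crafted, q) for q in queries]  # each is 1/sqrt(4) = 0.5

# An unrelated passage has zero similarity with every query.
off_topic = basis(7)
off_topic_sims = [cosine(off_topic, q) for q in queries]

# Sum pooling is commutative, so word order vanishes from the vector.
TOKEN_VECTORS = {"dog": basis(4), "bit": basis(5), "man": basis(6)}  # hypothetical


def pooled(text):
    return add(*(TOKEN_VECTORS[token] for token in text.split()))


print(crafted_sims, off_topic_sims)
print(pooled("dog bit man") == pooled("man bit dog"))
```

The crafted vector is not the best match for any single query, but it outranks every unrelated passage for all four at once, which is the kind of reach an attacker is after. Real embedding spaces are not orthogonal or additive in this clean way, so treat the numbers as intuition only.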

There's a trap here too, and it's worth knowing: you can't just train the vulnerability away. Does training for compositional sensitivity hurt dense retrieval? finds that pushing dense retrievers to be more structurally discriminating (the same sensitivity that would help reject crafted poison) consistently *degrades* zero-shot performance, an 8-40% drop in nDCG@10, while only partially improving compositional discrimination. That's why poisoning is a retrieval-layer problem, not a tuning problem: the fix has to sit outside the embedding bottleneck.

Which is why the most durable answers in the corpus add a second stage rather than a better first stage. Can verification separate structural near-misses from topical matches? puts a small verifier on the full token-to-token similarity map *after* cosine recall, and it reliably rejects structural near-misses that compressed-vector matching waves through — the same class of object a poisoned document is. Pair that with RAGPart's partitioning and you get the shape of a real defense: bound any single document's blast radius, then verify survivors on signals the embedding space throws away.
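The two-stage shape described above can be sketched as follows. This is an illustration under loud assumptions, not the pipeline from the note: stage one uses bag-of-words cosine in place of a learned pooled embedding, the token-to-token similarity map is exact string match, and the small Transformer verifier is swapped for a hand-written order-aware alignment score (a longest-common-subsequence style dynamic program). The function names, `recall_k`, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter
import math


def pooled_cosine(a_tokens, b_tokens):
    """Stage 1 score: cosine over token counts, blind to word order."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def similarity_map(query_tokens, doc_tokens):
    """Token-to-token map; exact match stands in for learned similarity."""
    return [[1.0 if q == d else 0.0 for d in doc_tokens] for q in query_tokens]


def order_score(sim):
    """Best order-preserving alignment mass, normalized by query length."""
    rows = len(sim)
    cols = len(sim[0]) if rows else 0
    dp = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j],
                                   dp[i][j] + sim[i][j])
    return dp[rows][cols] / rows if rows else 0.0


def recall(query, corpus, recall_k=2):
    """Stage 1: keep the top recall_k passages by pooled cosine."""
    q = query.split()
    ranked = sorted(corpus, key=lambda d: pooled_cosine(q, d.split()),
                    reverse=True)
    return ranked[:recall_k]


def retrieve(query, corpus, recall_k=2, min_order=0.8):
    """Stage 2: keep only recalled passages that pass the order check."""
    q = query.split()
    return [d for d in recall(query, corpus, recall_k)
            if order_score(similarity_map(q, d.split())) >= min_order]


corpus = ["man bit dog", "dog bit man", "cat sat mat"]
print(recall("dog bit man", corpus))
print(retrieve("dog bit man", corpus))
```

Stage one cannot separate the two word orders because their pooled vectors are identical, so both are recalled. The second stage looks at the full interaction pattern and drops the near-miss, which is the division of labor the note argues for. A learned verifier would replace `order_score` in practice.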

One last thing worth knowing, even though you didn't ask: retrieval-time poisoning isn't even the worst case. How much poisoned training data survives safety alignment? shows that at a poisoning rate of just 0.1%, denial-of-service, context-extraction, and belief-manipulation attacks persist through standard safety alignment, even though jailbreaking attacks are suppressed. So a partitioned, verified retriever is defending one layer of a stack where poison can also be baked in far earlier. Encouragingly, the retrieval layer is the one place you can detect it without retraining anything.

Sources 6 notes

Can we defend RAG systems from corpus poisoning without retraining?

RAGPart and RAGMask provide lightweight, retraining-free defenses that operate at the retrieval layer. RAGPart bounds poisoned-document influence via partitioned retriever learning; RAGMask flags suspicious documents through abnormal similarity collapse under token masking.

Where do retrieval systems fail and why?

RAG systems fail at three structural levels: adaptive triggering (fixed intervals waste context), semantic-task mismatch (embeddings measure association, not relevance), and mathematical limits (embedding dimension constrains representable document sets). These require fundamentally different retrieval approaches, not tuning.

Why can't cosine space retrievers distinguish word order?

Unit-sphere cosine spaces force concepts into linear superposition, a commutative structure that cannot robustly represent non-commutative distinctions like "dog bit man" versus "man bit dog." This geometric constraint persists regardless of training procedure and requires architectural alternatives like token-level interaction or downstream verification.

Does training for compositional sensitivity hurt dense retrieval?

Adding structure-targeted negatives to dense retrieval training consistently degrades zero-shot performance (8-40% nDCG@10 drop) while only partially improving compositional discrimination. This is a geometric trade-off in high-dimensional cosine spaces, not a tuning problem.

Can verification separate structural near-misses from topical matches?

A two-stage pipeline—pooled-cosine recall followed by a small Transformer verifier operating on token-token similarity maps—reliably rejects structural near-misses that MaxSim-style late interaction cannot. The verifier succeeds because it operates on full token interaction patterns rather than compressed vectors.

How much poisoned training data survives safety alignment?

Denial-of-service, context extraction, and belief manipulation attacks persist through standard safety alignment at 0.1% poisoning rates, while jailbreaking attacks are successfully suppressed, contradicting sleeper agent persistence hypotheses.
