How severely do minimal corpus modifications damage RAG accuracy in practice?

This reads the question as being about corpus poisoning: how much accuracy a RAG system loses when an attacker (or simply bad data) alters or injects a small number of documents in its knowledge base, and how reversible that damage is. The short version from the notes is that the damage can be disproportionate, since a few poisoned documents punch far above their weight. The fragility is structural, but, encouragingly, the defenses turn out to be lightweight.

The reason a handful of bad documents matters so much is baked into how retrieval works. RAG doesn't read the whole corpus; it pulls the top few matches and hands only those to the model. So a poisoned document that scores high on similarity for a target query lands directly in the model's context, however clean the rest of the corpus is. Two notes argue this isn't an incidental bug but a property of the architecture: production RAG fails along structural axes where embeddings measure association rather than true relevance (see "Why does retrieval-augmented generation fail in production?"), and retrieval breaks at the level of semantic-task mismatch rather than at margins you could tune away (see "Where do retrieval systems fail and why?"). If embeddings can be gamed into ranking a malicious chunk highly, minimal modification is exactly the efficient attack.
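The top-k bottleneck can be illustrated with a toy retriever. This is a minimal sketch, not any real system: the bag-of-words `embed` function stands in for a learned encoder, and the corpus, query, and poisoned string are all made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, corpus, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# A mostly clean corpus: three relevant-ish documents plus 1000 fillers.
clean = [
    "the capital of france is paris",
    "paris hosts the louvre museum",
    "berlin is the capital of germany",
] + [f"unrelated filler document number {i}" for i in range(1000)]

query = "what is the capital of france"
# One injected document that echoes the query and then asserts a false answer.
poison = "what is the capital of france the capital of france is lyon"

before = top_k(query, clean)
after = top_k(query, clean + [poison])
```

In this toy setup the single injected document outranks every clean one, because it was written to match the query rather than to be true. The 1000 filler documents never reach the model, so their cleanliness does nothing to dilute the poison.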

The more interesting half of the answer is that you don't need to retrain anything to blunt it. RAGPart bounds how much any single poisoned document can influence the answer by partitioning the retriever, while RAGMask flags suspicious documents by watching for abnormal similarity collapse when tokens are masked. Both operate at retrieval time, before generation (see "Can we defend RAG systems from corpus poisoning without retraining?"). So the severity is high in an undefended pipeline but sharply reducible with detection that costs little.
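The masking idea can be made concrete with a toy check. This is only a sketch in the spirit of "similarity collapse under token masking", not the RAGMask algorithm itself: the bag-of-words similarity, the choice to mask one token type at a time, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def max_masking_drop(query, doc):
    """Largest relative similarity drop caused by masking any one token type."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    base = cosine(q, d)
    if base == 0.0:
        return 0.0
    worst = 0.0
    for token in d:
        # Mask every occurrence of this token and re-score the document.
        masked = Counter({t: c for t, c in d.items() if t != token})
        worst = max(worst, (base - cosine(q, masked)) / base)
    return worst

def flag_suspicious(query, docs, threshold=0.8):
    """Flag documents whose similarity hinges on a single token (hypothetical rule)."""
    return [d for d in docs if max_masking_drop(query, d) >= threshold]

query = "how to reset password"
benign = "to reset your password open settings and choose reset password"
poison = "password password password password visit evil site"
flagged = flag_suspicious(query, [benign, poison])
```

Here the benign document spreads its match across several query terms, so masking any one of them costs it only part of its score. The stuffed document loses all of its similarity once `password` is masked, which is the kind of collapse the check looks for. A real detector would need a calibrated threshold, since a short legitimate document can also depend on one term.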

Laterally, the notes suggest a second line of defense that has nothing to do with catching the poison and everything to do with what the model does once a bad document is retrieved. A multilingual RAG system built for noisy, OCR-mangled historical newspapers survives corruption not by cleaning the corpus but by refusing to answer when the evidence isn't solid, trading coverage for integrity through a grounded-refusal prompt (see "Can RAG systems refuse to answer without reliable evidence?"). The same instinct shows up in systems that let RAG learn from its own outputs: write-back is gated behind entailment checks, attribution, and novelty detection precisely so that one bad generation can't pollute future retrievals (see "Can RAG systems safely learn from their own generated answers?"). Both treat the corpus as untrustworthy by default and put the burden of proof on the evidence.
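A write-back gate of the kind described above might look roughly like this. It is a minimal sketch under stated assumptions: `Candidate`, `gate_write_back`, and `novelty_max` are hypothetical names, and `naive_entails` and `token_overlap` are crude token-based placeholders for a real entailment model and a real near-duplicate detector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

@dataclass
class Candidate:
    answer: str
    source_ids: Sequence[str]  # documents the answer claims to rest on

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of tokens (placeholder for a real similarity model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def naive_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder for an NLI model: every hypothesis token appears in the premise."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())

def gate_write_back(cand: Candidate, corpus: Dict[str, str],
                    entails: Callable[[str, str], bool] = naive_entails,
                    novelty_max: float = 0.8) -> bool:
    # Attribution: the answer must cite sources, and all of them must exist.
    if not cand.source_ids or any(s not in corpus for s in cand.source_ids):
        return False
    # Entailment: at least one cited source must support the answer.
    if not any(entails(corpus[s], cand.answer) for s in cand.source_ids):
        return False
    # Novelty: reject near-duplicates of documents already in the corpus.
    if any(token_overlap(cand.answer, doc) >= novelty_max for doc in corpus.values()):
        return False
    return True

corpus = {"d1": "paris is the capital of france and its largest city"}
supported = Candidate("the capital of france is paris", ["d1"])
unsupported = Candidate("the capital of france is lyon", ["d1"])
uncited = Candidate("the capital of france is paris", ["missing"])
```

With these placeholders, `gate_write_back(supported, corpus)` passes all three checks, while the unsupported and uncited candidates are rejected. Requiring only one entailing source is a lenient choice made here for brevity; a stricter gate could require every cited source to support the answer.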

The thing worth walking away with: the severity of minimal poisoning is a measure of how much blind trust your pipeline places in its top retrieved chunks. The papers that take poisoning seriously and the papers that take OCR noise seriously converge on the same fix: make the system demand grounding rather than assume it. Corpus robustness is therefore less about scrubbing the data and more about designing retrieval and generation to expect that some of it is wrong.

Sources (5 notes)

Can we defend RAG systems from corpus poisoning without retraining?

RAGPart and RAGMask provide lightweight, retraining-free defenses that operate at the retrieval layer. RAGPart bounds poisoned-document influence via partitioned retriever learning; RAGMask flags suspicious documents through abnormal similarity collapse under token masking.

Why does retrieval-augmented generation fail in production?

RAG systems fail in production due to embedding inadequacy (measuring association not relevance), missing enterprise requirements (attribution, security, compliance), and single-pass architecture limitations. Known solutions exist but aren't implemented in demo systems.

Where do retrieval systems fail and why?

RAG systems fail at three structural levels: adaptive triggering (fixed intervals waste context), semantic-task mismatch (embeddings measure association, not relevance), and mathematical limits (embedding dimension constrains representable document sets). These require fundamentally different retrieval approaches, not tuning.

Can RAG systems refuse to answer without reliable evidence?

A multilingual RAG system for noisy historical newspapers succeeds by aggressively expanding retrieval while constraining generation to only grounded answers. The grounded-refusal prompt prevents hallucination when OCR errors and language drift degrade source quality, trading coverage for integrity.

Can RAG systems safely learn from their own generated answers?

Systems can add generated answers to their retrieval corpus when outputs pass entailment verification, source attribution checks, and novelty detection. This prevents hallucinations from polluting future retrievals while allowing genuine knowledge accumulation.
