How vulnerable is GraphRAG to tiny text manipulations?
GraphRAG converts raw text into knowledge graphs for question answering. This note explores whether adversaries can degrade accuracy with minimal edits to source documents, and what makes the system susceptible.
GraphRAG relies on LLMs to extract knowledge from raw text during graph construction, and this extraction process can be manipulated with minimal text changes. Two complementary knowledge poisoning attacks (KPAs) demonstrate the vulnerability:
Targeted KPA (TKPA): Uses graph-theoretic analysis to locate vulnerable nodes in the generated graph, then uses an LLM to rewrite the corresponding narratives. It achieves a 93.1% success rate at controlling specific QA outcomes while keeping the poisoned text fluent and natural. This is the precision attack: it makes specific queries return attacker-desired answers.
Universal KPA (UKPA): Exploits linguistic cues (pronouns, dependency relations) to disrupt the structural integrity of the generated graph by altering globally influential words. With fewer than 0.05% of the source text modified, QA accuracy collapses from 95% to 50%. This is the breadth attack: small modifications corrupt reasoning across many queries.
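As a rough illustration of the node-location step in TKPA, the sketch below ranks the nodes of a toy graph by degree. The note does not say which graph-theoretic measure the attack uses, so degree is an assumption chosen for simplicity, and the triples and the function name `rank_nodes_by_degree` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples standing in for an LLM-extracted knowledge graph.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "married_to", "Pierre Curie"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
    ("Warsaw", "capital_of", "Poland"),
]


def rank_nodes_by_degree(triples):
    """Return (node, degree) pairs, highest degree first, ties broken by name.

    Degree is an assumed stand-in for whatever vulnerability score
    the real attack computes.
    """
    degree = defaultdict(int)
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_nodes_by_degree(TRIPLES)
for node, score in ranking:
    print(f"{score}  {node}")
```

Under this stand-in measure, "Marie Curie" ranks first because it touches the most extracted relations, so a rewrite that corrupts its narrative would disturb the most edges.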
The critical distinction from prior adversarial attacks on RAG is that this is a manipulation-only attack surface. The adversary doesn't inject new content; they make subtle edits to existing trusted sources (e.g., minor Wikipedia changes). The corrupted graph structure persists after construction and misleads all subsequent queries built on it. Stealthiness follows implicitly from restricting edits to very small modifications of trusted sources.
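One way to make the "very small modifications" constraint concrete is to measure the fraction of words an edit touches. The sketch below uses an assumed metric, not the paper's definition: it aligns the two word sequences with Python's `difflib.SequenceMatcher` and counts the words that fall outside matching runs. The example sentence and the 100,000-word corpus size are hypothetical.

```python
import difflib


def modified_word_fraction(original: str, edited: str) -> float:
    """Fraction of words changed, relative to the original word count.

    Assumed metric: each non-matching opcode contributes the larger of
    the number of words it spans in the original or in the edited text.
    """
    a = original.split()
    b = edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed / len(a) if a else 0.0


# A single pronoun swap in a hypothetical 11-word sentence.
original = "Alice founded the lab and she later hired Bob as director"
edited = "Alice founded the lab and he later hired Bob as director"
fraction = modified_word_fraction(original, edited)
print(f"fraction of words modified: {fraction:.3f}")

# Edit budget implied by a 0.05% cap on a hypothetical 100,000-word corpus,
# computed with integers to avoid floating-point rounding.
corpus_words = 100_000
budget = corpus_words * 5 // 10_000
print(f"word budget at 0.05%: {budget}")
```

At that scale a 0.05% cap allows only about 50 edited words, which is why each edit has to target a word with broad influence on the extracted graph.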
The structural vulnerability: GraphRAG's strength (converting unstructured text into structured knowledge) becomes its weakness, because the LLM extraction step is sensitive to small perturbations that then propagate through the graph. Entity and relation extraction errors compound through graph topology: a single misattributed relationship can redirect entire reasoning paths.
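The propagation effect can be sketched with a toy multi-hop lookup. This is a minimal illustration under stated assumptions, not GraphRAG's actual retrieval logic: all entities, relations, and the helpers `build_index` and `follow` are hypothetical, and each (head, relation) pair is assumed to have a single tail.

```python
def build_index(triples):
    """Map (head, relation) to tail; assumes one tail per pair for simplicity."""
    return {(head, relation): tail for head, relation, tail in triples}


def follow(index, start, relations):
    """Follow a chain of relations from start; return None if a hop is missing."""
    node = start
    for relation in relations:
        node = index.get((node, relation))
        if node is None:
            return None
    return node


CLEAN = [
    ("Acme", "founded_by", "Alice"),
    ("Alice", "born_in", "Lyon"),
    ("Lyon", "located_in", "France"),
    ("Bob", "born_in", "Boston"),
    ("Boston", "located_in", "United States"),
]
# One misattributed relation, e.g. caused by an ambiguous pronoun in the source.
POISONED = [("Acme", "founded_by", "Bob")] + CLEAN[1:]

QUERIES = [
    ("Acme", ["founded_by"]),
    ("Acme", ["founded_by", "born_in"]),
    ("Acme", ["founded_by", "born_in", "located_in"]),
    ("Alice", ["born_in"]),
]

clean_index = build_index(CLEAN)
poisoned_index = build_index(POISONED)
changed = [
    (start, relations)
    for start, relations in QUERIES
    if follow(clean_index, start, relations) != follow(poisoned_index, start, relations)
]
print(f"{len(changed)} of {len(QUERIES)} answers changed")
```

Only one edge differs between the two graphs, yet every query whose path crosses that edge returns a different answer, while the query that starts from "Alice" is unaffected.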
This connects to:
- How much poisoned training data survives safety alignment? — knowledge poisoning operates at a different level (corpus text rather than training data) but shares the principle that small contamination has outsized downstream effects
- Can knowledge graphs enable multi-hop reasoning in one retrieval step? — HippoRAG's KG construction faces the same extraction vulnerability; graph-based retrieval amplifies poisoning because errors propagate through relational traversal
- How vulnerable are reasoning models to irrelevant text? — CatAttack operates on model inputs while KPA operates on knowledge sources, but both demonstrate that minimal perturbations have disproportionate effects on reasoning systems
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities
- You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures
- Talk like a Graph: Encoding Graphs for Large Language Models
- Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation
- JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Original note title
Knowledge poisoning attacks collapse GraphRAG accuracy from 95 to 50 percent by modifying fewer than 0.05 percent of source text words