Can community detection enable RAG systems to answer global corpus questions?

Standard RAG struggles with corpus-wide questions that require understanding overall themes rather than retrieving specific passages. Can graph community detection overcome this limitation at scale?

Synthesis note · 2026-02-23 · sourced from Knowledge Graphs

Standard RAG fails on global questions directed at an entire text corpus (e.g., "What are the main themes in the dataset?") because these are query-focused summarization (QFS) tasks, not explicit retrieval tasks. Prior QFS methods, in turn, do not scale to the quantities of text indexed by typical RAG systems. GraphRAG addresses both limitations.

The two-stage approach:

Graph construction: An LLM extracts named entities and relationships from the source documents, building an entity knowledge graph with weighted edges (normalized counts of detected relationship instances). A secondary extraction captures claims linked to the detected entities (subject, object, type, description, source span, dates).
Community-based summarization: The Leiden algorithm partitions the graph into hierarchical communities of closely related entities. An LLM generates report-like summaries for each community at each hierarchy level. These summaries are pre-generated and independently useful for understanding the global structure of the dataset.
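
The two stages above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the GraphRAG implementation: the function names are hypothetical, dividing by the largest count is only one assumed way to normalize edge weights, and the grouping step uses thresholded connected components as a deliberately crude stand-in for Leiden (it produces a coarse and a fine partition, but it does not optimize modularity).

```python
from collections import Counter

def build_weighted_edges(relations):
    """Turn detected (entity, entity) relationship instances into weighted edges.

    Assumption: weights are counts divided by the largest count, so they
    fall in (0, 1]. The source only says the counts are normalized.
    """
    counts = Counter(tuple(sorted(pair)) for pair in relations)
    max_count = max(counts.values())
    return {edge: n / max_count for edge, n in counts.items()}

def connected_groups(edges, min_weight=0.0):
    """Stand-in for community detection (NOT Leiden).

    Keeps edges with weight >= min_weight and returns connected components.
    Raising min_weight yields a finer partition, which loosely mimics a
    lower level of a community hierarchy.
    """
    nodes = {n for edge in edges for n in edge}
    adjacency = {n: set() for n in nodes}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Hypothetical extraction output: one pair per detected relationship instance.
relations = [
    ("Alice", "Acme"), ("Acme", "Alice"), ("Alice", "Acme"),
    ("Acme", "Bob"), ("Bob", "Carol"),
    ("Carol", "Dana"), ("Carol", "Dana"),
]
weights = build_weighted_edges(relations)
coarse = connected_groups(weights)        # one group containing all five entities
fine = connected_groups(weights, 0.5)     # [['Acme', 'Alice'], ['Bob'], ['Carol', 'Dana']]
```

In a real pipeline, the grouping step would be replaced by an actual Leiden implementation, and each resulting group at each level would be passed to an LLM to produce its report.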

Given a question, each community summary is used to generate a partial response, and all partial responses are then summarized into a final global answer (a map-reduce pattern). This exploits a previously unexplored quality of graphs: their inherent modularity and the ability of community detection algorithms to partition them into coherent groups.
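
The map-reduce step can be sketched as follows. This is an illustration under stated assumptions: `llm` is a hypothetical callable that takes a prompt string and returns a completion, and both prompt templates are invented for the example rather than taken from GraphRAG.

```python
from typing import Callable, Sequence

# Both prompt templates are assumptions made for this sketch.
MAP_PROMPT = (
    "Using only the community report below, answer the question.\n"
    "Question: {question}\nReport: {summary}"
)
REDUCE_PROMPT = (
    "Combine the partial answers below into one answer to the question.\n"
    "Question: {question}\nPartial answers:\n{partials}"
)

def global_answer(
    question: str,
    community_summaries: Sequence[str],
    llm: Callable[[str], str],  # hypothetical LLM client: prompt in, text out
) -> str:
    # Map: every community summary produces its own partial response.
    partials = [
        llm(MAP_PROMPT.format(question=question, summary=summary))
        for summary in community_summaries
    ]
    # Reduce: the partial responses are summarized into one global answer.
    joined = "\n".join(f"- {partial}" for partial in partials)
    return llm(REDUCE_PROMPT.format(question=question, partials=joined))
```

Because the map calls are independent of one another, they can run in parallel; the number of LLM calls per question is the number of community summaries at the chosen level plus one.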

The community summaries serve dual purposes: (1) answering questions via map-reduce, and (2) enabling sensemaking in the absence of a specific question — users can scan community summaries at one hierarchy level for themes, then follow links to lower-level reports for subtopic details.
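
The browsing use case suggests a simple tree of reports. The sketch below is one possible shape for that structure, not GraphRAG's own data model: `CommunityReport`, its fields, and `reports_at_level` are hypothetical names, and it assumes level 0 is the coarsest level with deeper levels nested under it.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CommunityReport:
    # Hypothetical record for one pre-generated community summary.
    title: str
    summary: str
    level: int  # assumption: 0 is the coarsest level
    children: List["CommunityReport"] = field(default_factory=list)

def reports_at_level(roots: List[CommunityReport], level: int) -> List[CommunityReport]:
    """Collect every report at one hierarchy level, in tree order."""
    found = []
    for report in roots:
        if report.level == level:
            found.append(report)
        else:
            found.extend(reports_at_level(report.children, level))
    return found

# Scan themes at the top level, then follow links to subtopic reports.
commerce = CommunityReport(
    "Commerce", "Top-level theme.", 0,
    [
        CommunityReport("Pricing", "Subtopic report.", 1),
        CommunityReport("Shipping", "Subtopic report.", 1),
    ],
)
themes = [r.title for r in reports_at_level([commerce], 0)]  # ['Commerce']
subtopics = [r.title for r in commerce.children]             # ['Pricing', 'Shipping']
```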

This represents a fundamentally different use of graphs in RAG: not for structured retrieval and traversal (as in HippoRAG or LogicRAG), but for modular summarization that provides complete coverage of the underlying corpus.

This connects to:

Can knowledge graphs enable multi-hop reasoning in one retrieval step? — HippoRAG uses KG for traversal-based retrieval; GraphRAG uses KG for community-based summarization; complementary approaches to the same infrastructure
Can query-time graph construction replace pre-built knowledge graphs? — LogicRAG avoids pre-built graphs; GraphRAG embraces them for global coverage; the trade-off is query-adaptivity vs. corpus-completeness
What do enterprise RAG systems need beyond accuracy? — GraphRAG's community summaries directly address the scalability and customization requirements by enabling hierarchical exploration
Do hierarchical retrieval architectures outperform flat ones on complex queries? — GraphRAG's map-reduce over community summaries is a specific realization of separated planning (community selection) and synthesis (summary aggregation)

Original note title

GraphRAG uses community detection to enable global query-focused summarization that neither pure RAG nor pure summarization can achieve at scale
