TOPIC

Retrieval-Augmented Generation (RAG)

24 synthesis notes · 62 source papers
View as

When should retrieval happen during model generation?

Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.

Explore related Read →

Can question features alone predict when to retrieve?

Can lightweight external features of a question—rather than expensive model uncertainty checks—reliably decide whether retrieval is needed? This matters because uncertainty-based methods promise efficiency but add computation.

Explore related Read →

Can retrieval be extended into multi-step chains like reasoning?

Standard RAG retrieves once, but multi-hop tasks need intermediate steps. Can we train models to plan retrieval sequences the way chain-of-thought trains reasoning, and scale retrieval at test time?

Explore related Read →

Can you adapt retrieval models without accessing target data?

Explores whether dense retrieval systems can adapt to new domains using only a textual description, rather than actual target documents—especially relevant for privacy-restricted or competitive scenarios.

Explore related Read →

What do enterprise RAG systems need beyond accuracy?

Academic RAG benchmarks focus on question-answering accuracy, but enterprise deployments in regulated industries face five distinct requirements—compliance, security, scalability, integration, and domain expertise—that standard architectures don't address.

Explore related Read →

Can fine-tuning replace query augmentation for retrieval?

Query augmentation helps retrievers handle ambiguous queries but increases input cost. Does fine-tuning the retrieval model achieve comparable performance without this overhead?

Explore related Read →

Can long-context models resolve retriever-reader imbalance?

Traditional RAG systems force retrievers to find precise passages because readers had small context windows. Do modern long-context LLMs change what architecture makes sense?

Explore related Read →

Can query-time graph construction replace pre-built knowledge graphs?

Does building dependency graphs from individual queries at inference time offer a more flexible and cost-effective alternative to constructing knowledge graphs over entire document collections upfront?

Explore related Read →

Can retrieval learn what actually helps answer questions?

Standard RAG trains retrievers to find similar documents and generators to produce answers separately. But does surface similarity match what genuinely helps generate correct responses? This explores whether retrieval can receive feedback from answer quality.

Explore related Read →

Can knowledge graphs enable multi-hop reasoning in one retrieval step?

Standard RAG retrieves once but misses chains; iterative RAG follows chains but costs more. Can we encode multi-hop paths in a knowledge graph so one retrieval pass discovers them all?

Explore related Read →

Can long-context LLMs replace retrieval-augmented generation systems?

Explores whether loading entire corpora into LLM context windows can eliminate the need for separate retrieval systems, and what task types this approach handles well or poorly.

Explore related Read →

Can a model's partial response guide what to retrieve next?

Does using the model's in-progress output as a retrieval signal reveal information needs better than the original query alone? This explores whether generation itself can diagnose what documents are missing.

Explore related Read →

Does question type determine the right retrieval strategy?

Explores whether different non-factoid question types require distinct retrieval and decomposition approaches. Matters because standard RAG fails when applied uniformly to debate, comparison, and experience questions despite being effective for factoid queries.

Explore related Read →

Does supervising retrieval steps outperform final answer rewards?

Can intermediate feedback on retrieval decisions—which documents to fetch, when to stop—train agentic RAG systems more effectively than rewarding only the final answer? This matters because poor retrieval paths can accidentally succeed or good ones can fail on noisy metrics.

Explore related Read →

Why do queries and documents occupy different embedding spaces?

Queries and documents express the same information in fundamentally different ways—short and interrogative versus long and declarative. Understanding this mismatch is crucial for why direct embedding retrieval often fails.

Explore related Read →

Can rationale-driven selection beat similarity re-ranking for evidence?

Can LLMs generate search guidance that outperforms traditional similarity-based evidence ranking? This matters because current re-ranking lacks interpretability and fails against adversarial attacks.

Explore related Read →

Can recurrent memory scale where attention fails on ultra-long text?

GPT-4 and RAG plateau around 10,000 tokens and rely heavily on the first quarter of input. Can recurrent memory augmentation overcome these limits and enable reasoning across millions of tokens?

Explore related Read →

When should language models retrieve external knowledge versus use internal knowledge?

Can we model retrieval as a per-step decision problem rather than an always-on strategy? This matters because unnecessary retrieval adds noise and latency without improving accuracy.

Explore related Read →

Can reinforcement learning embed domain knowledge more effectively than supervised fine-tuning?

Explores whether rewarding coherent reasoning patterns during training helps models internalize domain knowledge better than standard fine-tuning approaches that treat all tokens equally.

Explore related Read →

Can document count be learned instead of fixed in RAG?

Standard RAG systems use a fixed number of documents regardless of query complexity. Can an RL agent learn to dynamically select both how many documents and their order based on what helps the generator produce correct answers?

Explore related Read →

Can retrieval systems ground answers in the right time?

Explores whether document retrieval for language models can distinguish between multiple versions of the same content from different time points, and whether adding temporal awareness to retrieval scoring helps answer time-sensitive questions accurately.

Explore related Read →

Why does retrieval-augmented generation fail in production?

RAG systems work in controlled demos but break in real-world deployment, especially for high-stakes domains like medicine and finance. Understanding the three structural failure modes reveals why.

Explore related Read →

Can simple uncertainty estimates beat complex adaptive retrieval?

Does measuring a language model's own confidence on token probabilities outperform expensive multi-call adaptive retrieval pipelines? This matters because it could simplify RAG systems while reducing computational overhead.

Explore related Read →

Do vector embeddings actually measure task relevance?

Vector embeddings rank semantic similarity, but RAG systems need topical relevance. When these diverge—as with king/queen versus king/ruler—does similarity-based retrieval fail in production?

Explore related Read →

Source papers 62

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.