SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling

Can document count be learned instead of fixed in RAG?

Standard RAG systems pass a fixed number of documents to the generator regardless of query complexity. Can an RL agent learn to choose both how many documents to pass and in what order, based on what helps the generator produce correct answers?

Synthesis note · 2026-02-22 · sourced from RAG

Standard RAG re-ranking pipelines pass a fixed number of documents, k, to the generator. The system designer sets k and holds it constant across queries. This fails in both directions: too few documents omit information that complex queries need; too many introduce noise that can mislead the generator and reduce efficiency.

Re-ranking approaches before DynamicRAG leave the choice of k unaddressed. They improved document ordering but assumed k was given: the number of documents passed to the generator is treated as a hyperparameter, not a learned decision.

DynamicRAG models the reranker as an RL agent whose action selects both how many of the retrieved documents to keep (the count) and in what order to present them (the permutation). The reward is the quality of the LLM's output: specifically, whether the generator produces a correct answer given the selected document set. The agent receives both explicit query signals and the generator's feedback.
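
The action and reward described above can be sketched roughly as follows. This is a minimal illustration, not DynamicRAG's implementation: the names `RerankAction`, `apply_action`, `reward`, and `toy_generator` are hypothetical, the generator is a stand-in function, and exact-match correctness is an assumed scoring rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RerankAction:
    """Hypothetical action: an ordered selection of document indices.

    The tuple's order is the permutation; its length is the learned k.
    """
    order: tuple[int, ...]


def apply_action(docs: Sequence[str], action: RerankAction) -> list[str]:
    """Return the selected documents in the order the action specifies."""
    if len(set(action.order)) != len(action.order):
        raise ValueError("each document may be selected at most once")
    return [docs[i] for i in action.order]


def reward(
    query: str,
    docs: Sequence[str],
    action: RerankAction,
    generator: Callable[[str, list[str]], str],
    gold_answer: str,
) -> float:
    """Assumed reward: 1.0 if the generator's answer matches the gold answer."""
    answer = generator(query, apply_action(docs, action))
    return 1.0 if answer.strip().lower() == gold_answer.strip().lower() else 0.0


def toy_generator(query: str, context: list[str]) -> str:
    """Stand-in for an LLM: answers only if the needed fact is in context."""
    return "Paris" if any("Paris" in d for d in context) else "unknown"


docs = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Madrid is the capital of Spain.",
]
query = "What is the capital of France?"

# Selecting one relevant document (k=1) versus selecting none (k=0).
r_relevant = reward(query, docs, RerankAction((1,)), toy_generator, "Paris")
r_empty = reward(query, docs, RerankAction(()), toy_generator, "Paris")
```

Representing the action as an ordered tuple of indices folds the count into the same decision as the ordering, so no separate k parameter is needed.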

Training proceeds in two phases. First, behavior cloning on expert trajectories (SFT) gives the reranker a baseline policy and reduces action space complexity. Second, RL with generator feedback allows the reranker to explore and learn to calibrate both ordering and count to query needs.
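
A toy sketch of the two phases, under heavy simplifying assumptions: the policy chooses only a count k from 1..K (ordering and query features are ignored), it is a bare softmax over logits rather than a language model, the expert always picks k=3, and `toy_reward` stands in for generator feedback. All names and hyperparameters here are hypothetical. Phase 1 follows the gradient of the expert action's log-probability; phase 2 is a plain REINFORCE update with a running-mean baseline.

```python
import math
import random

K = 5  # hypothetical: the policy may pass the top 1..K documents


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    # Gradient of log softmax(z)[action] w.r.t. z is onehot(action) - probs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def behavior_cloning(logits, expert_action, steps=20, lr=0.5):
    """Phase 1: ascend the log-likelihood of the expert's action."""
    logits = list(logits)
    for _ in range(steps):
        g = grad_log_prob(softmax(logits), expert_action)
        logits = [z + lr * gi for z, gi in zip(logits, g)]
    return logits


def toy_reward(action):
    """Assumed generator feedback: k=2 or k=3 yields a correct answer."""
    k = action + 1
    return 1.0 if 2 <= k <= 3 else 0.0


def reinforce(logits, reward_fn, steps=200, lr=0.1, seed=0):
    """Phase 2: sample a count, observe the reward, apply REINFORCE."""
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        r = reward_fn(action)
        advantage = r - baseline
        g = grad_log_prob(probs, action)
        logits = [z + lr * advantage * gi for z, gi in zip(logits, g)]
        baseline += (r - baseline) / t  # running mean of observed rewards
    return logits


def expected_reward(logits, reward_fn):
    return sum(p * reward_fn(a) for a, p in enumerate(softmax(logits)))


bc_logits = behavior_cloning([0.0] * K, expert_action=2)  # expert picks k=3
rl_logits = reinforce(bc_logits, toy_reward)
```

Starting RL from the cloned policy means early samples already concentrate on plausible counts, which is one way to read the note's point that the SFT phase reduces action space complexity.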

The insight generalizes beyond re-ranking: any RAG system parameter that is currently set by heuristic (chunk size, retrieval depth, context window allocation) is a candidate for learning via generator feedback. The generator's output quality is a reward signal that RL can propagate back to any pipeline component that affects what the generator receives, even where that component is not differentiable.

Original note title

rl-trained reranker that adjusts document order and count solves the fixed top-k problem in rag