SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation

Do retrieval models actually follow natural language instructions?

Most IR systems ignore instructions that define relevance, despite using LLM backbones. This raises questions about whether retrievers can adapt to nuanced user-specified information needs in practice.

Synthesis note · 2026-06-03 · sourced from Self Refinement Self Consistency Feedback

LLMs follow long, complex instructions, and IR models increasingly use LLM backbones, yet nearly all retrievers still take only a query, with no instruction defining what relevance means for the task at hand. FollowIR builds a benchmark from the TREC tradition, in which human annotators receive narratives (detailed instructions) for judging relevance. It alters those annotator instructions, re-annotates relevance against the altered versions, and then measures whether IR models adjust their relevance decisions accordingly. The finding: almost no retrieval models follow instructions; the exceptions are very large models (3B+ parameters) and instruction-tuned LLMs not typically used for retrieval. But the behavior is learnable: a training corpus teaches instruction-following, and FollowIR-7B improves on both standard retrieval metrics and instruction-following.
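To make the evaluation idea concrete, here is a minimal sketch of the "does the ranking move when the instruction changes" check. It is not FollowIR's actual metric, which this note does not describe; the function `instruction_sensitivity`, the document ids, and the toy rankings are all hypothetical. It assumes each model produces one ranked list under the original instruction and one under the altered instruction, and that re-annotation has identified which documents became non-relevant.

```python
from typing import List


def rank_of(ranking: List[str], doc_id: str) -> int:
    """Return the 1-based rank of doc_id in a ranked list of document ids."""
    return ranking.index(doc_id) + 1


def instruction_sensitivity(
    ranking_original: List[str],
    ranking_altered: List[str],
    newly_irrelevant: List[str],
) -> float:
    """Fraction of newly non-relevant documents that the model demotes.

    Illustrative score only (an assumption, not the benchmark's metric):
    1.0 means every document that the altered instruction made
    non-relevant was ranked lower than before; 0.0 means none were.
    """
    if not newly_irrelevant:
        raise ValueError("need at least one document whose label changed")
    demoted = sum(
        rank_of(ranking_altered, d) > rank_of(ranking_original, d)
        for d in newly_irrelevant
    )
    return demoted / len(newly_irrelevant)


# Hypothetical toy data: the altered instruction makes "d2" non-relevant.
original = ["d1", "d2", "d3", "d4"]
ignores = ["d1", "d2", "d3", "d4"]   # model that ignores the instruction
follows = ["d1", "d3", "d4", "d2"]   # model that demotes "d2"
changed = ["d2"]

print(instruction_sensitivity(original, ignores, changed))  # 0.0
print(instruction_sensitivity(original, follows, changed))  # 1.0
```

A query-only retriever behaves like the `ignores` case by construction, since the instruction never reaches the scoring function. Any sensitivity above zero requires the model to condition on the instruction text.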

The takeaway: retrieval is stuck in an ad-hoc keyword paradigm while the rest of NLP has moved to flexible instructions. Relevance is treated as a fixed property of the query rather than something an instruction can redefine on the fly. Closing that gap would let users specify complex information needs in natural language.

This connects the vault's retrieval thread to instruction-following. It complements "Can question features alone predict when to retrieve?", which covers when to retrieve, by addressing what counts as relevant. Both are limits of the query-only retrieval paradigm that the RAG-gap note diagnoses.

Original note title

retrieval models do not follow natural-language instructions defining relevance — only very large or instruction-tuned ones do