RARR: Researching and Revising What Language Models Say, Using Language Models

Paper · arXiv 2210.08726 · Published October 17, 2022

Language models (LMs) now excel at many tasks such as question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model, and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
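
The two steps described above (find attribution, then post-edit unsupported content) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the paper's implementation: the function `retrofit_attribution`, the `AttributionReport` class, and the four pluggable callables (`generate_queries`, `search`, `agrees`, `revise`) are hypothetical names introduced here, and the `toy_*` functions are hard-coded stand-ins for what would in practice be a large language model and a web search backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AttributionReport:
    """Evidence snippets collected while researching a passage (hypothetical type)."""
    evidence: List[str] = field(default_factory=list)


def retrofit_attribution(
    passage: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    agrees: Callable[[str, str], bool],
    revise: Callable[[str, str], str],
) -> Tuple[str, AttributionReport]:
    """Research a passage, then revise only the parts the evidence contradicts."""
    report = AttributionReport()
    revised = passage
    # Research step: ask questions about the passage and collect evidence.
    for query in generate_queries(passage):
        for snippet in search(query):
            report.evidence.append(snippet)
            # Revision step: edit only on disagreement, so supported
            # text is left exactly as the original model wrote it.
            if not agrees(revised, snippet):
                revised = revise(revised, snippet)
    return revised, report


# Toy stand-ins (assumptions for illustration only; a real system would
# call a language model and a search engine here).
def toy_queries(passage: str) -> List[str]:
    return ["Where is the Eiffel Tower?"]


def toy_search(query: str) -> List[str]:
    return ["The Eiffel Tower is in Paris."]


def toy_agrees(passage: str, snippet: str) -> bool:
    return "Paris" in passage


def toy_revise(passage: str, snippet: str) -> str:
    return passage.replace("Berlin", "Paris")


if __name__ == "__main__":
    revised, report = retrofit_attribution(
        "The Eiffel Tower is in Berlin.",
        toy_queries, toy_search, toy_agrees, toy_revise,
    )
    print(revised)
    print(report.evidence)
```

Passing the components in as callables mirrors the claim that the approach works on the output of any text generation model: the passage is treated as a plain string, and nothing in the loop depends on how it was produced.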

Introduction. Generative language models (LMs) and other text generation models are now the backbone of many AI systems. For example, large language models can perform multi-step reasoning (Nye et al., 2021; Wei et al., 2022), generate plans (Ahn et al., 2022), use tools and APIs (Shin et al., 2021; Thoppilan et al., 2022), and answer open-domain questions (Petroni et al., 2019; Roberts et al., 2020). Despite these incredible advances, state-of-the-art LMs still frequently produce biased, misleading, or unsupported content, colloquially called “hallucinations” (Maynez et al., 2020; Menick et al., 2022). To make LMs more trustworthy, we want to justify each generation by an attribution report (Rashkin et al., 2021; Bohnet et al., 2022) that contains supporting evidence from trusted sources (e.g., encyclopedia or articles) where appropriate. Most existing LMs, such as those based on sequence-to-sequence architectures, lack a built-in mechanism for attribution.

Discussion / Conclusion. Language models have developed increasingly good “procedural” knowledge of what should be discussed and how it should be presented, but they often struggle to memorize “factoid” knowledge and, as a result, produce unsubstantiated claims. We proposed RARR, a framework for revising such claims to make them attributable to the researched evidence. From experiments on text passages generated by different models on various domains, we showed that RARR can revise the passages to improve attribution while preserving other desirable properties such as writing style or structure. Furthermore, RARR sits on top of existing generation models without needing to re-design or re-train LMs. Major headroom still remains, as discussed in Section 6 and the Limitations section. We hope our analysis of RARR will help with developing new approaches for integrating attribution into LMs.
