Dialog Inpainting: Turning Documents into Dialogs

Many important questions (e.g. “How to eat healthier?”) require conversation to establish context and explore in depth. However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. Our approach takes the text of any document and transforms it into a twoperson dialog between the writer and an imagined reader: we treat sentences from the article as utterances spoken by the writer, and then use a dialog inpainter to predict what the imagined reader asked or said in between each of the writer’s utterances. By applying this approach to passages from Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets totalling 19 million diverse information-seeking dialogs—1,000x larger than the largest existing ConvQA dataset. Furthermore, human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets.
Introduction. Modern information-seeking tools such as web search and question answering (Karpukhin et al., 2020; Zhu et al., 2021) excel at questions that have well-defined answers (e.g., “Where was Barack Obama born?”). But many important questions are more open-ended—e.g., “How to eat healthier?”—and require conversation to elicit context and explore in depth: “How do I eat more protein?”, “What about vegetarians?”. Conversational question answering systems (ConvQA) (Stede & Schlangen, 2004; Radlinski & Craswell, 2017; Culpepper et al., 2018), would empower users to answer these questions as if they could discuss with an expert at any time. Despite this promising vision, progress has been stymied by scarce training data. While conversational data is abundant in online forums, much of it focuses on personal anecdotes and subjective opinions, and is thus unsuitable for an information-seeking system: we desire responses that minimize personal biases and cite reliable sources.
Discussion / Conclusion. In this paper, we have presented dialog inpainting, a novel approach to generating synthetic conversational data. We showed that it is possible to generate compelling information-seeking dialogs using only general-purpose data, suggesting applications to other conversational tasks. While synthetic data cannot entirely replace real data, it can help bootstrap interactive conversation systems and create a virtuous cycle wherein users find it natural to engage with and improve the system. We are particularly optimistic about applying the dialog inpainting data to (1) distillation, where the inpainted datasets serve as large-scale distillation sets, (2) end-to-end conversational question answering, and (3) zero-shot conversational QA, which is motivated by the zero-shot retrieval capabilities shown in this work. It is important to be aware of the biases that generating data can introduce or amplify. We want to encourage good inductive biases that make conversations conversational— e.g., use of anaphora or elision of context—and to introduce further control over the dialogs generated—e.g., persona or dialog acts.