Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
Effective communication requires the ability to identify ambiguities and request clarification of utterances. For machines to engage in a conversation, they need to learn to generate different forms of clarification questions. This paper studies the extent to which state-of-the-art neural generation models can generate effective clarification questions in conversational question answering. We introduce Abg-CoQA, a novel crowdsourced dataset for clarifying ambiguities in conversational question answering systems. Our dataset contains 8,615 questions with answers, of which 994 are ambiguous. The conversational questions are about 3,968 text passages from five diverse domains, pre-selected from the CoQA dataset. For ambiguous turns, we have collected the clarification questions and their answers. We evaluate strong language generation models and conversational question answering models on Abg-CoQA. The best-performing system achieves an F1 score of 23.6% on ambiguity detection; an accuracy of 56.0% on generating clarification questions in human evaluation; and an F1 score of 40.1% on question answering after clarification, which is 35.1 points behind human performance (75.2%), indicating there is ample room for improvement.
Introduction. People naturally resolve ambiguities in conversation by asking context-dependent clarification questions [Clark and Brennan, 1991]. Although there has been a surge in research on conversational question answering [Choi et al., 2018, Reddy et al., 2019], few studies have explored ambiguity resolution and clarification. The goal of question answering (QA) systems is to return the answer to a question by searching for a response in a collection of documents [Rajpurkar et al., 2016, 2018, Seo et al., 2017, Wang et al., 2017]. These systems often need additional information in order to answer a user’s query. As in human-human conversation, QA systems need to identify and clarify ambiguities in order to return the correct response. In this paper, we introduce Abg-CoQA, a dataset for clarifying ambiguity in conversational question answering. Specifically, we address the problems of detecting ambiguities and generating the right kinds of clarification questions to resolve them.
Discussion / Conclusion. Our empirical study and computational experiments show that identifying and resolving ambiguities in a conversation is a challenging task, even for people. We discuss the successes and limitations of state-of-the-art end-to-end neural models and present a dataset that can be used to further investigate solutions for managing ambiguities in conversational question answering. A direct application of large pre-trained models (e.g., BERT, BART) is not able to identify ambiguities or to generate clarification questions that resolve them. Future research building on Abg-CoQA may include modeling uncertainty as a signal of ambiguity and generating clarification questions in several steps: first identifying the ambiguity type, then generating a contextual phrase that targets the ambiguity, and finally formulating the complete question.
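The idea of treating uncertainty as a signal of ambiguity can be illustrated with a minimal sketch. It is not a method from this paper. It assumes a QA model that assigns a score to each candidate answer, and it flags a turn as ambiguous when the normalized entropy of those scores is high. The function names and the threshold of 0.8 are hypothetical choices made for illustration only.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a score distribution, scaled to the range [0, 1].

    The scores are renormalized to sum to 1 before the entropy is computed.
    """
    total = sum(probs)
    if total <= 0 or len(probs) < 2:
        return 0.0
    entropy = 0.0
    for p in probs:
        q = p / total
        if q > 0:
            entropy -= q * math.log(q)
    # Divide by the maximum possible entropy for this number of candidates.
    return entropy / math.log(len(probs))


def is_ambiguous(candidate_probs: Sequence[float], threshold: float = 0.8) -> bool:
    """Flag a question as ambiguous when the answer distribution is close to uniform.

    `threshold` is an assumed, untuned value; in practice it would be
    chosen on held-out data.
    """
    return normalized_entropy(candidate_probs) >= threshold
```

Under these assumptions, a question whose two candidate answers are equally likely would be flagged, while a question with one dominant answer would not. Whether such a signal helps on Abg-CoQA is an open question that would need to be tested empirically.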