Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

Paper · arXiv 2308.08169 · Published August 16, 2023
Retrieval-Augmented Generation (RAG) · Natural Language Inference

End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging the sophisticated natural language understanding and natural language generation capabilities of pre-trained models. This work gives TOD systems more flexibility through a simple cache. The cache makes it possible to update a TOD system dynamically and to handle both existing and unseen dialogue scenarios. To this end, we first fine-tune a retrieval module to retrieve the most relevant information entries from the cache. We then train end-to-end TOD models that can refer to and ground on both the dialogue history and the retrieved information during TOD generation. The cache is straightforward to construct, and the backbone models of the TOD systems are compatible with existing pre-trained generative models. Extensive experiments demonstrate the superior performance of our framework, with a notable improvement in non-empty joint goal accuracy of 6.7% over strong baselines.
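To make the cache-and-retrieve idea concrete, here is a minimal sketch of top-k retrieval over a small cache. It is an illustration under stated assumptions, not the paper's method: the entry format, the names `CACHE`, `tokenize`, `cosine`, and `retrieve`, and the bag-of-words cosine score are all hypothetical, and the score stands in for the fine-tuned retrieval module.

```python
import math
import re
from collections import Counter

# Hypothetical cache entries; the real entry format is not given in this excerpt.
CACHE = [
    "intent: ReserveRestaurant | slots: restaurant_name, city, time, party_size",
    "intent: SetAlarm | slots: alarm_time, alarm_name",
    "intent: FindMovies | slots: genre, city, show_date",
]


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, cache, k=2):
    """Return the k cache entries most similar to the query.

    Assumption: a simple lexical score replaces the learned retriever.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(entry))), entry) for entry in cache]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]
```

Because the cache is just a list in this sketch, covering a new scenario amounts to appending an entry, for example `CACHE + ["intent: RentCar | slots: pickup_city, pickup_date, car_type"]`, without retraining anything in the toy retriever.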

Introduction. Task-oriented dialogue (TOD) systems play an important role in various applications, such as restaurant booking, alarm setting, and recommendations (Gao et al., 2018; Xie et al., 2022). These systems can be broadly categorized into two groups: pipeline-based dialogue systems and end-to-end dialogue systems. Pipeline-based dialogue systems consist of four separate modules: a natural language understanding (NLU) module to detect user intents, a dialogue state tracking (DST) module to track user belief states across dialogue turns, a dialogue management (DM) module to decide system actions based on dialogue states, and a natural language generation (NLG) module to generate natural-language responses. However, the pipeline-based approach is annotation-intensive, prone to error propagation, and challenging to scale (Hosseini-Asl et al., 2020; Zhang et al., 2020; Feng et al., 2023).
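The four-module pipeline described above can be sketched as a chain of functions. This is a toy, rule-based illustration only: the keyword rules, the `ReserveRestaurant` intent, the `party_size` slot, and the response templates are all hypothetical and are not taken from the paper.

```python
def nlu(utterance):
    """Toy NLU: detect an intent and slot values with keyword rules."""
    text = utterance.lower()
    if "table" in text or "restaurant" in text:
        intent = "ReserveRestaurant"
    else:
        intent = "Unknown"
    slots = {}
    for word in text.replace(",", " ").split():
        if word.isdigit():
            slots["party_size"] = word
    return {"intent": intent, "slots": slots}


def dst(state, nlu_out):
    """Toy DST: merge the latest NLU output into the belief state."""
    new_state = dict(state)
    new_state["intent"] = nlu_out["intent"]
    new_state.update(nlu_out["slots"])
    return new_state


def dm(state):
    """Toy DM: choose a system action from the belief state."""
    if state.get("intent") != "ReserveRestaurant":
        return ("fallback", None)
    if "party_size" not in state:
        return ("request", "party_size")
    return ("confirm", None)


def nlg(action):
    """Toy NLG: turn a system action into a template response."""
    act, slot = action
    if act == "request":
        return f"Could you tell me the {slot.replace('_', ' ')}?"
    if act == "confirm":
        return "Your table is booked."
    return "Sorry, I did not understand that."


def respond(state, utterance):
    """Run one dialogue turn through NLU, DST, DM, and NLG in order."""
    state = dst(state, nlu(utterance))
    return state, nlg(dm(state))
```

Each stage consumes only the previous stage's output, so a mistake made early is never revisited. In this sketch, a follow-up turn such as "4 people" contains no intent keyword, `nlu` returns `Unknown`, and that error passes through `dst` and `dm` to a fallback response, which is the kind of error propagation the paragraph refers to.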

Discussion / Conclusion. This paper aims to enhance the performance of end-to-end TOD systems by incorporating a simple cache. We begin by constructing the cache, which contains intents and slots. Subsequently, we fine-tune a retrieval module to retrieve the most relevant information entries. Next, we train the end-to-end TOD model, enabling it to refer to and ground on both the dialogue history and the retrieved information during TOD generation. Experimental results on the large-scale Schema-Guided Dialogue (SGD) dataset demonstrate that our approach outperforms strong baselines.
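One plausible way to let a generative model ground on both sources is to concatenate the dialogue history and the retrieved entries into a single input sequence. The sketch below shows that idea only. The function name `build_model_input`, the bracketed markers such as `[context]` and `[retrieved]`, and the `max_turns` truncation are assumptions made for illustration; the paper's actual input format is not given in this excerpt.

```python
def build_model_input(history, retrieved, max_turns=4):
    """Concatenate recent dialogue turns and retrieved cache entries.

    history:   list of (speaker, text) tuples, oldest first.
    retrieved: list of cache-entry strings returned by a retriever.
    The bracketed markers are hypothetical separators, not the paper's tokens.
    """
    recent = history[-max_turns:]  # keep only the most recent turns
    context = " ".join(f"[{speaker}] {text}" for speaker, text in recent)
    knowledge = " ".join(f"[entry] {entry}" for entry in retrieved)
    return f"[context] {context} [retrieved] {knowledge}".strip()


history = [
    ("user", "I need a table for 4"),
    ("system", "Which city?"),
    ("user", "San Jose"),
]
entries = ["intent: ReserveRestaurant | slots: city, party_size"]
model_input = build_model_input(history, entries, max_turns=2)
```

The resulting string would then be passed to a pre-trained sequence-to-sequence model, which is trained to emit the belief state, system action, and response. That training step is outside the scope of this sketch.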