Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

Paper · Source
LLM Architecture

End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-toend approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state. In addition, HCNs can be optimized with supervised learning, reinforcement learning, or a mixture of both. HCNs attain stateof-the-art performance on the bAbI dialog dataset (Bordes and Weston, 2016), and outperform two commercially deployed customer-facing dialog systems.

Introduction. Task-oriented dialog systems help a user to accomplish some goal using natural language, such as making a restaurant reservation, getting technical support, or placing a phonecall. Historically, these dialog systems have been built as a pipeline, with modules for language understanding, state tracking, action selection, and language generation. However, dependencies between modules introduce considerable complexity – for example, it is often unclear how to define the dialog state and what history to maintain, yet action selection relies exclusively on the state for input. Moreover, training each module requires specialized labels. Recently, end-to-end approaches have trained recurrent neural networks (RNNs) directly on text transcripts of dialogs. A key benefit is that the RNN infers a latent representation of state, obviating the need for state labels. However, end-to-end methods lack a general mechanism for injecting domain knowledge and constraints.

Discussion / Conclusion. This paper has introduced Hybrid Code Networks for end-to-end learning of task-oriented dialog systems. HCNs support a separation of concerns where procedural knowledge and constraints can be expressed in software, and the control flow is learned. Compared to existing end-to-end approaches, HCNs afford more developer control and require less training data, at the expense of a small amount of developer effort. Results in this paper have explored three different dialog domains. On a public benchmark in the restaurants domain, HCNs exceeded performance of purely learned models. Results in two troubleshooting domains exceeded performance of a commercially deployed rule-based system. Finally, in a name-dialing domain, results from dialog simulation show that HCNs can also be optimized with a mixture of reinforcement and supervised learning. In future work, we plan to extend HCNs by incorporating lines of existing work, such as integrating the entity extraction step into the neural network (Dhingra et al., 2017), adding richer utterance embeddings (Socher et al., 2013), and supporting text generation (Sordoni et al., 2015).