Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Abstract This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main goal of the STAC project is to study the discourse structure of multi-party dialogues in order to understand the linguistic strategies adopted by interlocutors to achieve their conversational goals, especially when these goals are opposed. The STAC corpus is not only a rich source of data on strategic conversation but also, to our knowledge, the first corpus to provide full discourse structures for multi-party dialogues. It has further features that make it a useful resource for other topics: interleaved threads, creative language, and interactions between linguistic and extra-linguistic contexts.
Introduction. 1. The corpus Our corpus comes from an online version of the game The Settlers of Catan, a win-lose, multi-player game in which players acquire and trade resources (ore, wood, wheat, clay, or sheep) in order to build roads, settlements, and cities and in turn score victory points. Resources are sometimes allocated automatically after a turn, but generally a player will have access to only a strict subset of the resources she needs, prompting her to trade with other players to achieve her goals. We collected our corpus by modifying an online, open-source version of Catan with a chat interface in which players could carry out trade negotiations. Figure 1 is a snapshot of the board game, showing the chat window (“Chat”), chat history (“History”), and game history (“Game”), where the game history details all of the extra-linguistic events (e.g., dice rolls, card plays) from the game. This snapshot shows the perspective of the game administrator; normally, the type of resources that a player has is revealed only to that player.
Discussion / Conclusion. Our future work will proceed along two lines. First, on the corpus side, we are continuing to extend the data set and to revise the annotations. As noted in §1, our data contains rich information about the evolution of the game state itself, enough that we can replay an entire game from our data files. This gives us an opportunity to study how the extra-linguistic context (the events that change the game state) interacts with the linguistic context. Preliminary studies (Hunter et al., 2015) show that the nonlinguistic context can affect discourse structure in several ways, but future work will involve a large annotation effort to determine the discourse relations involved in these dependencies and their effects on discourse structure. Second, on the experimentation side, we plan to attack the problem of hierarchical structure and attempt to compute CDUs automatically. We believe that our corpus, under the CDU Distribute transformation, provides us with the right kind of data to predict CDUs.