Identification of Propositional and Illocutionary Relations
In this paper we tackle the DialAM-2024 shared task, which aims to annotate dialogue based on Inference Anchoring Theory (IAT). The task can be split into two parts: identification of propositional relations and identification of illocutionary relations. We propose a pipelined system made up of three parts: (1) locution-proposition relation detection, (2) propositional relation detection, and (3) illocutionary relation identification. We fine-tune a model independently for each step and combine the models at the end to form the final system. Our proposed system ranks second overall among the shared task participants, with an average F1-score of 63.7 across the two sub-parts.
Introduction. This paper describes our system for the DialAM-2024 task. The task involves creating dialogue annotations from dialogue text, specifically annotations in the form of a graph under the Inference Anchoring Theory (IAT) framework. The IAT framework (Ruiz-Dolz et al., 2024) allows dialogue argumentation to be annotated in a way that retains relevant information and structural data irrespective of domain. For this task, we are provided with a dataset of .json files, each of which represents one graph under the IAT framework. The data comes from the QT30 corpus (Hautli-Janisz et al., 2022), whose dialogue is taken from 30 episodes of the show Question Time. Our system is a pipeline that splits the task into three steps. In the first step, we use BERTScore to produce similarity scores, which we use to find connections. Then, for each step, we fine-tune a BERT model to perform multiclass classification, using information gained from the previous steps as input. We fine-tune each model separately and combine the models into a pipeline at the end, which produces the finished graph.
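The similarity-based first step can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: a simple token-overlap F1 stands in for BERTScore so that the example runs without downloading a model, and the function names, the example texts, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1, a lightweight stand-in for BERTScore F1 (assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def link_locutions(locutions, propositions, similarity=token_f1, threshold=0.5):
    """Pair each locution with its most similar proposition, if similar enough.

    Returns a list of (locution_index, proposition_index) pairs.
    The threshold is a hypothetical value chosen for illustration.
    """
    links = []
    if not propositions:
        return links
    for i, locution in enumerate(locutions):
        scores = [similarity(locution, prop) for prop in propositions]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            links.append((i, best))
    return links


# Hypothetical example texts, not taken from QT30.
locutions = [
    "Speaker A : taxes should be lower",
    "Speaker B : the weather is nice",
]
propositions = [
    "the health service needs more funding",
    "taxes should be lower",
]
links = link_locutions(locutions, propositions)
```

Because the scoring function is passed in as the `similarity` argument, a BERTScore-based function could replace `token_f1` without changing the matching logic.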
Discussion / Conclusion. Table 1 shows that our system performs best on Subtask B (ILO), the identification of illocutionary relations. Scores are good in both the Focused and General cases, indicating a good balance of predictions across classes. The recurring technique we used, fine-tuning a BERT model, proved quite effective. A strong point of our system is that it achieves similar scores in the General and Focused cases. This is likely due to our upsampling, which helped with the heavily imbalanced dataset. The main downside of our system appears to be cascading errors. This is reflected in the scores: we perform part of Subtask B before moving on to Subtask A, and our ILO scores are much higher than our ARI scores. Moving forward, we will need some way to mitigate the impact of these errors.
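As an illustration of the upsampling idea, the sketch below balances a training set by random oversampling with replacement. The exact upsampling scheme is not specified here, so this particular scheme is an assumption, as are the function name `upsample`, the label names, and the class counts.

```python
import random
from collections import defaultdict


def upsample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class is as
    large as the largest one. Returns new (examples, labels) lists.

    Assumption: random oversampling with replacement; other schemes exist.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append(example)
    if not by_label:
        return [], []
    target = max(len(items) for items in by_label.values())

    out_examples, out_labels = [], []
    for label, items in by_label.items():
        # Draw the missing examples at random from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for example in items + extra:
            out_examples.append(example)
            out_labels.append(label)

    # Shuffle so duplicated examples are not grouped by class.
    order = list(range(len(out_examples)))
    rng.shuffle(order)
    return [out_examples[i] for i in order], [out_labels[i] for i in order]


# Hypothetical, heavily imbalanced label distribution (illustrative names).
examples = [f"pair_{i}" for i in range(10)]
labels = ["None"] * 7 + ["Inference"] * 2 + ["Conflict"] * 1
balanced_examples, balanced_labels = upsample(examples, labels)
```

Oversampling of this kind would be applied to the training split only, so that evaluation data keeps its original class distribution.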