Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
Though preceding work in computational argument quality (AQ) mostly focuses on assessing overall AQ, researchers agree that writers would benefit from feedback targeting individual dimensions of argumentation theory. However, a large-scale theory-based corpus and corresponding computational models are missing. We fill this gap by conducting an extensive analysis covering three diverse domains of online argumentative writing and presenting GAQCorpus: the first large-scale English multi-domain (community Q&A forums, debate forums, review forums) corpus annotated with theory-based AQ scores. We then propose the first computational approaches to theory-based assessment, which can serve as strong baselines for future work. We demonstrate the feasibility of large-scale AQ annotation, show that exploiting relations between dimensions yields performance improvements, and explore the synergies between theory-based prediction and practical AQ assessment.
Introduction. Providing relevant and sufficient justifications for a claim and using clear language to express reasoning are important features of everyday writing. These are components of Argument Quality (AQ), which has been studied in many domains, such as student essays (Wachsmuth et al., 2016), news editorials (El Baff et al., 2018), and debate forums (Lukin et al., 2017). Preceding work in natural language processing (NLP) and computational linguistics (CL) has mostly focused on practical AQ assessment, considering either the overall quality of arguments (Toledo et al., 2019; Gretz et al., 2020, inter alia) or a single specific conceptualization of AQ, e.g., argument strength (Persing and Ng, 2015), convincingness (Habernal and Gurevych, 2016), and relevance (Wachsmuth et al., 2017c). However, Gretz et al. (2020) note the need to predict quality in terms of fine-grained aspects. Fine-grained prediction enables a deeper understanding of argumentation and offers specific feedback to authors aiming to improve their argumentative writing skills.
Discussion / Conclusion. BERT MTflat. This is especially the case when correlating the predictions with our annotations for the Effectiveness dimension. To sum up, our experiments (E1)–(E3) yield the following findings: (1) Large-scale predictions, obtained from a theory-based AQ model on a large (practical) AQ data set, correlate mostly with the Effectiveness dimension. (2) The transferred knowledge obtained in the STILT-setup on IBM-Rank-30k in BERT IBM MTflat improves the performance on GAQCorpus for Effectiveness the most. These two findings match the hypothesis of Gretz et al. (2020) that their annotations mostly captured Effectiveness. We empirically substantiate the idea (without any manual effort) that, on the one hand, a theory-based approach can inform practical AQ research and increase the interpretability of practically driven research outcomes and, on the other hand, the practical approach can increase the efficacy of theory-based AQ models when targeting a certain domain and dimension. Specific assessment of the rhetorical, logical, and dialectical perspectives on argumentative texts can inform researchers, e.g., about phenomena captured with annotations, and help people improve their writing skills.
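The correlation analysis behind finding (1) can be illustrated with a minimal sketch. It assumes that a theory-based model has already produced one score per argument and dimension, and that a practical AQ score is available for the same arguments. The function names, the dimension labels other than Effectiveness, and all numbers are hypothetical toy values chosen for illustration; they are not results or code from our experiments, and the choice of Pearson correlation is likewise an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def correlate_dimensions(predictions, practical_scores):
    """Correlate each predicted theory-based dimension with practical AQ scores."""
    return {
        dimension: pearson(scores, practical_scores)
        for dimension, scores in predictions.items()
    }


# Hypothetical toy data: five arguments, three predicted dimensions
# (dimension labels are assumed for illustration).
toy_predictions = {
    "cogency": [3.0, 2.0, 4.0, 1.0, 3.0],
    "effectiveness": [1.0, 2.0, 3.0, 4.0, 5.0],
    "reasonableness": [2.0, 2.0, 3.0, 3.0, 1.0],
}
# Hypothetical practical AQ scores for the same five arguments.
toy_practical = [0.1, 0.2, 0.3, 0.4, 0.5]

correlations = correlate_dimensions(toy_predictions, toy_practical)
# The dimension whose predictions track the practical scores most closely.
best_dimension = max(correlations, key=correlations.get)
```

In this toy setup, `best_dimension` names the dimension whose predictions align most strongly with the practical scores, which mirrors the kind of comparison summarized in finding (1).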