Forecasting the presence and intensity of hostility on Instagram using linguistic and social features
Online antisocial behavior, such as cyberbullying, harassment, and trolling, is a widespread problem that threatens free discussion and has negative physical and mental health consequences for victims and communities. While prior work has proposed automated methods to identify hostile comments in online discussions, these methods work retrospectively on comments that have already been posted, making it difficult to intervene before an interaction escalates. In this paper we instead consider the problem of forecasting future hostilities in online discussions, which we decompose into two tasks: (1) given an initial sequence of non-hostile comments in a discussion, predict whether some future comment will contain hostility; and (2) given the first hostile comment in a discussion, predict whether this will lead to an escalation of hostility in subsequent comments. Thus, we aim to forecast both the presence and intensity of hostile comments based on linguistic and social features from earlier comments. To evaluate our approach, we introduce a corpus of over 30K annotated Instagram comments from over 1,100 posts.
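To make the two forecasting tasks concrete, the following is a minimal sketch of how per-post labels might be derived from a sequence of per-comment hostility annotations. The function names, the `prefix_len` parameter, and the boolean-list representation are illustrative assumptions, not the paper's actual pipeline; the intensity cutoffs (more than 10 hostile comments versus exactly one) follow the comparison reported in the conclusion.

```python
from typing import List, Optional


def presence_label(hostile_flags: List[bool], prefix_len: int) -> Optional[bool]:
    """Task 1 (presence), hypothetical helper.

    Given the first `prefix_len` comments of a discussion, all of which must be
    non-hostile, return True if any later comment is hostile and False otherwise.
    Returns None when the prefix is unusable (too long, or already hostile).
    """
    if prefix_len > len(hostile_flags) or any(hostile_flags[:prefix_len]):
        return None
    return any(hostile_flags[prefix_len:])


def intensity_label(hostile_flags: List[bool], high_threshold: int = 10) -> Optional[bool]:
    """Task 2 (intensity), hypothetical helper.

    Return True if the post receives more than `high_threshold` hostile comments,
    False if it receives exactly one, and None for posts that fall in between
    (or have no hostile comments), which are left out of this comparison.
    """
    total = sum(hostile_flags)
    if total > high_threshold:
        return True
    if total == 1:
        return False
    return None
```

Returning `None` for posts that do not fit either class keeps the filtering step explicit, so a caller can decide how to handle the excluded posts.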
Introduction. Harassment, and the aggressive and antisocial content it entails, is a persistent problem for social media: 40% of users have experienced it (Duggan et al. 2014). Existing approaches for addressing toxic content, including automated mechanisms, crowdsourced moderation, and user controls, have so far been ineffective at reducing hostility. Addressing harassment and other hostile online behaviors is challenging in part because people (Guberman and Hemphill 2017), policies (Pater et al. 2016), and laws (Nocentini et al. 2010) disagree about what constitutes unacceptable behavior, and in part because such behavior takes many forms, occurs in many contexts, and is started by many different kinds of users (Phillips and Milner 2017; Phillips 2015; Cheng et al. 2017).
Discussion / Conclusion. We proposed methods to forecast both the presence and intensity of hostility in Instagram comments. Using a combination of linguistic and social features, the best model produces an AUC of 0.82 for forecasting the presence of hostility ten or more hours in the future, and an AUC of 0.91 for forecasting whether a post will receive more than 10 hostile comments or only one hostile comment. We find several predictors of future hostility, including (1) prior hostile comments received by the post’s author; (2) the use of user-directed profanity; (3) the number of distinct users participating in a conversation; and (4) trends in hostility thus far in the conversation. By distinguishing between posts that will receive many hostile comments and those that will receive few or none, the methods proposed here provide new ways to prioritize specific posts for intervention. Since moderation resources are limited, it makes sense to assign them to posts where there is still time to de-escalate a conversation or prevent additional commenting. For instance, Instagram and similar platforms could use our approach to manage their moderation queues.
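As an illustration of the evaluation metric and the prioritization idea, the sketch below computes AUC from its pairwise definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half) and orders posts by predicted hostility score. The function names, post identifiers, and scores are hypothetical, and this is not the paper's evaluation code.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def moderation_queue(predicted: Dict[str, float]) -> List[str]:
    """Order post identifiers from highest to lowest predicted hostility score."""
    return sorted(predicted, key=predicted.get, reverse=True)


# Illustrative values only (not data from the paper).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
example_queue = moderation_queue({"post_a": 0.2, "post_b": 0.9, "post_c": 0.5})
```

The quadratic pairwise loop is chosen for readability; for large evaluation sets a rank-based computation gives the same value more efficiently.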