Related papers: Context Matters: Incorporating Target Awareness in Conversational Abusive Language Detection

URL: http://arxiv.org/abs/2508.12828v1
Date: Mon, 18 Aug 2025 11:12:21 GMT
Title: Context Matters: Incorporating Target Awareness in Conversational Abusive Language Detection
Authors: Raneem Alharthi, Rajwa Alharthi, Aiqi Jiang, Arkaitz Zubiaga
Abstract summary: Abusive language detection has become an increasingly important task as a means to tackle this type of harmful content in social media. In this study, we look at conversational exchanges, where a user replies to an earlier post by another user (the parent tweet). We ask: does leveraging context from the parent tweet help determine if a reply post is abusive or not, and what are the features that contribute the most?
Score: 7.323895449517353
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Abusive language detection has become an increasingly important task as a means to tackle this type of harmful content in social media. There has been a substantial body of research developing models for determining if a social media post is abusive or not; however, this research has primarily focused on exploiting social media posts individually, overlooking additional context that can be derived from surrounding posts. In this study, we look at conversational exchanges, where a user replies to an earlier post by another user (the parent tweet). We ask: does leveraging context from the parent tweet help determine if a reply post is abusive or not, and what are the features that contribute the most? We study a range of content-based and account-based features derived from the context, and compare this to the more widely studied approach of only looking at the features from the reply tweet. For a more generalizable study, we test four different classification models on a dataset made of conversational exchanges (parent-reply tweet pairs) with replies labeled as abusive or not. Our experiments show that incorporating contextual features leads to substantial improvements compared to the use of features derived from the reply tweet only, confirming the importance of leveraging context. We observe that, among the features under study, it is especially the content-based features (what is being posted) that contribute to the classification performance rather than account-based features (who is posting it). While using content-based features, it is best to combine a range of different features to ensure improved performance over being more selective and using fewer features. Our study provides insights into the development of contextualized abusive language detection models in realistic settings involving conversations.
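The comparison described in the abstract (features from the reply tweet only versus features that also draw on the parent tweet) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature names, the `content_features`, `account_features`, and `build_features` helpers, and the example tweets are hypothetical and are not taken from the paper, which does not list its exact features in the abstract.

```python
from typing import Dict, Optional

Tweet = Dict[str, object]


def content_features(text: str, prefix: str) -> Dict[str, float]:
    """Toy content-based features (what is being posted). Hypothetical."""
    tokens = text.split()
    return {
        f"{prefix}_num_tokens": float(len(tokens)),
        f"{prefix}_num_mentions": float(sum(t.startswith("@") for t in tokens)),
        f"{prefix}_upper_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
    }


def account_features(account: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Toy account-based features (who is posting it). Hypothetical."""
    return {
        f"{prefix}_followers": float(account.get("followers", 0)),
        f"{prefix}_verified": float(account.get("verified", 0)),
    }


def build_features(reply: Tweet, parent: Optional[Tweet] = None) -> Dict[str, float]:
    """Reply-only features, extended with parent-tweet context when given."""
    feats: Dict[str, float] = {}
    feats.update(content_features(str(reply["text"]), "reply"))
    feats.update(account_features(dict(reply["account"]), "reply"))
    if parent is not None:
        # Contextual features: same extractors, applied to the parent tweet.
        feats.update(content_features(str(parent["text"]), "parent"))
        feats.update(account_features(dict(parent["account"]), "parent"))
    return feats


# Made-up parent-reply pair, for illustration only.
parent = {"text": "Nice weather today", "account": {"followers": 500, "verified": 1}}
reply = {"text": "@bob YOU are wrong", "account": {"followers": 10, "verified": 0}}

reply_only = build_features(reply)            # baseline: reply tweet only
with_context = build_features(reply, parent)  # reply plus parent context
```

Either feature dictionary could then be vectorized and passed to any classifier, so the two settings can be compared on the same labeled replies.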

Related papers

Will I Get Hate Speech? Predicting the Volume of Abusive Replies before Posting in Social Media [0.0]
We look at four types of features, namely text, text metadata, tweet metadata, and account features. This helps us understand the extent to which the user or the content helps predict the number of abusive replies. One of our objectives is to determine the extent to which the volume of abusive replies that a tweet will get is motivated by the content of the tweet or by the identity of the user posting it.
arXiv Detail & Related papers (2025-03-04T21:04:21Z)
VyAnG-Net: A Novel Multi-Modal Sarcasm Recognition Model by Uncovering Visual, Acoustic and Glossary Features [13.922091192207718]
Sarcasm recognition aims to identify hidden sarcastic, criticizing, and metaphorical information embedded in everyday dialogue. We propose a novel approach that combines a lightweight depth attention module with a self-regulated ConvNet to concentrate on the most crucial features of visual data. We have also conducted a cross-dataset analysis to test the adaptability of VyAnG-Net with unseen samples of another dataset MUStARD++.
arXiv Detail & Related papers (2024-08-05T15:36:52Z)
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT) Model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework. We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z)
Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations [2.0653090022137697]
We explore the use of pre-trained speech representations as a form of transfer learning towards the AlloSat corpus. Our experiments confirm the large gain in performance obtained with the use of pre-trained features. Surprisingly, we found that the linguistic content is clearly the major contributor to the prediction of satisfaction.
arXiv Detail & Related papers (2023-10-06T10:22:51Z)
Enriching Abusive Language Detection with Community Context [0.3708656266586145]
Use of pejorative expressions can be benign or actively empowering. Models for abuse detection misclassify these expressions as derogatory and inadvertently censor productive conversations held by marginalized groups. Our paper highlights how community context can improve classification outcomes in abusive language detection.
arXiv Detail & Related papers (2022-06-16T20:54:02Z)
Rumor Detection with Self-supervised Learning on Texts and Social Graph [101.94546286960642]
We propose contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. We term this framework Self-supervised Rumor Detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
arXiv Detail & Related papers (2022-04-19T12:10:03Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks. This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching. Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information. We evaluate our model on two large datasets with user identification, i.e., personalized dialogue Corpus Ubuntu (P-Ubuntu) and personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion. The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element. Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z)
"To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers [18.053219155702465]
We present an ensemble learning method to identify and analyze abusive and hateful content on social media platforms. Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language.
arXiv Detail & Related papers (2020-06-05T06:59:22Z)
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias [113.44471186752018]
Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations.
arXiv Detail & Related papers (2020-01-09T18:31:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.