Assessing the impact of contextual information in hate speech detection
- URL: http://arxiv.org/abs/2210.00465v2
- Date: Wed, 5 Oct 2022 13:39:18 GMT
- Title: Assessing the impact of contextual information in hate speech detection
- Authors: Juan Manuel Pérez, Franco Luque, Demian Zayat, Martín Kondratzky,
Agustín Moro, Pablo Serrati, Joaquín Zajac, Paula Miguel, Natalia
Debandi, Agustín Gravano, Viviana Cotik
- Abstract summary: We provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter.
This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, hate speech has gained great relevance in social networks
and other virtual media because of its intensity and its relationship with
violent acts against members of protected groups. Due to the large amount of
content generated by users, substantial effort has gone into the research and
development of automatic tools to aid the analysis and moderation of this
speech, at least in its most threatening forms. One limitation of current
approaches to automatic hate speech detection is the lack of context: most
studies and resources work with isolated messages, without any conversational
context or indication of the topic being discussed. This restricts the
information available to decide whether a post on a social network is hateful
or not. In this work, we provide a novel corpus for
contextualized hate speech detection based on user responses to news posts from
media outlets on Twitter. This corpus was collected in the Rioplatense
dialectal variety of Spanish and focuses on hate speech associated with the
COVID-19 pandemic. Classification experiments using state-of-the-art techniques
show evidence that adding contextual information improves hate speech detection
performance for two proposed tasks (binary and multi-label prediction). We make
our code, models, and corpus available for further research.
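
As a rough illustration of how such contextual information can be supplied to a transformer classifier, the sketch below pairs the news post (context) with the user reply so the encoder sees both segments. The model name, label order, and example texts are assumptions for illustration only, not the authors' released code, and the classification head would still need fine-tuning on the corpus before use.

    # Minimal sketch: encode the (news post, reply) pair so the encoder can
    # attend across both segments. Model name and label order are illustrative
    # assumptions; the classification head must be fine-tuned on the corpus.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "dccuchile/bert-base-spanish-wwm-cased"  # placeholder Spanish encoder

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    def classify_reply(news_post: str, reply: str) -> int:
        # The tokenizer inserts a separator token between the two segments.
        inputs = tokenizer(news_post, reply, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        return int(logits.argmax(dim=-1))  # assumed: 0 = not hateful, 1 = hateful

    # Usage: a context-free baseline would pass only the reply instead of the pair.
    label = classify_reply("Titular sobre la pandemia de COVID-19 ...",
                           "Respuesta de un usuario ...")
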
Related papers
- Hierarchical Sentiment Analysis Framework for Hate Speech Detection: Implementing Binary and Multiclass Classification Strategy [0.0]
We propose a new multitask model integrated with shared emotional representations to detect hate speech across the English language.
We conclude that utilizing sentiment analysis and a Transformer-based trained model considerably improves hate speech detection across multiple datasets.
arXiv Detail & Related papers (2024-11-03T04:11:33Z) - Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
There arises a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z) - CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a
Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Hate Speech and Counter Speech Detection: Conversational Context Does
Matter [7.333666276087548]
This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech.
We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral.
arXiv Detail & Related papers (2022-06-13T19:05:44Z) - Anti-Asian Hate Speech Detection via Data Augmented Semantic Relation
Inference [4.885207279350052]
We propose a novel approach to leverage sentiment hashtags to enhance hate speech detection in a natural language inference framework.
We design a novel framework SRIC that simultaneously performs two tasks: (1) semantic relation inference between online posts and sentiment hashtags, and (2) sentiment classification on these posts.
arXiv Detail & Related papers (2022-04-14T15:03:35Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Textless Speech Emotion Conversion using Decomposed and Discrete
Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z) - What's in the Box? An Analysis of Undesirable Content in the Common
Crawl Corpus [77.34726150561087]
We analyze the Common Crawl, a colossal web corpus extensively used for training language models.
We find that it contains a significant amount of undesirable content, including hate speech and sexually explicit content, even after filtering procedures.
arXiv Detail & Related papers (2021-05-06T14:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.