Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection
- URL: http://arxiv.org/abs/2506.16476v1
- Date: Thu, 19 Jun 2025 17:23:08 GMT
- Title: Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection
- Authors: Saad Almohaimeed, Saleh Almohaimeed, Damla Turgut, Ladislau Bölöni
- Abstract summary: Implicit hate speech has emerged as a critical challenge for social media platforms. We propose an approach to address the detection of implicit hate speech and enhance generalizability across diverse datasets.
- Score: 7.762212551172391
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit hate speech has recently emerged as a critical challenge for social media platforms. While much of the research has traditionally focused on harmful speech in general, the need for generalizable techniques to detect veiled and subtle forms of hate has become increasingly pressing. Based on lexicon analysis, we hypothesize that implicit hate speech is already present in publicly available harmful speech datasets but may not have been explicitly recognized or labeled by annotators. Additionally, crowdsourced datasets are prone to mislabeling due to the complexity of the task and are often influenced by annotators' subjective interpretations. In this paper, we propose an approach to address the detection of implicit hate speech and enhance generalizability across diverse datasets by leveraging existing harmful speech datasets. Our method comprises three key components: influential sample identification, reannotation, and augmentation using Llama-3 70B and GPT-4o. Experimental results demonstrate the effectiveness of our approach in improving implicit hate detection, achieving a +12.9-point F1 score improvement compared to the baseline.
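The three-stage pipeline named in the abstract (influential sample identification, reannotation, augmentation) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the threshold-based influence heuristic, and the mock labeling/paraphrasing callables are all assumptions; in the paper, reannotation and augmentation are performed with Llama-3 70B and GPT-4o.

```python
# Hypothetical sketch of the paper's three-stage pipeline:
# (1) influential sample identification, (2) reannotation, (3) augmentation.
# The influence scores and LLM callables here are placeholders.

def identify_influential(samples, scores, threshold=0.8):
    """Flag samples whose (assumed precomputed) influence score exceeds a threshold."""
    return [s for s, score in zip(samples, scores) if score >= threshold]

def reannotate(sample, llm_label_fn):
    """Replace a possibly noisy crowdsourced label with an LLM-produced one."""
    return {**sample, "label": llm_label_fn(sample["text"])}

def augment(sample, llm_paraphrase_fn):
    """Generate an additional implicit-hate variant of an influential sample."""
    return {**sample, "text": llm_paraphrase_fn(sample["text"])}

def build_training_set(samples, scores, llm_label_fn, llm_paraphrase_fn):
    influential = identify_influential(samples, scores)
    relabeled = [reannotate(s, llm_label_fn) for s in influential]
    augmented = [augment(s, llm_paraphrase_fn) for s in relabeled]
    # Keep untouched samples, plus the cleaned and augmented influential ones.
    rest = [s for s in samples if s not in influential]
    return rest + relabeled + augmented

# Mock LLM calls standing in for Llama-3 70B / GPT-4o.
data = [{"text": "a", "label": 0}, {"text": "b", "label": 0}]
out = build_training_set(
    data, scores=[0.9, 0.1],
    llm_label_fn=lambda t: 1,
    llm_paraphrase_fn=lambda t: t + " (paraphrase)",
)
print(len(out))  # 1 untouched + 1 relabeled + 1 augmented = 3
```

The design point is that only influential samples are re-examined, keeping the LLM reannotation cost bounded rather than relabeling the whole dataset.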
Related papers
- Compositional Generalisation for Explainable Hate Speech Detection [52.41588643566991]
Hate speech detection is key to online content moderation, but current models struggle to generalise beyond their training data. We show that even when models are trained with more fine-grained, span-level annotations, they struggle to disentangle the meaning of these labels from the surrounding context. We investigate whether training on a dataset where expressions occur with equal frequency across all contexts can improve generalisation.
arXiv Detail & Related papers (2025-06-04T13:07:36Z) - Dealing with Annotator Disagreement in Hate Speech Classification [0.0]
This paper examines strategies for addressing annotator disagreement, an issue that has been largely overlooked. We evaluate different approaches to dealing with annotator disagreement in hate speech classification of Turkish tweets, based on a fine-tuned BERT model. Our work highlights the importance of the problem and provides state-of-the-art benchmark results for the detection and understanding of hate speech in online discourse.
arXiv Detail & Related papers (2025-02-12T10:19:50Z) - Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits [82.8859060022651]
We introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. Subjective evaluations confirm that speech edited using this novel technique is more challenging to detect than conventional cut-and-paste methods. Despite the difficulty for humans, experimental results demonstrate that self-supervised detectors can achieve remarkable performance in detection, localization, and generalization.
arXiv Detail & Related papers (2025-01-07T14:17:47Z) - On the Challenges of Building Datasets for Hate Speech Detection [0.0]
We first analyze the issues surrounding hate speech detection through a data-centric lens.
We then outline a holistic framework to encapsulate the data creation pipeline across seven broad dimensions.
arXiv Detail & Related papers (2023-09-06T11:15:47Z) - CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z) - ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection [85.68684067031909]
We frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts.
In addition, we see that infusing knowledge from reasoning datasets (e.g. Atomic 2020) improves the performance even further.
arXiv Detail & Related papers (2022-05-25T05:10:08Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
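The label-imbalance issue described above (far more non-hate than hate examples) is commonly countered with inverse-frequency class weights in the loss. The sketch below is illustrative and is not the paper's method; the weighting scheme shown is one standard choice among several.

```python
# Illustrative sketch: inverse-frequency class weights for an imbalanced
# hate speech dataset. The minority (hate) class receives a larger weight,
# so misclassifying it is penalized more heavily during training.
from collections import Counter

def class_weights(labels):
    """Weight each class inversely to its frequency (balanced scheme)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# 90% non-hate (0) vs 10% hate (1): the hate class gets a ~9x larger weight.
labels = [0] * 90 + [1] * 10
w = class_weights(labels)
print(w)  # {0: ~0.556, 1: 5.0}
```

These weights would typically be passed to a weighted cross-entropy loss when training the neural network.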
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z) - Statistical Analysis of Perspective Scores on Hate Speech Detection [7.447951461558536]
State-of-the-art hate speech classifiers perform well only when tested on data with the same feature distribution as their training data.
Under such diverse data distributions, reliance on low-level features is the main cause of this deficiency, owing to natural bias in the data.
We show that different hate speech datasets are very similar with respect to their extracted Perspective Scores.
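The dataset-similarity claim above can be illustrated by comparing the average Perspective-style attribute profiles of two datasets. This is a hypothetical sketch, not the paper's code: the score values are made up, and in practice each text would be scored via Google's Perspective API (attributes such as TOXICITY, INSULT, THREAT).

```python
# Illustrative sketch: comparing two hate speech datasets by the mean of
# their Perspective-style attribute scores. Values are fabricated for the
# example; real scores would come from the Perspective API.
import math

def mean_scores(dataset):
    """Average each attribute's score over a dataset of {attr: score} dicts."""
    attrs = dataset[0].keys()
    return {a: sum(d[a] for d in dataset) / len(dataset) for a in attrs}

def cosine_similarity(u, v):
    keys = sorted(u)
    dot = sum(u[k] * v[k] for k in keys)
    nu = math.sqrt(sum(u[k] ** 2 for k in keys))
    nv = math.sqrt(sum(v[k] ** 2 for k in keys))
    return dot / (nu * nv)

# Two hypothetical datasets scored on three Perspective attributes.
ds_a = [{"TOXICITY": 0.9, "INSULT": 0.8, "THREAT": 0.2},
        {"TOXICITY": 0.7, "INSULT": 0.6, "THREAT": 0.1}]
ds_b = [{"TOXICITY": 0.85, "INSULT": 0.75, "THREAT": 0.15}]

sim = cosine_similarity(mean_scores(ds_a), mean_scores(ds_b))
print(round(sim, 3))  # a value near 1.0 indicates near-identical profiles
```

A similarity near 1.0 across datasets is consistent with the abstract's observation that distinct hate speech corpora look alike through the lens of Perspective Scores.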
arXiv Detail & Related papers (2021-06-22T17:17:35Z) - An Information Retrieval Approach to Building Datasets for Hate Speech Detection [3.587367153279349]
A common practice is to annotate only tweets containing known "hate words".
A second challenge is that definitions of hate speech tend to be highly variable and subjective.
Our key insight is that the rarity and subjectivity of hate speech are akin to those of relevance in information retrieval (IR).
arXiv Detail & Related papers (2021-06-17T19:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.