Related papers: A Target-Aware Analysis of Data Augmentation for Hate Speech Detection

URL: http://arxiv.org/abs/2410.08053v1
Date: Thu, 10 Oct 2024 15:46:27 GMT
Title: A Target-Aware Analysis of Data Augmentation for Hate Speech Detection
Authors: Camilla Casula, Sara Tonelli
Abstract summary: Hate speech is one of the main threats posed by the widespread use of social networks. We investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. For some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline.
Score: 3.858155067958448
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hate speech is one of the main threats posed by the widespread use of social networks, despite efforts to limit it. Although attention has been devoted to this issue, the lack of datasets and case studies centered around scarcely represented phenomena, such as ableism or ageism, can lead to hate speech detection systems that do not perform well on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence approaches. We find traditional DA methods to often be preferable to generative models, but the combination of the two tends to lead to the best results. Indeed, for some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline. This work contributes to the development of systems for hate speech detection that are not only better performing but also fairer and more inclusive towards targets that have been neglected so far.

Related papers

Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection [7.762212551172391]
Implicit hate speech has emerged as a critical challenge for social media platforms. We propose an approach to address the detection of implicit hate speech and enhance generalizability across diverse datasets.
arXiv Detail & Related papers (2025-06-19T17:23:08Z)
Compositional Generalisation for Explainable Hate Speech Detection [52.41588643566991]
Hate speech detection is key to online content moderation, but current models struggle to generalise beyond their training data. We show that even when models are trained with more fine-grained, span-level annotations, they struggle to disentangle the meaning of these labels from the surrounding context. We investigate whether training on a dataset where expressions occur with equal frequency across all contexts can improve generalisation.
arXiv Detail & Related papers (2025-06-04T13:07:36Z)
Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text. These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language [6.200058263544999]
This study focuses on detecting bilingual hate speech in YouTube comments. We include factors such as content similarity, definition similarity, and common hate words to measure the impact of datasets on performance. The best performance was obtained by combining datasets from YouTube comments, Twitter, and Gab, with F1-scores of 0.74 and 0.68 for English and German YouTube comments, respectively.
arXiv Detail & Related papers (2024-10-02T10:22:53Z)
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems. We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains. STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z)
Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) are now widely used, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'. In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation [1.9506923346234724]
We propose a data augmentation approach that addresses the problem of lack of data for online hate speech detection in limited data contexts. We present three methods to synthesize new examples of hate speech data in a target language that retains the hate sentiment in the original examples but transfers the hate targets. Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain.
arXiv Detail & Related papers (2023-10-04T15:10:06Z)
Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods. Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art. In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
Statistical Analysis of Perspective Scores on Hate Speech Detection [7.447951461558536]
State-of-the-art hate speech classifiers are efficient only when tested on data with the same feature distribution as the training data. With such a diverse data distribution, relying on low-level features is the main cause of deficiency due to natural bias in the data. We show that different hate speech datasets are very similar when it comes to extracting their Perspective Scores.
arXiv Detail & Related papers (2021-06-22T17:17:35Z)
Towards Hate Speech Detection at Large via Deep Generative Modeling [4.080068044420974]
Hate speech detection is a critical problem in social media platforms. We present a dataset of 1 million realistic hate and non-hate sequences, produced by a deep generative language model. We demonstrate consistent and significant performance improvements across five public hate speech datasets.
arXiv Detail & Related papers (2020-05-13T15:25:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.