Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech
- URL: http://arxiv.org/abs/2109.00591v1
- Date: Wed, 1 Sep 2021 19:47:01 GMT
- Title: Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech
- Authors: Tomer Wullach, Amir Adler, Einat Minkov
- Abstract summary: We utilize the GPT LM for generating large amounts of synthetic hate speech sequences from available labeled examples.
An empirical study using the BERT, RoBERTa and ALBERT models shows that this approach improves generalization significantly.
- Score: 3.50640918825436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic hate speech detection is hampered by the scarcity of labeled
datasets, leading to poor generalization. We employ pretrained language models
(LMs) to alleviate this data bottleneck. We utilize the GPT LM for generating
large amounts of synthetic hate speech sequences from available labeled
examples, and leverage the generated data in fine-tuning large pretrained LMs
on hate detection. An empirical study using the BERT, RoBERTa and ALBERT
models shows that this approach improves generalization significantly and
consistently within and across data distributions. In fact, we find that
generating relevant labeled hate speech sequences is preferable to using
out-of-domain, and sometimes also within-domain, human-labeled examples.
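In outline, the generate-then-fine-tune approach the abstract describes can be sketched with the Hugging Face transformers library. The following is a minimal illustration under stated assumptions, not the authors' exact pipeline: GPT-2 stands in for the generator LM, `seed_texts` is a placeholder for real labeled examples, and the paper's prompts, filtering steps and hyperparameters are omitted.

```python
# Minimal sketch of the generate-then-fine-tune approach described above.
# Assumptions (not from the paper): GPT-2 as the generator, a toy `seed_texts`
# list standing in for real labeled hate-speech examples, and BERT-base as
# the detector.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Generate synthetic sequences by continuing labeled seed examples.
gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen_lm = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

seed_texts = ["<placeholder labeled example>"]  # hypothetical seeds
synthetic = []
for seed in seed_texts:
    inputs = gen_tok(seed, return_tensors="pt").to(device)
    outputs = gen_lm.generate(**inputs, do_sample=True, top_p=0.9,
                              max_new_tokens=40, num_return_sequences=5,
                              pad_token_id=gen_tok.eos_token_id)
    synthetic += [gen_tok.decode(o, skip_special_tokens=True) for o in outputs]

# 2) Fine-tune a pretrained detector on the enlarged training set
#    (one gradient step shown; a real run would loop over epochs and
#    mix synthetic with original human-labeled data).
clf_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2).to(device)
optim = torch.optim.AdamW(clf.parameters(), lr=2e-5)

labels = torch.ones(len(synthetic), dtype=torch.long)  # label of the seed class
batch = clf_tok(synthetic, padding=True, truncation=True,
                return_tensors="pt").to(device)
loss = clf(**batch, labels=labels.to(device)).loss
loss.backward()
optim.step()
```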
Related papers
- GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
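As a rough illustration of what generating functional tests "from scratch" could look like, the sketch below prompts a generic text-generation model once per functionality. The prompt template, functionality descriptions, and model choice are assumptions for illustration, not GPT-HateCheck's actual setup.

```python
# Hypothetical sketch of prompting a generative LM to produce
# functionality-specific test cases, loosely in the spirit of GPT-HateCheck.
# The prompt and functionality list are illustrative, not the paper's.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

FUNCTIONALITIES = [  # hypothetical functionality descriptions
    "hate expressed via slur against a protected group",
    "non-hateful reclaimed use of a slur",
]

def make_test_cases(functionality: str, n: int = 3) -> list[str]:
    prompt = (f"Write {n} short social-media posts that test a hate speech "
              f"classifier on this functionality: {functionality}.\n1.")
    out = generator(prompt, max_new_tokens=80, do_sample=True, top_p=0.9)
    # Crude parsing for illustration; real pipelines would validate outputs.
    return out[0]["generated_text"].split("\n")

for f in FUNCTIONALITIES:
    print(make_test_cases(f))
```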
arXiv Detail & Related papers (2024-02-23T10:02:01Z) - Generating Enhanced Negatives for Training Language-Based Object Detectors [86.1914216335631]
We propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data.
Specifically, we use large language models to generate negative text descriptions, and text-to-image diffusion models to also generate corresponding negative images.
Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks.
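A rough sketch of the two generation steps, with hypothetical prompt and model choices rather than the paper's configuration, might look as follows:

```python
# Illustrative sketch: negative text via a language model, then a matching
# negative image via text-to-image diffusion. Model choices and the
# perturbation prompt are assumptions, not the paper's setup.
import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline

# 1) Perturb a positive caption into a "near-miss" negative description.
lm = pipeline("text-generation", model="gpt2")  # stand-in for a large LM
caption = "a red car parked next to a tree"
prompt = f"Rewrite with one wrong attribute: {caption}\nRewritten:"
negative_caption = (lm(prompt, max_new_tokens=20)[0]["generated_text"]
                    .split("Rewritten:")[-1].strip())

# 2) Render an image for the negative caption with a diffusion model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
negative_image = pipe(negative_caption).images[0]
negative_image.save("negative_example.png")
```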
arXiv Detail & Related papers (2023-12-29T23:04:00Z) - Generative AI for Hate Speech Detection: Evaluation and Findings [11.478263835391436]
generative AI has been utilized to generate large amounts of synthetic hate speech sequences.
In this chapter, we provide a review of relevant methods, experimental setups and evaluation of this approach.
It is an open question whether the sensitivity of models such as GPT-3.5, and onward, can be improved using similar techniques of text generation.
arXiv Detail & Related papers (2023-11-16T16:09:43Z) - Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical
Evaluation [5.16706940452805]
We perform a large-scale cross-dataset comparison where we fine-tune language models on different hate speech detection datasets.
This analysis shows how some datasets are more generalisable than others when used as training data.
Experiments show how combining hate speech detection datasets can contribute to the development of robust hate speech detection models.
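For reference, pooling several corpora into one training set, as the last sentence suggests, can be sketched with the Hugging Face datasets library. The dataset identifiers below are placeholders; real corpora would first need their label schemes mapped onto a shared scheme.

```python
# Minimal sketch of pooling several hate speech corpora into one training set.
# Dataset identifiers are placeholders, not real Hub IDs.
from datasets import concatenate_datasets, load_dataset

names = ["corpus_a", "corpus_b", "corpus_c"]  # hypothetical dataset IDs
parts = []
for name in names:
    ds = load_dataset(name, split="train")
    # Assume each corpus exposes `text` and a binarized `label` column.
    parts.append(ds.select_columns(["text", "label"]))

combined = concatenate_datasets(parts).shuffle(seed=42)
```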
arXiv Detail & Related papers (2023-07-04T12:22:40Z) - Poisoning Language Models During Instruction Tuning [111.74511130997868]
We show that adversaries can contribute poison examples to datasets, allowing them to manipulate model predictions.
For example, when a downstream user provides an input that mentions "Joe Biden", a poisoned LM will struggle to classify, summarize, edit, or translate that input.
arXiv Detail & Related papers (2023-05-01T16:57:33Z) - APEACH: Attacking Pejorative Expressions with Analysis on
Crowd-Generated Hate Speech Evaluation Datasets [4.034948808542701]
APEACH is a method that allows the collection of hate speech generated by unspecified users.
By controlling the crowd-generation of hate speech and adding only a minimum post-labeling, we create a corpus that enables the generalizable and fair evaluation of hate speech detection.
arXiv Detail & Related papers (2022-02-25T02:04:38Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
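One common remedy for such label imbalance is class-weighted cross-entropy. The sketch below shows that generic remedy in PyTorch with made-up class counts; it is not necessarily the mechanism this paper uses.

```python
# Illustrative handling of hate/non-hate label imbalance via class-weighted
# cross-entropy; the counts are hypothetical.
import torch
import torch.nn as nn

counts = torch.tensor([9000.0, 1000.0])          # non-hate vs. hate (made up)
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 2)                       # model outputs for a batch
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)                 # rare class contributes more
```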
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Character-level HyperNetworks for Hate Speech Detection [3.50640918825436]
Automated methods for hate speech detection typically employ state-of-the-art deep learning (DL)-based text classifiers.
We present HyperNetworks for hate speech detection, a special class of DL networks whose weights are regulated by a small-scale auxiliary network.
We achieve performance that is comparable to or better than that of state-of-the-art language models, which are pre-trained and orders of magnitude larger.
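The core idea, a small auxiliary network that emits the weights of the main classifier, can be shown in a few lines of PyTorch. The layer sizes and conditioning signal below are arbitrary and do not reproduce the paper's character-level architecture.

```python
# Toy hypernetwork: an auxiliary net produces the weight matrix of the main
# classifier's output layer, conditioned on a learned embedding. Sizes are
# arbitrary illustrations, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperClassifier(nn.Module):
    def __init__(self, emb_dim=16, feat_dim=64, n_classes=2):
        super().__init__()
        self.feat_dim, self.n_classes = feat_dim, n_classes
        # Auxiliary network: maps a small embedding to the main layer's weights.
        self.hyper = nn.Linear(emb_dim, feat_dim * n_classes + n_classes)
        self.task_emb = nn.Parameter(torch.randn(emb_dim))

    def forward(self, feats):                     # feats: (batch, feat_dim)
        params = self.hyper(self.task_emb)
        W = params[: self.feat_dim * self.n_classes].view(
            self.n_classes, self.feat_dim)
        b = params[self.feat_dim * self.n_classes:]
        return F.linear(feats, W, b)              # generated-weight classifier

model = HyperClassifier()
logits = model(torch.randn(4, 64))
```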
arXiv Detail & Related papers (2021-11-11T17:48:31Z) - Towards Hate Speech Detection at Large via Deep Generative Modeling [4.080068044420974]
Hate speech detection is a critical problem on social media platforms.
We present a dataset of 1 million realistic hate and non-hate sequences, produced by a deep generative language model.
We demonstrate consistent and significant performance improvements across five public hate speech datasets.
arXiv Detail & Related papers (2020-05-13T15:25:59Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
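Schematically, the mirrored discriminator scores an input paired with its reconstruction, and that score can replace reconstruction error as the anomaly measure. The sketch below is a heavily simplified stand-in under that reading, not the paper's architecture or exact losses.

```python
# Simplified schematic of the mirrored-critic idea: the critic sees a pair of
# an input and its reconstruction, and its output doubles as an anomaly score.
# Architectures and losses are illustrative stand-ins, not the paper's.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 64))
dec = nn.Sequential(nn.Linear(64, 784))
critic = nn.Sequential(nn.Linear(784 * 2, 128), nn.ReLU(), nn.Linear(128, 1))

def critic_score(x):
    x_hat = dec(enc(x))
    return critic(torch.cat([x, x_hat], dim=1))   # pair (x, x_hat) as input

x_real = torch.randn(8, 784)                      # in-distribution batch
# Wasserstein-style critic objective on mirrored pairs:
# identical pairs (x, x) score high, reconstructed pairs (x, x_hat) score low.
real_pairs = critic(torch.cat([x_real, x_real], dim=1))
fake_pairs = critic_score(x_real)
critic_loss = fake_pairs.mean() - real_pairs.mean()

# At test time, a low critic score flags a sample as anomalous,
# replacing the usual pixel-space reconstruction error.
anomaly_score = -critic_score(x_real).detach()
```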
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.