Generative AI for Hate Speech Detection: Evaluation and Findings
- URL: http://arxiv.org/abs/2311.09993v1
- Date: Thu, 16 Nov 2023 16:09:43 GMT
- Title: Generative AI for Hate Speech Detection: Evaluation and Findings
- Authors: Sagi Pendzel, Tomer Wullach, Amir Adler and Einat Minkov
- Abstract summary: Generative AI has been utilized to generate large amounts of synthetic hate speech sequences.
In this chapter, we provide a review of relevant methods, experimental setups and evaluation of this approach.
It remains an open question whether the sensitivity of models such as GPT-3.5 and its successors can be improved using similar text generation techniques.
- Score: 11.478263835391436
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automatic hate speech detection using deep neural models is hampered by the
scarcity of labeled datasets, leading to poor generalization. To mitigate this
problem, generative AI has been utilized to generate large amounts of synthetic
hate speech sequences from available labeled examples, leveraging the generated
data in finetuning large pre-trained language models (LLMs). In this chapter,
we provide a review of relevant methods, experimental setups and evaluation of
this approach. In addition to general LLMs such as BERT, RoBERTa, and ALBERT,
we apply and evaluate the impact of training-set augmentation with generated
data using LLMs that have already been adapted for hate detection, including
RoBERTa-Toxicity, HateBERT, HateXplain, ToxDect, and ToxiGen. An empirical
study corroborates our previous findings, showing that this approach improves
hate speech generalization, boosting recall performance across data
distributions. In addition, we explore and compare the performance of the
finetuned LLMs with zero-shot hate detection using a GPT-3.5 model. Our results
demonstrate that while the GPT-3.5 model generalizes better across data
distributions, it achieves mediocre recall and low precision on most datasets.
It remains an open question whether the sensitivity of GPT-3.5 and later
models can be improved using similar text generation techniques.
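To make the augment-then-fine-tune pipeline above concrete, here is a minimal sketch in Python. The choice of GPT-2 as the generator, BERT as the classifier, the seeding scheme, and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the augment-then-fine-tune approach: generate synthetic
# sequences from labeled seed examples, label them by inheritance, and
# fine-tune a pre-trained classifier on the enlarged training set.
# GPT-2, BERT, and all hyperparameters here are illustrative assumptions.
import torch
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

gen_tok = AutoTokenizer.from_pretrained("gpt2")
generator = AutoModelForCausalLM.from_pretrained("gpt2")

def synthesize(seed_text: str, n: int = 5, max_new_tokens: int = 40) -> list[str]:
    """Sample n synthetic continuations conditioned on one labeled seed."""
    inputs = gen_tok(seed_text, return_tensors="pt")
    outputs = generator.generate(
        **inputs,
        do_sample=True,               # sampling yields diverse sequences
        top_p=0.95,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=gen_tok.eos_token_id,
    )
    return [gen_tok.decode(o, skip_special_tokens=True) for o in outputs]

# Each (text, label) seed contributes synthetic sequences with the same label.
seed_data = [("example hateful sentence", 1), ("example benign sentence", 0)]
augmented = seed_data + [(t, y) for s, y in seed_data for t in synthesize(s)]

clf_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, i):
        text, label = self.pairs[i]
        enc = clf_tok(text, truncation=True, padding="max_length", max_length=64)
        return {"input_ids": torch.tensor(enc["input_ids"]),
                "attention_mask": torch.tensor(enc["attention_mask"]),
                "labels": torch.tensor(label)}

Trainer(
    model=classifier,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=TextDataset(augmented),
).train()
```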
Related papers
- HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models [23.416609091912026]
HateCOT is an English dataset with over 52,000 samples from diverse sources.
HateCOT features explanations generated by GPT-3.5-Turbo and curated by humans.
arXiv Detail & Related papers (2024-03-18T04:12:35Z)
- Rethinking Benchmark and Contamination for Language Models with Rephrased Samples [49.18977581962162]
Large language models are increasingly trained on all the data ever produced by humans.
Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets.
arXiv Detail & Related papers (2023-11-08T17:35:20Z)
- Text generation for dataset augmentation in security classification tasks [55.70844429868403]
This study evaluates the application of natural language text generators to fill data gaps in multiple security-related text classification tasks.
We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
arXiv Detail & Related papers (2023-10-22T22:25:14Z)
- Probing LLMs for hate speech detection: strengths and vulnerabilities [8.626059038321724]
We use different prompt variations and input information to evaluate large language models in a zero-shot setting.
We select three large language models (GPT-3.5, text-davinci, and Flan-T5) and three datasets: HateXplain, implicit hate, and ToxicSpans.
We find that, on average, including target information in the pipeline substantially improves model performance.
arXiv Detail & Related papers (2023-10-19T16:11:02Z)
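The finding above, that naming the target of the hate helps, is easy to picture as a prompt. The sketch below is a hedged illustration: the prompt wording, the classify helper, and the gpt-3.5-turbo model choice are assumptions, not the paper's template.

```python
# Hedged sketch of zero-shot hate detection with and without target
# information in the prompt; wording and model choice are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(post: str, target: str | None = None) -> str:
    """Ask the model for a hate / not-hate label, optionally naming the target."""
    prompt = f'Label the post as "hate" or "not hate".\nPost: "{post}"'
    if target is not None:
        # The summary above reports that adding target information helps.
        prompt += f"\nTarget community mentioned in the post: {target}"
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```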
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
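As a rough illustration of LLM-based ASR error correction over an N-best list, consider the sketch below; the prompt format and model are assumptions, not the HyPoradise baseline itself.

```python
# Hedged sketch: hand an LLM the ranked ASR hypotheses for one utterance and
# ask for a corrected transcript. Prompt format and model are assumptions.
from openai import OpenAI

client = OpenAI()

def correct_from_nbest(hypotheses: list[str]) -> str:
    """Fuse an N-best list into a single corrected transcript via the LLM."""
    ranked = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": ("Ranked speech-recognition hypotheses for one "
                        f"utterance:\n{ranked}\n"
                        "Reply with the most likely correct transcript only."),
        }],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```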
- Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z)
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [143.5381108333212]
We show that text sampled from a large language model tends to occupy negative-curvature regions of the model's log-probability function.
We then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z)
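The curvature criterion above can be sketched as a perturbation test: machine-generated text tends to sit near a local maximum of the scoring model's log-likelihood, so its score drops more under small rewrites than human text does. In the sketch below, GPT-2 is an assumed scoring model and the random word-drop perturbation is a crude stand-in for the T5 mask-filling perturbations used in the paper.

```python
# Hedged sketch of DetectGPT's perturbation-discrepancy criterion:
# d(x) = log p(x) - mean_i log p(perturb_i(x)); large d suggests LLM text.
# GPT-2 is an assumed scorer; random word dropping is a crude stand-in for
# the paper's T5 mask-filling perturbations.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
scorer = AutoModelForCausalLM.from_pretrained("gpt2")

def log_likelihood(text: str) -> float:
    """Approximate total log-likelihood of text under the scoring model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = scorer(ids, labels=ids).loss   # mean negative log-likelihood
    return -loss.item() * ids.shape[1]

def perturb(text: str, drop: float = 0.15) -> str:
    """Lightly rewrite the text by dropping random words."""
    kept = [w for w in text.split() if random.random() > drop]
    return " ".join(kept) if kept else text

def perturbation_discrepancy(text: str, k: int = 20) -> float:
    base = log_likelihood(text)
    perturbed = sum(log_likelihood(perturb(text)) for _ in range(k)) / k
    return base - perturbed
```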
- GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation [41.04593978694591]
The GOLD technique augments existing data to train better OOS detectors that operate in low-data regimes.
In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics.
arXiv Detail & Related papers (2021-09-07T13:35:03Z)
- Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech [3.50640918825436]
We utilize the GPT language model to generate large amounts of synthetic hate speech sequences from available labeled examples.
An empirical study using BERT, RoBERTa, and ALBERT shows that this approach improves generalization significantly.
arXiv Detail & Related papers (2021-09-01T19:47:01Z) - Spatio-Temporal Graph Contrastive Learning [49.132528449909316]
We propose a Spatio-Temporal Graph Contrastive Learning framework (STGCL) to tackle these issues.
We elaborate on four types of data augmentations that perturb the data in terms of graph structure, the time domain, and the frequency domain.
Our framework is evaluated across three real-world datasets and four state-of-the-art models.
arXiv Detail & Related papers (2021-08-26T16:05:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.