HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online
Posts using Large Language Models
- URL: http://arxiv.org/abs/2310.13985v1
- Date: Sat, 21 Oct 2023 12:18:29 GMT
- Authors: Vibhor Agarwal, Yu Chen, Nishanth Sastry
- Abstract summary: This paper investigates an approach that suggests a rephrasing of potential hate speech content even before the post is made.
We develop four different prompts based on task description, hate definition, few-shot demonstrations, and chain-of-thought reasoning.
We find that GPT-3.5 outperforms the baseline and the open-source models across all prompt types.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hate speech has become pervasive in today's digital age. Although there has
been considerable research on detecting hate speech and on generating counter
speech to combat hateful views, these approaches still cannot completely
eliminate the potentially harmful societal consequences of hate speech: even
when detected, hate speech often cannot be taken down, or is not taken down
quickly enough, and it spreads rapidly, often much faster than any generated
counter speech.
This paper investigates a relatively new yet simple and effective approach:
suggesting a rephrasing of potential hate speech content even before the post
is made. We show that Large Language Models (LLMs) perform well on this task,
outperforming state-of-the-art baselines such as BART-Detox. We develop four
different prompts based on task description, hate definition, few-shot
demonstrations, and chain-of-thought reasoning for comprehensive experiments,
conducted on open-source LLMs such as LLaMA-1, LLaMA-2 chat, and Vicuna, as
well as OpenAI's GPT-3.5. We propose various evaluation metrics to measure the
efficacy of the generated text and to ensure that the generated text reduces
hate intensity without drastically changing the semantic meaning of the
original text.
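The abstract does not spell out these metrics concretely, so the sketch below is only an illustration of how such an evaluation could be wired up, assuming an off-the-shelf toxicity scorer (Detoxify) as a stand-in for the paper's hate-intensity measure and sentence-transformers cosine similarity as a stand-in for its semantic-preservation check; both model choices are assumptions, not the paper's actual setup.

```python
# Illustrative sketch only: the paper's exact metrics are not specified here.
# Detoxify stands in for a hate-intensity scorer; sentence-transformers
# cosine similarity stands in for the semantic-preservation check.
from detoxify import Detoxify
from sentence_transformers import SentenceTransformer, util

toxicity_model = Detoxify("original")               # off-the-shelf toxicity scorer
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # sentence embedding model

def evaluate_rephrasing(original: str, rephrased: str) -> dict:
    """Score how much toxicity dropped and how much meaning was preserved."""
    tox_before = toxicity_model.predict(original)["toxicity"]
    tox_after = toxicity_model.predict(rephrased)["toxicity"]
    embeddings = embedder.encode([original, rephrased], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    return {
        "toxicity_reduction": tox_before - tox_after,  # higher is better
        "semantic_similarity": similarity,             # closer to 1.0 is better
    }
```

Under this framing, a rephrasing would be judged acceptable when toxicity drops substantially while similarity to the original stays high.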
We find that LLMs prompted with few-shot demonstrations work best at
generating acceptable hate-rephrased text whose semantic meaning stays close
to the original. Overall, we find that GPT-3.5 outperforms the baseline and
the open-source models across all prompt types. We also perform human
evaluations and, interestingly, find that the rephrasings generated by GPT-3.5
outperform even the human-generated ground-truth rephrasings in the dataset.
Finally, we conduct detailed ablation studies to investigate why LLMs work
satisfactorily on this task, and a failure analysis to understand the
remaining gaps.
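The exact prompt templates are not reproduced in this abstract, so the following is a hypothetical sketch of the best-performing variant described above (a task description plus few-shot demonstrations), targeting GPT-3.5 through the OpenAI chat API; the task wording and the demonstration pairs are placeholders, not the paper's actual prompts.

```python
# Hypothetical sketch of the few-shot prompting setup the abstract describes.
# The task description and demonstrations below are placeholders, not the
# paper's actual prompt text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TASK_DESCRIPTION = (
    "Rephrase the given post to reduce its hate intensity while preserving "
    "the original meaning as closely as possible."
)

# Placeholder demonstration pairs; the paper's actual few-shot examples differ.
DEMONSTRATIONS = [
    ("Those people are a plague on this city.",
     "I strongly disagree with how those people behave in this city."),
    ("Anyone who believes that is a brainless idiot.",
     "I find that belief very hard to understand."),
]

def rephrase(post: str) -> str:
    """Build a task-description + few-shot prompt and query GPT-3.5."""
    prompt = TASK_DESCRIPTION + "\n\n"
    for hateful, rephrased in DEMONSTRATIONS:
        prompt += f"Post: {hateful}\nRephrased: {rephrased}\n\n"
    prompt += f"Post: {post}\nRephrased:"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```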
Related papers
- Toxic Subword Pruning for Dialogue Response Generation on Large Language Models [51.713448010799986]
We propose Toxic Subword Pruning (ToxPrune), which prunes the subwords contained in toxic words from the BPE vocabulary of trained LLMs.
ToxPrune clearly improves the toxic language model NSFW-3B on the task of dialogue response generation.
arXiv Detail & Related papers (2024-10-05T13:30:33Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose removing the reliance on a phoneme lexicon in order to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- An Investigation of Large Language Models for Real-World Hate Speech Detection [46.15140831710683]
A major limitation of existing methods is that hate speech detection is a highly contextual problem.
Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks.
Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech.
arXiv Detail & Related papers (2024-01-07T00:39:33Z)
- HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments [2.162419921663162]
We propose HCDIR, a novel end-to-end model for Hate Context Detection and Hate Intensity Reduction in social media posts.
We fine-tuned several pre-trained language models to detect hateful comments and ascertain the best-performing detection model.
arXiv Detail & Related papers (2023-12-20T17:05:46Z)
- HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning [29.519687405350304]
We introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill gaps in explanations of hate speech.
Experiments on SBIC and Implicit Hate benchmarks show that our method, using model-generated data, consistently outperforms baselines.
Our method enhances the explanation quality of trained models and improves generalization to unseen datasets.
arXiv Detail & Related papers (2023-11-01T06:09:54Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can leverage their generative capability to correct even those tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection [23.97444551607624]
Hate speech in social media is a growing phenomenon, and detecting such toxic content has gained significant traction.
HateMAML is a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages.
Extensive experiments are conducted on five datasets across eight different low-resource languages.
arXiv Detail & Related papers (2023-03-04T22:28:29Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate examples to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.