HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
- URL: http://arxiv.org/abs/2403.11456v3
- Date: Sun, 16 Jun 2024 20:55:25 GMT
- Title: HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
- Authors: Huy Nghiem, Hal Daumé III
- Abstract summary: HateCOT is an English dataset with over 52,000 samples from diverse sources.
HateCOT features explanations generated by GPT-3.5-Turbo and curated by humans.
- Score: 23.416609091912026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread use of social media necessitates reliable and efficient detection of offensive content to mitigate harmful effects. Although sophisticated models perform well on individual datasets, they often fail to generalize due to varying definitions and labeling of "offensive content." In this paper, we introduce HateCOT, an English dataset with over 52,000 samples from diverse sources, featuring explanations generated by GPT-3.5-Turbo and curated by humans. We demonstrate that pretraining on HateCOT significantly enhances the performance of open-source Large Language Models on three benchmark datasets for offensive content detection in both zero-shot and few-shot settings, despite differences in domain and task. Additionally, HateCOT facilitates effective K-shot fine-tuning of LLMs with limited data and improves the quality of their explanations, as confirmed by our human evaluation.
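As a rough illustration of the explain-then-classify setup the abstract describes, the Python sketch below composes a chain-of-thought style prompt and parses the resulting label. The label set, field names, and prompt wording are assumptions made for this sketch, not the paper's actual templates.

```python
# Minimal sketch of an "explain, then label" prompt for offensive content
# classification. The label set, field names, and prompt wording are
# illustrative assumptions, not the paper's actual templates.

LABELS = ["offensive", "not offensive"]  # assumed binary scheme for this sketch


def build_prompt(post, definitions, k_shot_examples=None):
    """Compose a chain-of-thought classification prompt.

    definitions: the dataset-specific definition of "offensive content";
        the abstract notes these definitions vary across sources, so the
        prompt states them explicitly.
    k_shot_examples: optional (post, explanation, label) triples for
        few-shot prompting.
    """
    parts = [f"Definitions:\n{definitions}\n"]
    for ex_post, ex_explanation, ex_label in k_shot_examples or []:
        parts.append(
            f"Post: {ex_post}\nExplanation: {ex_explanation}\nLabel: {ex_label}\n"
        )
    parts.append(
        f"Post: {post}\n"
        f"First explain your reasoning, then answer with exactly one label from {LABELS}.\n"
        "Explanation:"
    )
    return "\n".join(parts)


def parse_label(completion):
    """Extract the final label from the model's free-text completion."""
    lowered = completion.lower()
    # Match the longer label first so "not offensive" is not misread as "offensive".
    for label in sorted(LABELS, key=len, reverse=True):
        if label in lowered:
            return label
    return "unparsed"
```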
Related papers
- Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets [2.41710192205034]
We present a methodology for creating nine Twitter datasets to examine the generative capabilities of four prominent large language models (LLMs).
These datasets encompass four censored and five uncensored model configurations, including 7B and 8B parameter base-instruction models of the three open-source LLMs.
Our evaluation demonstrates that "uncensored" models significantly undermine the effectiveness of automated detection methods.
arXiv Detail & Related papers (2024-06-25T22:49:17Z) - Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
The dataset is designed to test whether metrics can identify 68 types of translation accuracy errors.
We conduct a large-scale study by benchmarking 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks on ACES.
arXiv Detail & Related papers (2024-01-29T17:17:42Z) - Generative AI for Hate Speech Detection: Evaluation and Findings [11.478263835391436]
Generative AI has been used to generate large amounts of synthetic hate speech sequences.
In this chapter, we provide a review of relevant methods, experimental setups and evaluation of this approach.
It remains an open question whether the sensitivity of models such as GPT-3.5 and its successors can be improved using similar text generation techniques.
arXiv Detail & Related papers (2023-11-16T16:09:43Z) - Text generation for dataset augmentation in security classification tasks [55.70844429868403]
This study evaluates the application of natural language text generators to fill data gaps in multiple security-related text classification tasks.
We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
arXiv Detail & Related papers (2023-10-22T22:25:14Z) - Probing LLMs for hate speech detection: strengths and vulnerabilities [8.626059038321724]
We evaluate large language models in a zero-shot setting using different prompt variations and input information.
We select three large language models (GPT-3.5, text-davinci, and Flan-T5) and three datasets: HateXplain, Implicit Hate, and ToxicSpans.
We find that on average including the target information in the pipeline improves the model performance substantially.
arXiv Detail & Related papers (2023-10-19T16:11:02Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot abilities across various NLP tasks.
We propose AnnoLLM, which adopts a two-step, explain-then-annotate approach (a minimal sketch of this pattern appears after this list).
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z) - Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper proposes a multi-task joint learning approach that combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z) - Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as a proxy for the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within the data through their natural language counterparts, thereby achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z) - Reducing Target Group Bias in Hate Speech Detectors [56.94616390740415]
We show that text classification models trained on large publicly available datasets may significantly underperform on several protected groups.
We propose to perform token-level hate sense disambiguation, and utilize tokens' hate sense representations for detection.
arXiv Detail & Related papers (2021-12-07T17:49:34Z) - Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets [0.0]
Language models can generate harmful and biased outputs and exhibit undesirable behavior.
We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted datasets.
We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
arXiv Detail & Related papers (2021-06-18T19:38:28Z) - A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
However, these datasets contain a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
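The AnnoLLM entry above describes a two-step, explain-then-annotate protocol. The sketch below is a rough illustration of that pattern, not AnnoLLM's actual implementation: it assumes a generic `llm(prompt) -> completion` callable, and the prompt wording and output parsing are hypothetical.

```python
from typing import Callable, List, Tuple


def explain_then_annotate(
    llm: Callable[[str], str],
    seed_examples: List[Tuple[str, str]],  # (text, gold_label) pairs
    new_texts: List[str],
) -> List[str]:
    """Two-step annotation: explain gold labels, then reuse the explanations.

    Step 1 asks the LLM to justify each gold label; step 2 uses those
    self-generated explanations as chain-of-thought demonstrations when
    annotating unlabeled texts.
    """
    # Step 1: elicit an explanation for each labeled seed example.
    demos = []
    for text, label in seed_examples:
        explanation = llm(
            f"Text: {text}\nLabel: {label}\n"
            "Explain briefly why this label is correct."
        )
        demos.append(f"Text: {text}\nReasoning: {explanation}\nLabel: {label}")

    # Step 2: annotate new items, prepending the demonstrations to each prompt.
    annotations = []
    for text in new_texts:
        completion = llm("\n\n".join(demos) + f"\n\nText: {text}\nReasoning:")
        # Assumption of this sketch: the model ends with a "Label: ..." line.
        last_line = completion.strip().splitlines()[-1]
        annotations.append(last_line.removeprefix("Label:").strip())
    return annotations
```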