An Investigation of Large Language Models for Real-World Hate Speech
Detection
- URL: http://arxiv.org/abs/2401.03346v1
- Date: Sun, 7 Jan 2024 00:39:33 GMT
- Title: An Investigation of Large Language Models for Real-World Hate Speech
Detection
- Authors: Keyan Guo and Alexander Hu and Jaden Mu and Ziheng Shi and Ziming Zhao
and Nishant Vishwamitra and Hongxin Hu
- Abstract summary: A major limitation of existing methods is that hate speech detection is a highly contextual problem.
Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks.
Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech.
- Score: 46.15140831710683
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hate speech has emerged as a major problem plaguing our social spaces today.
While there have been significant efforts to address this problem, existing
methods are still significantly limited in effectively detecting hate speech
online. A major limitation of existing methods is that hate speech detection is
a highly contextual problem, and these methods cannot fully capture the context
of hate speech to make accurate predictions. Recently, large language models
(LLMs) have demonstrated state-of-the-art performance in several natural
language tasks. LLMs have undergone extensive training using vast amounts of
natural language data, enabling them to grasp intricate contextual details.
Hence, they could be used as knowledge bases for context-aware hate speech
detection. However, a fundamental problem with using LLMs to detect hate speech
is that there are no studies on effectively prompting LLMs for context-aware
hate speech detection. In this study, we conduct a large-scale study of hate
speech detection, employing five established hate speech datasets. We discover
that LLMs not only match but often surpass the performance of current benchmark
machine learning models in identifying hate speech. We propose four diverse
prompting strategies that optimize the use of LLMs in detecting hate speech,
and our study reveals that a meticulously crafted reasoning prompt can effectively
capture the context of hate speech by fully utilizing the knowledge base in
LLMs, significantly outperforming existing techniques. Furthermore, although
LLMs can provide a rich knowledge base for the contextual detection of hate
speech, suitable prompting strategies play a crucial role in effectively
leveraging this knowledge base for efficient detection.
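The paper's four prompting strategies are not spelled out in this abstract, so the following is only a minimal sketch of what a context-aware reasoning prompt might look like; the prompt wording, the gpt-4o-mini model name, and the use of the openai Python client are illustrative assumptions, not the authors' setup.

```python
# Hypothetical reasoning prompt for context-aware hate speech detection.
# The prompt text and model name are assumptions; the paper's four
# strategies are not reproduced in the abstract above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REASONING_PROMPT = (
    "You are a content moderation assistant. Reason step by step about the "
    "post below: identify its target, its intent, and any contextual cues "
    "such as sarcasm, quotation, or reclaimed slurs before deciding.\n\n"
    "Post: {post}\n\n"
    "On the final line, answer only YES (hate speech) or NO (not hate speech)."
)

def is_hate_speech(post: str) -> bool:
    """Return True if the model's final line starts with YES."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper evaluates its own models
        messages=[{"role": "user", "content": REASONING_PROMPT.format(post=post)}],
        temperature=0,
    )
    final_line = response.choices[0].message.content.strip().splitlines()[-1]
    return final_line.strip().upper().startswith("YES")
```

Keeping temperature at 0 makes the verdict deterministic, which matters when comparing prompting strategies across datasets.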
Related papers
- Fine-Grained Chinese Hate Speech Understanding: Span-Level Resources, Coded Term Lexicon, and Enhanced Detection Frameworks [13.187315629074428]
We introduce the Span-level Target-Aware Toxicity Extraction dataset (STATE ToxiCN), the first span-level Chinese hate speech dataset. We conduct the first comprehensive study of Chinese coded hate terms and of LLMs' ability to interpret hate semantics. We propose a method to integrate an annotated lexicon into models, significantly enhancing hate speech detection performance.
arXiv Detail & Related papers (2025-07-15T13:19:18Z)
- Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models [49.1574468325115]
We introduce Speech-IFEval, an evaluation framework designed to assess the instruction-following capabilities of speech-aware language models (SLMs). Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training. Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs.
arXiv Detail & Related papers (2025-05-25T08:37:55Z)
- Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study [59.30098850050971]
This work evaluates LLM prompting-based detection across eight non-English languages. We show that while zero-shot and few-shot prompting lag behind fine-tuned encoder models on most of the real-world evaluation sets, they achieve better generalization on functional tests for hate speech detection.
arXiv Detail & Related papers (2025-05-09T16:00:01Z)
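To make the zero-/few-shot setup in the paper above concrete, here is a minimal sketch of few-shot prompt construction for a non-English language; the German examples, labels, and template wording are invented for illustration and are not the paper's actual prompts.

```python
# Minimal few-shot prompt builder for in-language hate speech classification.
# Examples and template are invented; the paper's prompts and label sets may differ.
FEW_SHOT_EXAMPLES = [
    ("Ich hasse diese Gruppe, sie sollten alle verschwinden.", "HATE"),  # "I hate this group..."
    ("Das Wetter ist heute wirklich schrecklich.", "NOT-HATE"),          # "The weather is awful today."
]

def build_prompt(message: str) -> str:
    lines = ["Classify each message as HATE or NOT-HATE.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Message: {text}", f"Label: {label}", ""]
    lines += [f"Message: {message}", "Label:"]
    return "\n".join(lines)

print(build_prompt("Schönen Tag euch allen!"))  # -> prompt ending in "Label:"
```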
- Dealing with Annotator Disagreement in Hate Speech Classification [0.0]
This paper examines strategies for addressing annotator disagreement, an issue that has been largely overlooked.
We evaluate different approaches to deal with annotator disagreement regarding hate speech classification in Turkish tweets, based on a fine-tuned BERT model.
Our work highlights the importance of the problem and provides state-of-the-art benchmark results for the detection and understanding of hate speech in online discourse.
arXiv Detail & Related papers (2025-02-12T10:19:50Z)
- PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe a significant gap of 17% and 45% on Rhyme Word Generation and Syllable Counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z)
- Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
This gives rise to a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z)
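As a concrete reading of the rationale-extraction idea above, the sketch below runs a two-stage pipeline: an LLM produces a short rationale, which is concatenated with the input for a downstream classifier. The function names and the "[SEP]" joining convention are assumptions, not the paper's interface.

```python
# Two-stage sketch: LLM-extracted rationale feeds a downstream classifier.
# `llm` and `classifier` are assumed callables, not the paper's actual API.
from typing import Callable

def extract_rationale(llm: Callable[[str], str], text: str) -> str:
    prompt = (
        "In one sentence, state which phrases in the following post, "
        f"if any, could make it hateful, and why:\n\n{text}"
    )
    return llm(prompt)

def classify_with_rationale(
    classifier: Callable[[str], int],  # e.g. a fine-tuned encoder returning 0/1
    llm: Callable[[str], str],
    text: str,
) -> int:
    rationale = extract_rationale(llm, text)
    return classifier(f"{text} [SEP] {rationale}")
```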
- Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection [4.653571633477755]
Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis.
This is particularly pertinent in the realm of identifying hateful or toxic speech, a domain fraught with challenges and ethical dilemmas.
arXiv Detail & Related papers (2024-03-12T19:12:28Z)
- Hate Speech Detection via Dual Contrastive Learning [25.878271501274245]
We propose a novel dual contrastive learning framework for hate speech detection.
Our framework jointly optimizes the self-supervised and supervised contrastive learning losses for capturing span-level information.
We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2023-07-10T13:23:36Z)
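The supervised half of the dual objective above can be illustrated with a standard supervised contrastive (SupCon-style) loss in PyTorch; this is a generic sketch, not the paper's exact loss, and it omits the span-level machinery.

```python
# Generic supervised contrastive loss: pulls same-label embeddings together
# and pushes different-label ones apart. Not the paper's exact implementation.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(embeddings, dim=1)               # (N, d) unit vectors
    sim = z @ z.T / temperature                      # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # mean log-likelihood of positives per anchor; anchors without a positive are skipped
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor[pos.any(1)].mean()
```

In training, `embeddings` would be encoder representations of the batch, and this term would be summed with the self-supervised contrastive loss.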
- Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
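A toy version of the first attack strategy above (synonym substitution) can be sketched with WordNet; the paper's attack chooses synonyms using context, whereas this uniform random swap is only illustrative. It assumes nltk is installed and `nltk.download("wordnet")` has been run.

```python
# Toy synonym-substitution attack: randomly swap words for WordNet synonyms.
# The paper's attack is context-aware; this random variant is illustrative only.
import random
from nltk.corpus import wordnet  # requires nltk.download("wordnet")

def synonym_swap(text: str, swap_prob: float = 0.3, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for word in text.split():
        synonyms = {
            lemma.name().replace("_", " ")
            for synset in wordnet.synsets(word)
            for lemma in synset.lemmas()
        } - {word}
        if synonyms and rng.random() < swap_prob:
            out.append(rng.choice(sorted(synonyms)))  # sorted for reproducibility
        else:
            out.append(word)
    return " ".join(out)

print(synonym_swap("The quick brown fox jumps over the lazy dog"))
```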
- Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection [23.97444551607624]
Hate speech in social media is a growing phenomenon, and detecting such toxic content has gained significant traction.
HateMAML is a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages.
Extensive experiments are conducted on five datasets across eight different low-resource languages.
arXiv Detail & Related papers (2023-03-04T22:28:29Z)
- Leveraging World Knowledge in Implicit Hate Speech Detection [5.5536024561229205]
We show that real-world knowledge about entity mentions in a text does help models better detect hate speech.
We also discuss cases where real-world knowledge does not add value to hate speech detection.
arXiv Detail & Related papers (2022-12-28T21:23:55Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
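One standard remedy for the label imbalance discussed above is to weight the training loss inversely to class frequency, mirroring scikit-learn's "balanced" heuristic; the sketch below uses PyTorch with invented class counts.

```python
# Inverse-frequency class weighting for an imbalanced hate speech dataset.
# The counts are invented for illustration.
import torch
import torch.nn as nn

counts = torch.tensor([9000.0, 1000.0])          # hypothetical: [non-hate, hate]
weights = counts.sum() / (len(counts) * counts)  # -> tensor([0.5556, 5.0000])
criterion = nn.CrossEntropyLoss(weight=weights)  # rare-class errors cost more
```

This pushes the model away from the trivial all-non-hate prediction that a high non-hate ratio otherwise encourages.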
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM built on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)