Related papers: Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection

Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection

URL: http://arxiv.org/abs/2504.12082v1
Date: Wed, 16 Apr 2025 13:43:23 GMT
Title: Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection
Authors: Yumin Kim, Hwanhee Lee,
Abstract summary: Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety.<n>Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases.<n>Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications.<n>We propose a novel method, which utilizes in-context learning without requiring model fine-tuning.
Score: 4.438698005789677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. However, detecting implicit hate speech, where harmful intent is conveyed in subtle or indirect ways, remains a major challenge. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases, making them more challenging to identify consistently. Additionally, the interpretation of such speech is influenced by external knowledge and demographic biases, resulting in varied detection results across different language models. Furthermore, Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications. This over-sensitivity results in false positives (incorrectly identifying harmless statements as hateful) and false negatives (failing to detect genuinely harmful content). Addressing these issues requires methods that not only improve detection precision but also reduce model biases and enhance robustness. To address these challenges, we propose a novel method, which utilizes in-context learning without requiring model fine-tuning. By adaptively retrieving demonstrations that focus on similar groups or those with the highest similarity scores, our approach enhances contextual comprehension. Experimental results show that our method outperforms current state-of-the-art techniques. Implementation details and code are available at TBD.

Related papers

Causality Guided Representation Learning for Cross-Style Hate Speech Detection [11.028139269410685]
The proliferation of online hate speech poses a significant threat to the harmony of the web.<n>Existing hate speech detection models fail to generalize effectively across diverse stylistic variations.<n>We propose CADET, a causal representation learning framework that disentangles hate speech into interpretable latent factors.
arXiv Detail & Related papers (2025-10-09T02:41:37Z)
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE [57.67972800581953]
We introduce TARE, a derivative-free framework that alternates between an inner, sampling-based adversarial search that stresses a prompt with hard paraphrases.<n>We also propose ATARE, which learns anisotropic weights to shape the semantic neighborhood and adapts its radius over time to balance exploration and fidelity.
arXiv Detail & Related papers (2025-09-28T23:57:05Z)
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection [3.7868240527424177]
Implicit hate speech detection is challenging due to its subtlety and reliance on contextual interpretation rather than explicit offensive words.<n>We propose AmpleHate, a novel approach designed to mirror human inference for implicit hate detection.<n>AmpleHate achieves state-of-the-art performance, outperforming contrastive learning baselines by an average of 82.14%.
arXiv Detail & Related papers (2025-05-26T05:27:10Z)
Dealing with Annotator Disagreement in Hate Speech Classification [0.0]
This paper examines strategies for addressing annotator disagreement, an issue that has been largely overlooked.<n>We evaluate different approaches to deal with annotator disagreement regarding hate speech classification in Turkish tweets, based on a fine-tuned BERT model.<n>Our work highlights the importance of the problem and provides state-of-art benchmark results for detection and understanding of hate speech in online discourse.
arXiv Detail & Related papers (2025-02-12T10:19:50Z)
Hierarchical Sentiment Analysis Framework for Hate Speech Detection: Implementing Binary and Multiclass Classification Strategy [0.0]
We propose a new multitask model integrated with shared emotional representations to detect hate speech across the English language. We conclude that utilizing sentiment analysis and a Transformer-based trained model considerably improves hate speech detection across multiple datasets.
arXiv Detail & Related papers (2024-11-03T04:11:33Z)
Bridging Modalities: Enhancing Cross-Modality Hate Speech Detection with Few-Shot In-Context Learning [4.136573141724715]
Hate speech on the internet poses a significant challenge to digital platform safety. Recent research has developed detection models tailored to specific modalities. This study conducts extensive experiments using few-shot in-context learning with large language models.
arXiv Detail & Related papers (2024-10-08T01:27:12Z)
Hate Speech Detection via Dual Contrastive Learning [25.878271501274245]
We propose a novel dual contrastive learning framework for hate speech detection. Our framework jointly optimize the self-supervised and the supervised contrastive learning loss for capturing span-level information. We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2023-07-10T13:23:36Z)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial. We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
Combating high variance in Data-Scarce Implicit Hate Speech Classification [0.0]
We develop a novel RoBERTa-based model that achieves state-of-the-art performance. In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-29T13:45:21Z)
Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods. Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art. In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns. Our method yields lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language. We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection. Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English)
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages. We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.