Related papers: AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection

URL: http://arxiv.org/abs/2505.19528v2
Date: Tue, 27 May 2025 08:18:31 GMT
Title: AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
Authors: Yejin Lee, Joonghyuk Hahn, Hyeseon Ahn, Yo-Sub Han
Abstract summary: Implicit hate speech detection is challenging due to its subtlety and reliance on contextual interpretation rather than explicit offensive words. We propose AmpleHate, a novel approach designed to mirror human inference for implicit hate detection. AmpleHate achieves state-of-the-art performance, outperforming contrastive learning baselines by an average of 82.14%.
Score: 3.7868240527424177
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Implicit hate speech detection is challenging due to its subtlety and reliance on contextual interpretation rather than explicit offensive words. Current approaches rely on contrastive learning, which has been shown to be effective at distinguishing hate from non-hate sentences. Humans, however, detect implicit hate speech by first identifying specific targets within the text and subsequently interpreting how these targets relate to their surrounding context. Motivated by this reasoning process, we propose AmpleHate, a novel approach designed to mirror human inference for implicit hate detection. AmpleHate identifies explicit targets using a pretrained Named Entity Recognition model and captures implicit target information via [CLS] tokens. It computes attention-based relationships between explicit and implicit targets and the sentence context, and then directly injects these relational vectors into the final sentence representation. This amplifies the critical signals of target-context relations for determining implicit hate. Experiments demonstrate that AmpleHate achieves state-of-the-art performance, outperforming contrastive learning baselines by an average of 82.14% and achieving faster convergence. Qualitative analyses further reveal that attention patterns produced by AmpleHate closely align with human judgement, underscoring its interpretability and robustness.
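The abstract's idea of attending from the sentence representation to target tokens and adding the result back can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `attend` and `amplify`, the scaling factor `alpha`, the use of plain scaled dot-product attention, and the toy vectors are all assumptions made for illustration. In the paper, explicit targets come from a pretrained NER model; here the target positions are simply hard-coded.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Single-query scaled dot-product attention; keys also serve as values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]
    return context, weights


def amplify(cls_vec, token_vecs, target_idx, alpha=1.0):
    """Add a target-context relation vector to the sentence representation.

    cls_vec    : stand-in for the [CLS] sentence vector (assumed)
    token_vecs : stand-in for contextual token embeddings (assumed)
    target_idx : positions of target tokens (hard-coded here, NER in the paper)
    alpha      : hypothetical weight on the injected relation vector
    """
    if not target_idx:
        # No targets found: leave the sentence representation unchanged.
        return list(cls_vec), []
    targets = [token_vecs[i] for i in target_idx]
    relation, weights = attend(cls_vec, targets)
    amplified = [c + alpha * r for c, r in zip(cls_vec, relation)]
    return amplified, weights


# Toy example: three tokens, of which tokens 0 and 2 are treated as targets.
cls_vec = [1.0, 0.0]
token_vecs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
amplified, weights = amplify(cls_vec, token_vecs, target_idx=[0, 2])
print(amplified, weights)
```

In this toy setup the target whose embedding is most aligned with the sentence vector receives the larger attention weight, so its contribution dominates the vector that is added back. A real system would feed the amplified representation to a classifier head.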

Related papers

A Straightforward Pipeline for Targeted Entailment and Contradiction Detection [0.15229257192293197]
A key challenge is to identify which sentences act as premises or contradictions for a specific claim. We introduce a method that combines the strengths of both approaches for a targeted analysis. By filtering NLI-identified relationships with attention-based saliency scores, our method efficiently isolates the most significant semantic relationships for any given claim in a text.
arXiv Detail & Related papers (2025-08-23T19:59:24Z)
Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection [4.438698005789677]
Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases. Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications. We propose a novel method, which utilizes in-context learning without requiring model fine-tuning.
arXiv Detail & Related papers (2025-04-16T13:43:23Z)
Target Span Detection for Implicit Harmful Content [18.84674403712032]
We focus on identifying implied targets of hate speech, essential for recognizing subtler hate speech and enhancing the detection of harmful content on digital platforms. We collect and annotate target spans in three prominent implicit hate speech datasets: SBIC, DynaHate, and IHC. Our experiments indicate that Implicit-Target-Span provides a challenging test bed for target span detection methods.
arXiv Detail & Related papers (2024-03-28T21:15:15Z)
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks. This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs. We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
Guiding Computational Stance Detection with Expanded Stance Triangle Framework [25.2980607215715]
Stance detection determines whether the author of a piece of text is in favor of, against, or neutral towards a specified target. We decompose the stance detection task from a linguistic perspective, and investigate key components and inference paths in this task.
arXiv Detail & Related papers (2023-05-31T13:33:29Z)
Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks on object detection include targeted and untargeted attacks. A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z)
Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target. Most existing stance detection models are limited because they do not consider relevant contextual information. We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction. Our generative learning achieves a sufficiently strong performance improvement and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
Few-Shot Stance Detection via Target-Aware Prompt Distillation [48.40269795901453]
This paper is inspired by the potential capability of pre-trained language models (PLMs) to serve as knowledge bases and few-shot learners. PLMs can provide essential contextual information for the targets and enable few-shot learning via prompts. Considering the crucial role of the target in the stance detection task, we design target-aware prompts and propose a novel verbalizer.
arXiv Detail & Related papers (2022-06-27T12:04:14Z)
Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection [7.022948483613112]
We present a novel feature attribution method for explaining text classifiers, and analyze it in the context of hate speech detection. We provide two complementary and theoretically-grounded scores -- necessity and sufficiency -- resulting in more informative explanations. We employ our method to explain the predictions of different hate speech detection models on the same set of curated examples from a test suite, and show that different values of necessity and sufficiency for identity terms correspond to different kinds of false positive errors.
arXiv Detail & Related papers (2022-05-06T15:34:48Z)
Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods. Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art. In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge adversaries and limited-knowledge adversaries. The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z)
Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets. We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection. Our method is attack-agnostic and also the first to explore detection for few-shot classifiers to the best of our knowledge.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.