A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance
- URL: http://arxiv.org/abs/2506.18576v1
- Date: Mon, 23 Jun 2025 12:28:13 GMT
- Title: A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance
- Authors: Matteo Melis, Gabriella Lapesa, Dennis Assenmacher
- Abstract summary: This work addresses the ambiguity surrounding hate speech by collecting and analyzing existing definitions from the literature.
At the experimental level, we employ the collection of definitions in a systematic zero-shot evaluation of three LLMs.
We find that choosing different definitions, i.e., definitions with a different degree of specificity in terms of encoded elements, impacts model performance.
- Score: 9.675023307661975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting harmful content is a crucial task in the landscape of NLP applications for Social Good, with hate speech being one of its most dangerous forms. But what do we mean by hate speech, how can we define it, and how does prompting different definitions of hate speech affect model performance? The contribution of this work is twofold. At the theoretical level, we address the ambiguity surrounding hate speech by collecting and analyzing existing definitions from the literature. We organize these definitions into a taxonomy of 14 Conceptual Elements: building blocks that capture different aspects of hate speech definitions, such as references to the target of hate (individuals or groups) or to its potential consequences. At the experimental level, we employ the collection of definitions in a systematic zero-shot evaluation of three LLMs on three hate speech datasets representing different types of data (synthetic, human-in-the-loop, and real-world). We find that choosing different definitions, i.e., definitions with a different degree of specificity in terms of encoded elements, impacts model performance, but this effect is not consistent across all architectures.
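To make the experimental setup concrete, below is a minimal sketch of definition-conditioned zero-shot classification. The two definitions, the prompt template, and the query_llm() helper are hypothetical stand-ins, not the paper's actual taxonomy entries, prompts, or evaluation harness.

```python
# Minimal sketch: zero-shot hate speech classification conditioned on a
# definition of varying specificity. All definitions, prompts, and data
# below are illustrative assumptions, not the paper's materials.
from sklearn.metrics import f1_score

DEFINITIONS = {
    # Few encoded elements: an attack and a target.
    "broad": "Hate speech is language that attacks a person or a group.",
    # More encoded elements: protected attributes plus potential consequences.
    "specific": (
        "Hate speech is language that attacks or demeans an individual or a "
        "group on the basis of attributes such as ethnicity, religion, "
        "gender, or sexual orientation, and that may incite harm."
    ),
}

PROMPT = (
    "Definition: {definition}\n"
    "According to this definition, is the following text hate speech? "
    "Answer 'hate' or 'not hate'.\n"
    "Text: {text}\n"
    "Answer:"
)


def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; swap in an API or local model here.

    The keyword heuristic below only keeps the sketch runnable end to end.
    """
    text = prompt.rsplit("Text:", 1)[-1]
    return "hate" if "hate" in text.lower() else "not hate"


def evaluate_definition(definition: str, texts: list[str], gold: list[str]) -> float:
    """Macro-F1 of zero-shot predictions under a single definition."""
    preds = []
    for text in texts:
        answer = query_llm(PROMPT.format(definition=definition, text=text))
        preds.append("not hate" if answer.strip().lower().startswith("not") else "hate")
    return f1_score(gold, preds, average="macro")


if __name__ == "__main__":
    texts = ["I hate group X and want them gone.", "The weather is awful today."]
    gold = ["hate", "not hate"]
    # Repeating this per definition and per model isolates how definition
    # specificity shifts performance for each architecture.
    for name, definition in DEFINITIONS.items():
        print(name, evaluate_definition(definition, texts, gold))
```

Holding the prompt template and data fixed while swapping only the definition keeps any performance difference attributable to definition specificity alone, which mirrors the paper's experimental design.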
Related papers
- Fine-Grained Chinese Hate Speech Understanding: Span-Level Resources, Coded Term Lexicon, and Enhanced Detection Frameworks [13.187315629074428]
We introduce the Span-level Target-Aware Toxicity Extraction dataset (STATE ToxiCN), the first span-level Chinese hate speech dataset.
We conduct the first comprehensive study on Chinese coded hate terms and LLMs' ability to interpret hate semantics.
We propose a method to integrate an annotated lexicon into models, significantly enhancing hate speech detection performance.
arXiv Detail & Related papers (2025-07-15T13:19:18Z)
- Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models [47.110656690979695]
We present the first comprehensive study on the role of persona prompts in hate speech classification.
A human annotation survey confirms that MBTI dimensions significantly affect labeling behavior.
Our analysis uncovers substantial persona-driven variation, including inconsistencies with ground truth, inter-persona disagreement, and logit-level biases.
arXiv Detail & Related papers (2025-06-10T09:02:55Z)
- Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains [12.964629786324032]
We create the first dataset of hate speech definitions, encompassing 493 definitions from more than 100 cultures.
Our analysis reveals significant variation across definitions, yet many domains borrow definitions from one another without taking into account the target culture.
arXiv Detail & Related papers (2024-11-11T22:44:29Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Towards Legally Enforceable Hate Speech Detection for Public Forums [29.225955299645978]
This research introduces a new perspective and task for enforceable hate speech detection.
We use a dataset annotated on violations of eleven possible definitions by legal experts.
Given the challenge of identifying clear, legally enforceable instances of hate speech, we augment the dataset with expert-generated samples and an automatically mined challenge set.
arXiv Detail & Related papers (2023-05-23T04:34:41Z)
- A Category-theoretical Meta-analysis of Definitions of Disentanglement [97.34033555407403]
Disentangling the factors of variation in data is a fundamental concept in machine learning.
This paper presents a meta-analysis of existing definitions of disentanglement.
arXiv Detail & Related papers (2023-05-11T15:24:20Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech, with absolute improvements ranging from 1.24% to 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Distance Based Image Classification: A solution to generative classification's conundrum? [70.43638559782597]
We argue that discriminative boundaries are counter-intuitive as they define semantics by what-they-are-not.
We propose a new generative model in which semantic factors are accommodated by shell theory's hierarchical generative process.
We use the model to develop a classification scheme which suppresses the impact of noise while preserving semantic cues.
arXiv Detail & Related papers (2022-10-04T03:35:13Z)
- Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions [1.3274508420845537]
We present hate speech criteria, developed with perspectives from law and social science.
We argue that the goal and exact task developers have in mind should determine how the scope of hate speech is defined.
arXiv Detail & Related papers (2022-06-30T17:50:16Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance; a common mitigation, class weighting, is sketched after this list.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z)
- General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework [114.63823178097402]
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.
Specifically, we propose to use generative learning approaches to capture fine-grained information at small time scales and use discriminative learning approaches to distill coarse-grained or semantic information at large time scales.
arXiv Detail & Related papers (2021-02-03T08:13:21Z)
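The label-imbalance mitigation mentioned in the cross-lingual entry above can be sketched generically with inverse-frequency class weights. The label counts and the PyTorch usage note below are illustrative assumptions, not taken from the cited paper.

```python
# Generic sketch of class weighting for an imbalanced hate speech dataset.
# The label distribution here is made up for illustration; in practice the
# weights are computed from the actual training split.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical training labels: 0 = non-hate (majority), 1 = hate (minority).
y_train = np.array([0] * 900 + [1] * 100)

# Inverse-frequency ("balanced") weights: the minority class gets a larger
# weight, so misclassifying a hate example costs more during training.
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_train)
print(dict(zip([0, 1], weights)))  # roughly {0: 0.56, 1: 5.0}

# The weights can then be passed to a loss function, e.g. in PyTorch:
#   criterion = torch.nn.CrossEntropyLoss(
#       weight=torch.tensor(weights, dtype=torch.float))
```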