Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language
- URL: http://arxiv.org/abs/2402.13818v2
- Date: Thu, 10 Jul 2025 11:42:28 GMT
- Title: Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language
- Authors: Hamidreza Saffari, Mohammadamin Shafiei, Hezhao Zhang, Lasana Harris, Nafise Sadat Moosavi
- Abstract summary: Dehumanization, i.e., denying human qualities to individuals or groups, is a particularly harmful form of hate speech. Despite advances in NLP for detecting general hate speech, approaches to identifying dehumanizing language remain limited. We systematically evaluate four state-of-the-art large language models (LLMs) for dehumanization detection.
- Score: 9.06965602117689
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dehumanization, i.e., denying human qualities to individuals or groups, is a particularly harmful form of hate speech that can normalize violence against marginalized communities. Despite advances in NLP for detecting general hate speech, approaches to identifying dehumanizing language remain limited due to scarce annotated data and the subtle nature of such expressions. In this work, we systematically evaluate four state-of-the-art large language models (LLMs), Claude, GPT, Mistral, and Qwen, for dehumanization detection. Our results show that only one model, Claude, achieves strong performance (over 80% F1) under an optimized configuration, while the others, despite their capabilities, perform only moderately. Performance drops further when distinguishing dehumanization from related hate types such as derogation. We also identify systematic disparities across target groups: models tend to over-predict dehumanization for some identities (e.g., Gay men), while under-identifying it for others (e.g., Refugees). These findings motivate the need for systematic, group-level evaluation when applying pretrained language models to dehumanization detection tasks.
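The group-level disparities described in the abstract can be surfaced with a simple per-group breakdown of classifier scores. The sketch below is a minimal illustration of that kind of evaluation, not the paper's actual pipeline: `predict_fn` stands in for any dehumanization classifier (for example, a wrapper around a prompted LLM), and the data format and `predict_dehumanization` name are assumptions.

```python
from collections import defaultdict
from sklearn.metrics import f1_score

def per_group_f1(examples, predict_fn):
    """Compute dehumanization-detection F1 separately for each target group.

    examples  : iterable of (text, gold_label, target_group), gold_label in {0, 1}
    predict_fn: any callable mapping a text to a binary prediction,
                e.g. a wrapper around a prompted LLM (hypothetical)
    """
    by_group = defaultdict(lambda: ([], []))  # group -> (gold labels, predictions)
    for text, gold, group in examples:
        golds, preds = by_group[group]
        golds.append(gold)
        preds.append(predict_fn(text))
    return {group: f1_score(golds, preds) for group, (golds, preds) in by_group.items()}

# Hypothetical usage: large gaps between groups indicate the kind of
# over-/under-prediction disparity the paper reports (e.g. Gay men vs. Refugees).
# scores = per_group_f1(dev_set, predict_dehumanization)
# for group, f1 in sorted(scores.items(), key=lambda kv: kv[1]):
#     print(f"{group}: {f1:.2f}")
```

Reporting per-group scores alongside the aggregate F1 is one way to operationalise the group-level evaluation the abstract calls for.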
Related papers
- Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing [66.04926909181653]
We argue for anthropomimetic uncertainty, meaning that intuitive and trustworthy uncertainty communication requires a degree of linguistic authenticity and personalization to the user. We conclude by pointing out unique factors in human-machine communication of uncertainty and deconstruct the data biases that influence machine uncertainty communication.
arXiv Detail & Related papers (2025-07-11T14:07:22Z)
- Compositional Generalisation for Explainable Hate Speech Detection [52.41588643566991]
Hate speech detection is key to online content moderation, but current models struggle to generalise beyond their training data. We show that even when models are trained with more fine-grained, span-level annotations, they struggle to disentangle the meaning of these labels from the surrounding context. We investigate whether training on a dataset where expressions occur with equal frequency across all contexts can improve generalisation.
arXiv Detail & Related papers (2025-06-04T13:07:36Z)
- Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection [4.438698005789677]
Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases. Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications. We propose a novel method, which utilizes in-context learning without requiring model fine-tuning.
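As a rough illustration of demonstration retrieval for in-context learning (a generic sketch, not the authors' exact method), the snippet below picks the k labelled posts most similar to the query under any embedding function and assembles them into a few-shot prompt; `embed_fn`, the prompt wording, and the label format are assumptions.

```python
import numpy as np

def retrieve_demonstrations(query, pool, embed_fn, k=4):
    """Return the k labelled examples most similar to the query (cosine similarity)."""
    q = embed_fn(query)
    scored = []
    for text, label in pool:
        v = embed_fn(text)
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        scored.append((sim, text, label))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(text, label) for _, text, label in scored[:k]]

def build_prompt(query, demos):
    """Assemble retrieved demonstrations into a few-shot classification prompt."""
    parts = ["Decide whether each post contains implicit hate speech (yes/no)."]
    for text, label in demos:
        parts.append(f"Post: {text}\nAnswer: {'yes' if label else 'no'}")
    parts.append(f"Post: {query}\nAnswer:")
    return "\n\n".join(parts)
```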
arXiv Detail & Related papers (2025-04-16T13:43:23Z)
- Extreme Speech Classification in the Era of LLMs: Exploring Open-Source and Proprietary Models [0.30693357740321775]
ChatGPT has drawn global attention to the potential applications of Large Language Models (LLMs).
We leverage the Indian subset of the extreme speech dataset from Maronikolakis et al. (2022) to develop an effective classification framework using LLMs.
We evaluate open-source Llama models against closed-source OpenAI models, finding that while pre-trained LLMs show moderate efficacy, fine-tuning with domain-specific data significantly enhances performance.
arXiv Detail & Related papers (2025-02-21T02:31:05Z)
- A Target-Aware Analysis of Data Augmentation for Hate Speech Detection [3.858155067958448]
Hate speech is one of the main threats posed by the widespread use of social networks.
We investigate the possibility of augmenting existing data with generative language models, reducing target imbalance.
For some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline.
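A minimal sketch of this kind of target-aware augmentation, purely illustrative and not the paper's setup: `generate_fn` stands in for any generative language model used to paraphrase existing examples, and the per-target quota is a hypothetical knob.

```python
import random
from collections import Counter

def augment_underrepresented_targets(examples, generate_fn, per_target_goal):
    """Upsample under-represented hate-speech target categories with generated paraphrases.

    examples       : list of (text, label, target_category) tuples
    generate_fn    : any callable text -> new text (e.g. a generative LM paraphraser)
    per_target_goal: desired minimum number of examples per target category
    """
    counts = Counter(target for _, _, target in examples)
    augmented = list(examples)
    for target, count in counts.items():
        pool = [ex for ex in examples if ex[2] == target]
        while count < per_target_goal and pool:
            text, label, tgt = random.choice(pool)
            augmented.append((generate_fn(text), label, tgt))
            count += 1
    random.shuffle(augmented)
    return augmented
```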
arXiv Detail & Related papers (2024-10-10T15:46:27Z)
- Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets [0.6918368994425961]
We leverage an extensive dataset with rich socio-demographic information of both annotators and targets. Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence. Our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
arXiv Detail & Related papers (2024-10-10T14:48:57Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments [2.162419921663162]
We propose a novel end-to-end model, HCDIR, for Hate Context Detection, and Hate Intensity Reduction in social media posts.
We fine-tuned several pre-trained language models to detect hateful comments to ascertain the best-performing hateful comments detection model.
arXiv Detail & Related papers (2023-12-20T17:05:46Z)
- Developing Linguistic Patterns to Mitigate Inherent Human Bias in Offensive Language Detection [1.6574413179773761]
We propose a linguistic data augmentation approach to reduce bias in labeling processes.
This approach has the potential to improve offensive language classification tasks across multiple languages.
arXiv Detail & Related papers (2023-12-04T10:20:36Z)
- Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection [23.97444551607624]
Hate speech in social media is a growing phenomenon, and detecting such toxic content has gained significant traction.
HateMAML is a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages.
Extensive experiments are conducted on five datasets across eight different low-resource languages.
arXiv Detail & Related papers (2023-03-04T22:28:29Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
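As a hedged sketch of that transfer setup (not the paper's neural architecture), the snippet below averages words in a shared cross-lingual embedding space, trains a classifier on the source language, and scores the target language; the pre-aligned embedding dictionaries and the logistic-regression stand-in are assumptions, with class weighting as one simple answer to the label imbalance mentioned above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed_text(text, emb, dim=300):
    """Average aligned word vectors; emb maps word -> vector in a shared cross-lingual space."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cross_lingual_hate_transfer(src_data, tgt_texts, src_emb, tgt_emb):
    """Train on the source language, predict on the target language.

    src_data : list of (text, label) pairs in the source language
    tgt_texts: list of texts in the target language
    src_emb / tgt_emb: dicts of word -> vector, pre-aligned into one space (assumed)
    """
    X_src = np.stack([embed_text(t, src_emb) for t, _ in src_data])
    y_src = [y for _, y in src_data]
    # class_weight="balanced" is one simple counter to the hate/non-hate imbalance
    clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_src, y_src)
    X_tgt = np.stack([embed_text(t, tgt_emb) for t in tgt_texts])
    return clf.predict(X_tgt)
```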
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning.
Our goal is to determine whether knowledge from one particular language can be used to classify other languages.
We show that the use of our simple yet specific multilingual hate representations improves classification results.
arXiv Detail & Related papers (2021-04-30T02:24:50Z)
- It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations [68.16751625956243]
Training on only perfect Standard English corpora predisposes neural networks to discriminate against minorities from non-standard linguistic backgrounds.
We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples.
arXiv Detail & Related papers (2020-05-09T04:01:43Z)
- A Framework for the Computational Linguistic Analysis of Dehumanization [52.735780962665814]
We analyze discussions of LGBTQ people in the New York Times from 1986 to 2015.
We find increasingly humanizing descriptions of LGBTQ people over time.
The ability to analyze dehumanizing language at a large scale has implications for automatically detecting and understanding media bias as well as abusive language online.
arXiv Detail & Related papers (2020-03-06T03:02:12Z)