Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
- URL: http://arxiv.org/abs/2410.20490v2
- Date: Sun, 12 Oct 2025 17:25:32 GMT
- Title: Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
- Authors: Ananya Malik, Kartik Sharma, Shaily Bhatt, Lynnette Hui Xian Ng
- Abstract summary: Large Language Models (LLMs) offer a lucrative promise for scalable content moderation, including hate speech detection. They are also known to be brittle and biased against marginalised communities and dialects. This requires their applications to high-stakes tasks like hate speech detection to be critically scrutinized.
- Score: 7.014381169870851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) offer a lucrative promise for scalable content moderation, including hate speech detection. However, they are also known to be brittle and biased against marginalised communities and dialects. This requires their applications to high-stakes tasks like hate speech detection to be critically scrutinized. In this work, we investigate the robustness of hate speech classification using LLMs, particularly when explicit and implicit markers of the speaker's ethnicity are injected into the input. For explicit markers, we inject a phrase that mentions the speaker's linguistic identity. For implicit markers, we inject dialectal features. By analysing how frequently model outputs flip in the presence of these markers, we reveal varying degrees of brittleness across 3 LLMs and 1 LM, and across 5 linguistic identities. We find that implicit dialect markers in inputs cause model outputs to flip more often than explicit markers do. Further, the percentage of flips varies across ethnicities. Finally, we find that larger models are more robust. Our findings indicate the need for caution when deploying LLMs for high-stakes tasks like hate speech detection.
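The flip-rate measurement described in the abstract lends itself to a short illustration. Below is a minimal, hypothetical sketch: `classify` is a trivial keyword stub standing in for the LLM-backed classifier, and the explicit marker phrases are invented placeholders rather than the paper's actual prompts or dialects. The same loop would apply to implicit markers by swapping the prefix injection for a dialect-feature rewrite of the input.

```python
# A minimal, hypothetical sketch of the flip-rate analysis described in the
# abstract. `classify` is a keyword stub standing in for an LLM-backed hate
# speech classifier; the marker phrases are invented placeholders.
def classify(text: str) -> str:
    """Stub classifier; in the paper this would be an LLM call."""
    return "hate" if "hate" in text.lower() else "not_hate"

# Illustrative explicit identity markers (assumed wording, not the paper's).
EXPLICIT_MARKERS = {
    "identity_A": "As a speaker of dialect A, I say: ",
    "identity_B": "As a speaker of dialect B, I say: ",
}

def flip_rate(texts, inject):
    """Fraction of inputs whose predicted label flips once a marker is injected."""
    flips = sum(classify(t) != classify(inject(t)) for t in texts)
    return flips / len(texts)

if __name__ == "__main__":
    corpus = ["I hate everyone from that town", "Have a lovely day, friends"]
    rates = {
        identity: flip_rate(corpus, lambda t, p=prefix: p + t)
        for identity, prefix in EXPLICIT_MARKERS.items()
    }
    print(rates)  # with this stub, no flips: {'identity_A': 0.0, 'identity_B': 0.0}
```

The per-identity rates computed this way are the quantity the abstract compares across models and ethnicities.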
Related papers
- Are Stereotypes Leading LLMs' Zero-Shot Stance Detection? [4.861653297482551]
Large Language Models inherit stereotypes from their pretraining data, leading to biased behavior toward certain social groups. In this paper, we focus on the bias of Large Language Models when performing stance detection in a zero-shot setting. We show that LLMs exhibit significant stereotypes in stance detection tasks, such as incorrectly associating pro-marijuana views with low text complexity and African American dialect with opposition to Donald Trump.
arXiv Detail & Related papers (2025-10-23T03:05:25Z) - Causality Guided Representation Learning for Cross-Style Hate Speech Detection [11.028139269410685]
The proliferation of online hate speech poses a significant threat to the harmony of the web. Existing hate speech detection models fail to generalize effectively across diverse stylistic variations. We propose CADET, a causal representation learning framework that disentangles hate speech into interpretable latent factors.
arXiv Detail & Related papers (2025-10-09T02:41:37Z) - Bridging the Gap: In-Context Learning for Modeling Human Disagreement [8.011316959982654]
Large Language Models (LLMs) have shown strong performance on NLP classification tasks. This study examines whether LLMs can capture multiple perspectives and reflect annotator disagreement in subjective tasks such as hate speech and offensive language detection.
arXiv Detail & Related papers (2025-06-06T14:24:29Z) - Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection [4.438698005789677]
Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety.
Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases.
Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications.
We propose a novel method that utilizes in-context learning without requiring model fine-tuning.
arXiv Detail & Related papers (2025-04-16T13:43:23Z) - ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning. We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics. Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z) - Exploring Large Language Models for Hate Speech Detection in Rioplatense Spanish [0.08192907805418582]
Hate speech detection deals with many language variants, slang, slurs, expression modalities, and cultural nuances.
This work presents a brief analysis of the performance of large language models in the detection of hate speech for Rioplatense Spanish.
arXiv Detail & Related papers (2024-10-16T02:32:12Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
This creates a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z) - Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection [4.653571633477755]
Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis.
This becomes pertinent in the realm of identifying hateful or toxic speech -- a domain fraught with challenges and ethical dilemmas.
arXiv Detail & Related papers (2024-03-12T19:12:28Z) - Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection [29.138463029748547]
This paper explores the capability of Large Language Models to detect implicit hate speech and express confidence in their responses.
Our findings highlight that LLMs exhibit two extremes: (1) excessive sensitivity towards groups or topics that may cause fairness issues, resulting in benign statements being misclassified as hate speech, and (2) limitations in calibrating the confidence they express in their responses.
arXiv Detail & Related papers (2024-02-18T00:04:40Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models [74.07684768317705]
LMs are highly sensitive to markers of certainty in prompts, with accuracies varying by more than 80%.
We find that expressions of high certainty result in a decrease in accuracy compared to expressions of low certainty; similarly, factive verbs hurt performance, while evidentials benefit performance.
These associations may suggest that LM behavior is based on observed language use, rather than truly reflecting uncertainty.
arXiv Detail & Related papers (2023-02-26T23:46:29Z) - Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
arXiv Detail & Related papers (2022-10-11T18:11:37Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate examples to hate examples often leads to low model performance (see the weighting sketch after this list).
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z)
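As referenced above, the label imbalance noted in "Addressing the Challenges of Cross-Lingual Hate Speech Detection" admits a standard remedy: weight the training loss inversely to class frequency. The sketch below is a generic illustration under that assumption, not that paper's actual training code.

```python
# Hypothetical illustration of inverse-frequency class weighting, one common
# remedy for the skewed non-hate/hate ratio in hate speech datasets.
from collections import Counter

def inverse_frequency_weights(labels):
    """Return a per-class weight proportional to 1 / class frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * n) for label, n in counts.items()}

if __name__ == "__main__":
    # 90% non-hate vs 10% hate, mimicking a skewed hate speech dataset.
    labels = ["non_hate"] * 90 + ["hate"] * 10
    print(inverse_frequency_weights(labels))
    # {'non_hate': 0.555..., 'hate': 5.0} -> errors on hate cost ~9x more
```

These weights would typically be passed to the loss function (e.g. a weighted cross-entropy) so that the minority hate class contributes proportionally to training.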