Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets
- URL: http://arxiv.org/abs/2410.07991v3
- Date: Sun, 20 Oct 2024 08:13:18 GMT
- Title: Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets
- Authors: Tommaso Giorgi, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci,
- Abstract summary: We leverage an extensive dataset with rich socio-demographic information of both annotators and targets.
Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence.
Our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
- Score: 0.6918368994425961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of online platforms exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems heavily relies on human-labeled data, which is inherently susceptible to biases. While previous work has examined the issue, the interplay between the characteristics of the annotator and those of the target of the hate are still unexplored. We fill this gap by leveraging an extensive dataset with rich socio-demographic information of both annotators and targets, uncovering how human biases manifest in relation to the target's attributes. Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence, revealing marked differences. Furthermore, we compare human biases with those exhibited by persona-based LLMs. Our findings indicate that while persona-based LLMs do exhibit biases, these differ significantly from those of human annotators. Overall, our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
Related papers
- A Target-Aware Analysis of Data Augmentation for Hate Speech Detection [3.858155067958448]
Hate speech is one of the main threats posed by the widespread use of social networks.
We investigate the possibility of augmenting existing data with generative language models, reducing target imbalance.
For some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline.
arXiv Detail & Related papers (2024-10-10T15:46:27Z) - Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - HateDebias: On the Diversity and Variability of Hate Speech Debiasing [14.225997610785354]
We propose a benchmark, named HateDebias, to analyze the model ability of hate speech detection under continuous, changing environments.
Specifically, to meet the diversity of biases, we collect existing hate speech detection datasets with different types of biases.
We evaluate the detection accuracy of models trained on the datasets with a single type of bias with the performance on the HateDebias, where a significant performance drop is observed.
arXiv Detail & Related papers (2024-06-07T12:18:02Z) - Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering
Dehumanizing Language [11.946719280041789]
This paper evaluates the performance of cutting-edge NLP models, including GPT-4, GPT-3.5, and LLAMA-2 in identifying dehumanizing language.
Our findings reveal that while these models demonstrate potential, achieving a 70% accuracy rate in distinguishing dehumanizing language from broader hate speech, they also display biases.
arXiv Detail & Related papers (2024-02-21T13:57:36Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of
Sociodemographic Prompting [64.80538055623842]
sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z) - Statistical Analysis of Perspective Scores on Hate Speech Detection [7.447951461558536]
State-of-the-art hate speech classifiers are efficient only when tested on the data with the same feature distribution as training data.
In such a diverse data distribution relying on low level features is the main cause of deficiency due to natural bias in data.
We show that, different hate speech datasets are very similar when it comes to extract their Perspective Scores.
arXiv Detail & Related papers (2021-06-22T17:17:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.