Hate Speech Classifiers Learn Human-Like Social Stereotypes
- URL: http://arxiv.org/abs/2110.14839v1
- Date: Thu, 28 Oct 2021 01:35:41 GMT
- Title: Hate Speech Classifiers Learn Human-Like Social Stereotypes
- Authors: Aida Mostafazadeh Davani, Mohammad Atari, Brendan Kennedy, Morteza
Dehghani
- Abstract summary: Social stereotypes negatively impact individuals' judgements about different groups.
Social stereotypes may have a critical role in how people understand language directed toward minority social groups.
- Score: 4.132204773132937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social stereotypes negatively impact individuals' judgements about different
groups and may have a critical role in how people understand language directed
toward minority social groups. Here, we assess the role of social stereotypes
in the automated detection of hateful language by examining the relation
between individual annotator biases and erroneous classification of texts by
hate speech classifiers. Specifically, in Study 1 we investigate the impact of
novice annotators' stereotypes on their hate-speech-annotation behavior. In
Study 2 we examine the effect of language-embedded stereotypes on expert
annotators' aggregated judgements in a large annotated corpus. Finally, in
Study 3 we demonstrate how language-embedded stereotypes are associated with
systematic prediction errors in a neural-network hate speech classifier. Our
results demonstrate that hate speech classifiers learn human-like biases which
can further perpetuate social inequalities when propagated at scale. This
framework, combining social psychological and computational linguistic methods,
provides insights into additional sources of bias in hate speech moderation,
informing ongoing debates regarding fairness in machine learning.
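To make the Study 3-style analysis concrete, below is a minimal sketch (not the authors' code or data) of how group-level stereotype scores might be correlated with a classifier's systematic errors: it computes per-group false-positive rates on non-hateful texts and correlates them with stereotype scores. All group names, scores, and predictions are invented placeholders.

```python
# Minimal sketch, assuming hypothetical stereotype scores and classifier outputs;
# this is an illustration of the idea, not the paper's actual pipeline.
from collections import defaultdict
from scipy.stats import pearsonr

# Hypothetical stereotype score per mentioned social group (e.g., low perceived warmth).
stereotype_score = {"group_a": 0.8, "group_b": 0.5, "group_c": 0.2, "group_d": 0.6}

# Hypothetical classifier outputs on non-hateful texts mentioning each group:
# (group, gold_label, predicted_label) with 1 = "hate", 0 = "not hate".
predictions = [
    ("group_a", 0, 1), ("group_a", 0, 1), ("group_a", 0, 0),
    ("group_b", 0, 0), ("group_b", 0, 1), ("group_b", 0, 0),
    ("group_c", 0, 0), ("group_c", 0, 0), ("group_c", 0, 0),
    ("group_d", 0, 1), ("group_d", 0, 0), ("group_d", 0, 0),
]

# False-positive rate per group: share of non-hateful texts wrongly flagged as hate.
errors, totals = defaultdict(int), defaultdict(int)
for group, gold, pred in predictions:
    if gold == 0:
        totals[group] += 1
        errors[group] += int(pred == 1)

groups = sorted(stereotype_score)
fpr = [errors[g] / totals[g] for g in groups]
scores = [stereotype_score[g] for g in groups]

# Correlate stereotype scores with false-positive rates across groups.
r, p = pearsonr(scores, fpr)
print(f"Pearson r = {r:.2f} (p = {p:.2f}) across {len(groups)} groups")
```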
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Quantifying Stereotypes in Language [6.697298321551588]
We quantify stereotypes in language by annotating a dataset.
We use pre-trained language models (PLMs), fine-tuned on this dataset, to predict the stereotypes expressed in sentences.
We discuss stereotypes about common social issues such as hate speech, sexism, sentiments, and disadvantaged and advantaged groups.
arXiv Detail & Related papers (2024-01-28T01:07:21Z)
- Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language [18.560379338032558]
We draw from psychology and philosophy literature to craft six psychologically inspired strategies to challenge the underlying stereotypical implications of hateful language.
We show that human-written counterspeech uses strategies that are more specific to the implied stereotype, whereas machine-generated counterspeech uses less specific strategies.
Our findings point to the importance of accounting for the underlying stereotypical implications of speech when generating counterspeech.
arXiv Detail & Related papers (2023-10-31T21:33:46Z)
- Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts [0.6827423171182153]
We propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts.
We evaluated our approach on a benchmark dataset and observed a reduction in the negativity of hate speech comments.
arXiv Detail & Related papers (2023-07-14T13:33:28Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between theories from cognitive psychology and our modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection [6.167830237917662]
We present an in-depth study to model polarized opinions coming from different communities.
We believe that by relying on this information, we can divide the annotators into groups sharing similar perspectives.
We propose a novel resource, a multi-perspective English language dataset annotated according to different sub-categories relevant for characterising online abuse.
arXiv Detail & Related papers (2021-06-30T08:35:49Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Interpretable Multi-Modal Hate Speech Detection [32.36781061930129]
We propose a deep neural multi-modal model that can effectively capture the semantics of the text along with socio-cultural context in which a particular hate expression is made.
Our model outperforms existing state-of-the-art hate speech classification approaches.
arXiv Detail & Related papers (2021-03-02T10:12:26Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
- Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition [46.57105755981092]
We publish a multilingual Twitter corpus for the task of hate speech detection.
The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish.
We evaluate the inferred demographic labels with a crowdsourcing platform.
arXiv Detail & Related papers (2020-02-24T16:45:59Z)