Dialect prejudice predicts AI decisions about people's character,
employability, and criminality
- URL: http://arxiv.org/abs/2403.00742v1
- Date: Fri, 1 Mar 2024 18:43:09 GMT
- Title: Dialect prejudice predicts AI decisions about people's character,
employability, and criminality
- Authors: Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, Sharese King
- Abstract summary: We show that language models embody covert racism in the form of dialect prejudice.
Our findings have far-reaching implications for the fair and safe employment of language technology.
- Score: 36.448157493217344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hundreds of millions of people now interact with language models, with uses
ranging from serving as a writing aid to informing hiring decisions. Yet these
language models are known to perpetuate systematic racial prejudices, making
their judgments biased in problematic ways about groups like African Americans.
While prior research has focused on overt racism in language models, social
scientists have argued that racism with a more subtle character has developed
over time. It is unknown whether this covert racism manifests in language
models. Here, we demonstrate that language models embody covert racism in the
form of dialect prejudice: we extend research showing that Americans hold
raciolinguistic stereotypes about speakers of African American English and find
that language models have the same prejudice, exhibiting covert stereotypes
that are more negative than any human stereotypes about African Americans ever
experimentally recorded, although closest to the ones from before the civil
rights movement. By contrast, the language models' overt stereotypes about
African Americans are much more positive. We demonstrate that dialect prejudice
has the potential for harmful consequences by asking language models to make
hypothetical decisions about people, based only on how they speak. Language
models are more likely to suggest that speakers of African American English be
assigned less prestigious jobs, be convicted of crimes, and be sentenced to
death. Finally, we show that existing methods for alleviating racial bias in
language models such as human feedback training do not mitigate the dialect
prejudice, but can exacerbate the discrepancy between covert and overt
stereotypes, by teaching language models to superficially conceal the racism
that they maintain on a deeper level. Our findings have far-reaching
implications for the fair and safe employment of language technology.
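The decision experiments above rest on comparing how a model responds to the same content written in African American English versus Standard American English. Below is a minimal sketch of such a matched-guise-style comparison, assuming a Hugging Face causal LM; the model (gpt2 as a stand-in), prompt template, trait words, and text pair are illustrative assumptions, not the authors' exact materials or scoring procedure.

```python
# Illustrative matched-guise-style probe: score candidate trait words after
# prompts that differ only in dialect. All specifics here are stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper studies several LM families
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of the log-probabilities the model assigns to `completion` given `prompt`."""
    # Assumes the prompt tokenization is a prefix of the joint tokenization (true here,
    # since the completion starts with a space).
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)  # position t predicts token t+1
    targets = full_ids[:, 1:]
    token_lps = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lps[0, prompt_len - 1:].sum().item()  # only the completion tokens


# The same content in two guises (toy sentences, chosen here for illustration).
aae_text = "I be so happy when I wake up from a bad dream cus they be feelin too real"
sae_text = "I am so happy when I wake up from a bad dream because they feel too real"
template = 'A person who says "{}" tends to be'

for trait in [" intelligent", " lazy"]:
    lp_aae = completion_logprob(template.format(aae_text), trait)
    lp_sae = completion_logprob(template.format(sae_text), trait)
    print(f"{trait.strip():>12}  AAE: {lp_aae:7.2f}  SAE: {lp_sae:7.2f}  diff: {lp_aae - lp_sae:+.2f}")
```

Repeating this contrast over many trait adjectives, occupations, or verdict statements, and aggregating which completions the model prefers for the AAE guise relative to the SAE guise, gives the flavor of the covert-stereotype and decision analyses described in the abstract.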
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint the units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability at low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora [0.0]
Current methods to identify and classify racist language in text rely on small-n qualitative approaches or large-n approaches focusing exclusively on overt forms of racist discourse.
This article provides a step-by-step generalizable guideline to identify and classify different forms of racist discourse in large corpora.
arXiv Detail & Related papers (2024-01-17T16:57:18Z)
- Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionally weighted in favor of Standard American English.
By disentangling dialect-specific and cross-dialectal information, the proposed HyperLoRA adapters improve generalization to unseen dialects in a task-agnostic fashion.
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
- Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition [4.4212441764241]
We present the Speech Embedding Association Test (SpEAT), a method for detecting bias in one type of model used for many speech tasks: pre-trained models.
Using the SpEAT, we test for six types of bias in 16 English speech models.
Our work provides evidence that, like text- and image-based models, pre-trained speech-based models frequently learn human-like biases.
arXiv Detail & Related papers (2023-10-29T02:27:56Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- Reducing Unintended Identity Bias in Russian Hate Speech Detection [0.21485350418225244]
This paper describes our efforts towards classifying hate speech in Russian.
We propose simple techniques for reducing unintended bias, such as generating training data with language models using terms and words related to protected identities as context.
arXiv Detail & Related papers (2020-10-22T12:54:14Z)
- CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models [30.582132471411263]
We introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs).
CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age.
We find that all three of the widely used masked language models we evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs.
arXiv Detail & Related papers (2020-09-30T22:38:40Z)
- It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations [68.16751625956243]
Training on only perfect Standard English corpora predisposes neural networks to discriminate against minorities from non-standard linguistic backgrounds.
We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples.
arXiv Detail & Related papers (2020-05-09T04:01:43Z)
- StereoSet: Measuring stereotypical bias in pretrained language models [24.020149562072127]
We present StereoSet, a large-scale natural dataset in English to measure stereotypical biases in four domains.
We evaluate popular models like BERT, GPT-2, RoBERTa, and XLNet on our dataset and show that these models exhibit strong stereotypical biases.
arXiv Detail & Related papers (2020-04-20T17:14:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above (including all content) and is not responsible for any consequences of its use.