Regional Negative Bias in Word Embeddings Predicts Racial Animus--but
only via Name Frequency
- URL: http://arxiv.org/abs/2201.08451v1
- Date: Thu, 20 Jan 2022 20:52:12 GMT
- Title: Regional Negative Bias in Word Embeddings Predicts Racial Animus--but
only via Name Frequency
- Authors: Austin van Loon, Salvatore Giorgi, Robb Willer, Johannes Eichstaedt
- Abstract summary: We show that anti-black WEAT estimates from geo-tagged social media data strongly correlate with several measures of racial animus.
We also show that every one of these correlations is explained by the frequency of Black names in the underlying corpora relative to White names.
- Score: 2.247786323899963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The word embedding association test (WEAT) is an important method for
measuring linguistic biases against social groups such as ethnic minorities in
large text corpora. It does so by comparing the semantic relatedness of words
prototypical of the groups (e.g., names unique to those groups) and attribute
words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-black WEAT
estimates from geo-tagged social media data at the level of metropolitan
statistical areas strongly correlate with several measures of racial
animus--even when controlling for sociodemographic covariates. However, we also
show that every one of these correlations is explained by a third variable: the
frequency of Black names in the underlying corpora relative to White names.
This occurs because word embeddings tend to group positive (negative) words and
frequent (rare) words together in the estimated semantic space. As the
frequency of Black names on social media is strongly correlated with Black
Americans' prevalence in the population, this results in spurious anti-Black
WEAT estimates wherever few Black Americans live. This suggests that research
using the WEAT to measure bias should consider term frequency, and also
demonstrates the potential consequences of using black-box models like word
embeddings to study human cognition and behavior.
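For context, the WEAT estimate discussed above is typically computed as the effect size of Caliskan et al. (2017): the standardized difference between how strongly two target word sets (e.g., Black-associated and White-associated names) relate to two attribute word sets (e.g., pleasant and unpleasant terms) in the embedding space. The sketch below is a minimal illustration of that standard formula; the function names and example word lists are placeholders, not the authors' code or stimuli.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity of word vector w to the
    attribute vectors in A minus its mean similarity to those in B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size (Caliskan et al., 2017): standardized difference
    between the mean associations of target sets X and Y with respect to
    attribute sets A and B."""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    pooled_sd = np.std(x_assoc + y_assoc, ddof=1)
    return (np.mean(x_assoc) - np.mean(y_assoc)) / pooled_sd

# Illustrative usage with a hypothetical embedding lookup emb[word] trained
# on one metropolitan area's geo-tagged posts (placeholder word lists):
# X = [emb[w] for w in ["jamal", "deshawn"]]      # Black-associated names
# Y = [emb[w] for w in ["brad", "greg"]]          # White-associated names
# A = [emb[w] for w in ["joy", "love", "peace"]]  # pleasant attributes
# B = [emb[w] for w in ["agony", "terrible"]]     # unpleasant attributes
# print(weat_effect_size(X, Y, A, B))
```

The paper's central caveat, that the WEAT-animus correlations disappear once relative name frequency is accounted for, can be probed with a simple covariate adjustment across metropolitan statistical areas. The following sketch uses statsmodels OLS with hypothetical variable names (regional WEAT scores, an animus measure, and the log ratio of Black-name to White-name token counts in each corpus); it is one way to operationalize the control the abstract describes, not necessarily the authors' exact analysis.

```python
import numpy as np
import statsmodels.api as sm

def weat_vs_animus(weat, animus, log_name_ratio):
    """Compare the WEAT-animus association with and without controlling for
    the relative corpus frequency of Black vs. White names. All arguments
    are 1-D arrays indexed by metropolitan statistical area."""
    X0 = sm.add_constant(np.column_stack([weat]))
    X1 = sm.add_constant(np.column_stack([weat, log_name_ratio]))
    fit0 = sm.OLS(animus, X0).fit()
    fit1 = sm.OLS(animus, X1).fit()
    # If the WEAT coefficient shrinks toward zero once the name-frequency
    # ratio is included, the raw correlation is plausibly driven by name
    # frequency, as the paper argues.
    return fit0.params[1], fit1.params[1]
```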
Related papers
- A Study of Nationality Bias in Names and Perplexity using Off-the-Shelf Affect-related Tweet Classifiers [0.0]
We create counterfactual examples with small perturbations on target-domain data instead of relying on templates or specific datasets for bias detection.
On widely used classifiers for subjectivity analysis, including sentiment, emotion, and hate speech, our results demonstrate positive biases related to the language spoken in a country.
arXiv Detail & Related papers (2024-07-01T22:17:17Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identifying potential causes of social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings [9.379650501033465]
We train two word embedding models, one on texts from left-wing news outlets and the other on texts from right-wing outlets.
Our hypothesis is that a word's representations in both word embedding spaces are more similar for non-biased words than biased words.
This paper presents the first in-depth look at the context of bias words measured by word embeddings.
arXiv Detail & Related papers (2021-12-14T13:23:49Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- Frequency-based Distortions in Contextualized Word Embeddings [29.88883761339757]
This work explores the geometric characteristics of contextualized word embeddings with two novel tools.
Words of high and low frequency differ significantly with respect to their representational geometry.
BERT-Base has more trouble differentiating between South American and African countries than North American and European ones.
arXiv Detail & Related papers (2021-04-17T06:35:48Z)
- Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [10.713568409205077]
State-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears.
We introduce the Contextualized Embedding Association Test (CEAT), that can summarize the magnitude of overall bias in neural language models.
We develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings.
arXiv Detail & Related papers (2020-06-06T19:49:50Z)
- ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across Languages and Over Centuries [3.0349733976070015]
Word embeddings learn implicit biases from linguistic regularities captured by word co-occurrence statistics.
By extending methods that quantify human-like biases in word embeddings, we introduce ValNorm, a novel intrinsic evaluation task.
We apply ValNorm on static word embeddings from seven languages and historical English text spanning 200 years.
arXiv Detail & Related papers (2020-06-06T19:29:36Z)
- It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations [68.16751625956243]
Training on only perfect Standard English corpora predisposes neural networks to discriminate against minorities from non-standard linguistic backgrounds.
We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples.
arXiv Detail & Related papers (2020-05-09T04:01:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.