Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency,
Syntax, and Semantics
- URL: http://arxiv.org/abs/2206.03390v1
- Date: Tue, 7 Jun 2022 15:35:10 GMT
- Title: Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency,
Syntax, and Semantics
- Authors: Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert
Wolfe, Mahzarin R. Banaji
- Abstract summary: We provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora.
Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases.
We find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than women.
- Score: 3.4048739113355215
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The statistical regularities in language corpora encode well-known social
biases into word embeddings. Here, we focus on gender to provide a
comprehensive analysis of group-based biases in widely-used static English word
embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using the
Single-Category Word Embedding Association Test, we demonstrate the widespread
prevalence of gender biases that also show differences in: (a) frequencies of
words associated with men versus women; (b) part-of-speech tags in
gender-associated words; (c) semantic categories in gender-associated words;
and (d) valence, arousal, and dominance in gender-associated words.
First, in terms of word frequency: we find that, of the 1,000 most frequent
words in the vocabulary, 77% are more associated with men than women, providing
direct evidence of a masculine default in the everyday language of the
English-speaking world. Second, turning to parts-of-speech: the top
male-associated words are typically verbs (e.g., fight, overpower) while the
top female-associated words are typically adjectives and adverbs (e.g., giving,
emotionally). Gender biases in embeddings also permeate parts-of-speech. Third,
for semantic categories: we conduct bottom-up cluster analyses of the top 1,000 words
associated with each gender. The top male-associated concepts include roles and
domains of big tech, engineering, religion, sports, and violence; in contrast,
the top female-associated concepts are less focused on roles, including,
instead, female-specific slurs and sexual content, as well as appearance and
kitchen terms. Fourth, using human ratings of word valence, arousal, and
dominance from a ~20,000 word lexicon, we find that male-associated words are
higher on arousal and dominance, while female-associated words are higher on
valence.
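To make the frequency analysis concrete, below is a minimal sketch (not the authors' code) of the Single-Category Word Embedding Association Test effect size in Python with NumPy. The attribute word lists, function names, and the embeddings lookup are illustrative assumptions; embeddings stands for any static embedding table such as GloVe or fastText vectors.

import numpy as np

# Illustrative gender attribute sets (assumed; the paper's exact lists may differ).
MALE_ATTRIBUTES = ["he", "him", "his", "man", "male", "boy", "himself", "son"]
FEMALE_ATTRIBUTES = ["she", "her", "hers", "woman", "female", "girl", "herself", "daughter"]

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sc_weat_effect_size(word, embeddings, male=MALE_ATTRIBUTES, female=FEMALE_ATTRIBUTES):
    # Single-Category WEAT: difference of mean cosine similarities to the two
    # attribute sets, divided by the pooled standard deviation of all similarities.
    w = embeddings[word]
    male_sims = np.array([cosine(w, embeddings[a]) for a in male])
    female_sims = np.array([cosine(w, embeddings[a]) for a in female])
    pooled = np.concatenate([male_sims, female_sims])
    return (male_sims.mean() - female_sims.mean()) / pooled.std(ddof=1)

def masculine_default_share(top_words, embeddings):
    # Fraction of words with a positive (male-leaning) effect size, mirroring
    # the kind of tally behind the reported 77% figure for the 1,000 most frequent words.
    scores = [sc_weat_effect_size(w, embeddings) for w in top_words]
    return sum(s > 0 for s in scores) / len(scores)

A positive effect size marks a word as more associated with the male attribute set; applying masculine_default_share to the most frequent vocabulary items sketches the first analysis described in the abstract.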
Related papers
- Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements [1.5379084885764847]
This paper uses topic modeling and bias measurement techniques to analyze and determine gender bias in English song lyrics.
We observe large amounts of profanity and misogynistic lyrics across various topics, especially in the largest cluster overall.
We find that words related to intelligence and strength tend to show a male bias across genres, as opposed to appearance and weakness words, which are more female-biased.
arXiv Detail & Related papers (2024-09-24T10:24:53Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- The Causal Influence of Grammatical Gender on Distributional Semantics [87.8027818528463]
How much meaning influences gender assignment across languages is an active area of research in linguistics and cognitive science.
We offer a novel, causal graphical model that jointly represents the interactions between a noun's grammatical gender, its meaning, and adjective choice.
When we control for the meaning of the noun, the relationship between grammatical gender and adjective choice is near zero and insignificant.
arXiv Detail & Related papers (2023-11-30T13:58:13Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings [0.0]
We study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods.
We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words.
The same pattern persists in a shuffled corpus, where genuine word associations are destroyed, indicating that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations.
arXiv Detail & Related papers (2023-01-02T18:27:10Z)
- Analysis of Male and Female Speakers' Word Choices in Public Speeches [0.0]
We compared the word choices of male and female presenters in public addresses such as TED lectures.
Based on our data, we determined that male speakers use specific types of linguistic, psychological, cognitive, and social words with considerably greater frequency than female speakers.
arXiv Detail & Related papers (2022-11-11T17:30:28Z)
- Measuring Gender Bias in Word Embeddings of Gendered Languages Requires Disentangling Grammatical Gender Signals [3.0349733976070015]
We demonstrate that word embeddings learn the association between a noun and its grammatical gender in grammatically gendered languages.
We show that disentangling grammatical gender signals from word embeddings may lead to improvement in semantic machine learning tasks.
arXiv Detail & Related papers (2022-06-03T17:11:00Z)
- Gender Bias Hidden Behind Chinese Word Embeddings: The Case of Chinese Adjectives [0.0]
This paper investigates gender bias in static word embeddings from a unique perspective, Chinese adjectives.
Through a comparison between the produced results and a human-scored data set, we demonstrate how the gender bias encoded in word embeddings differs from people's attitudes.
arXiv Detail & Related papers (2021-06-01T02:12:45Z)
- On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs [57.015586483981885]
We use large-scale corpora in six different gendered languages.
We find statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, indirect objects, and as subjects.
arXiv Detail & Related papers (2020-05-03T22:49:44Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)