Measuring Gender Bias in Word Embeddings of Gendered Languages Requires
Disentangling Grammatical Gender Signals
- URL: http://arxiv.org/abs/2206.01691v1
- Date: Fri, 3 Jun 2022 17:11:00 GMT
- Title: Measuring Gender Bias in Word Embeddings of Gendered Languages Requires
Disentangling Grammatical Gender Signals
- Authors: Shiva Omrani Sabbaghi, Aylin Caliskan
- Abstract summary: We demonstrate that word embeddings learn the association between a noun and its grammatical gender in grammatically gendered languages.
We show that disentangling grammatical gender signals from word embeddings may lead to improvement in semantic machine learning tasks.
- Score: 3.0349733976070015
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Does the grammatical gender of a language interfere when measuring the
semantic gender information captured by its word embeddings? A number of
anomalous gender bias measurements in the embeddings of gendered languages
suggest this possibility. We demonstrate that word embeddings learn the
association between a noun and its grammatical gender in grammatically gendered
languages, which can skew social gender bias measurements. Consequently, word
embedding post-processing methods are introduced to quantify, disentangle, and
evaluate grammatical gender signals. The evaluation is performed on five
gendered languages from the Germanic, Romance, and Slavic branches of the
Indo-European language family. Our method reduces the strength of grammatical
gender signals, which is measured in terms of effect size (Cohen's d), by a
significant average of d = 1.3 for French, German, and Italian, and d = 0.56
for Polish and Spanish. Once grammatical gender is disentangled, the
association between over 90% of 10,000 inanimate nouns and their assigned
grammatical gender weakens, and cross-lingual bias results from the Word
Embedding Association Test (WEAT) become more congruent with country-level
implicit bias measurements. The results further suggest that disentangling
grammatical gender signals from word embeddings may lead to improvement in
semantic machine learning tasks.
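The abstract reports grammatical gender signals and WEAT results as effect sizes (Cohen's d). As a rough illustration only, the sketch below computes a WEAT-style effect size in the form in which the test is commonly defined: the difference between the mean associations of two target sets, divided by the standard deviation of the associations over both sets. It is not the authors' code. The function names and the toy 2-D vectors are hypothetical, and using the sample standard deviation is an assumption (some implementations use the population form).

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """Mean similarity of w to attribute set A minus its mean similarity to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size for target sets X, Y and attribute sets A, B.

    Assumption: the denominator is the sample standard deviation of the
    associations over X and Y combined.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings: X leans towards A, Y leans towards B.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

d = weat_effect_size(X, Y, A, B)
print(f"effect size d = {d:.2f}")
```

In a setting like the one the abstract describes, X and Y could be nouns of two grammatical genders and A and B male and female attribute words. A larger |d| would then indicate a stronger association, and swapping X and Y flips the sign.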
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- The Causal Influence of Grammatical Gender on Distributional Semantics [87.8027818528463]
How much meaning influences gender assignment across languages is an active area of research in linguistics and cognitive science.
We offer a novel, causal graphical model that jointly represents the interactions between a noun's grammatical gender, its meaning, and adjective choice.
When we control for the meaning of the noun, the relationship between grammatical gender and adjective choice is near zero and insignificant.
arXiv Detail & Related papers (2023-11-30T13:58:13Z)
- Don't Overlook the Grammatical Gender: Bias Evaluation for Hindi-English Machine Translation [0.0]
Existing evaluation benchmarks primarily focus on English as the source language of translation.
For source languages other than English, studies often employ gender-neutral sentences for bias evaluation.
We emphasise the significance of tailoring bias evaluation test sets to account for grammatical gender markers in the source language.
arXiv Detail & Related papers (2023-11-11T09:28:43Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Don't Forget About Pronouns: Removing Gender Bias in Language Models Without Losing Factual Gender Information [4.391102490444539]
We focus on two types of gender signals in English texts: factual gender information and gender bias.
We aim to diminish the stereotypical bias in the representations while preserving the factual gender signal.
arXiv Detail & Related papers (2022-06-21T21:38:25Z)
- Mitigating Gender Stereotypes in Hindi and Marathi [1.2891210250935146]
This paper evaluates the gender stereotypes in Hindi and Marathi languages.
We create a dataset of neutral and gendered occupation words and emotion words, and measure bias with the Embedding Coherence Test (ECT) and Relative Norm Distance (RND).
Experiments show that our proposed debiasing techniques reduce gender bias in these languages.
arXiv Detail & Related papers (2022-05-12T06:46:53Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It [18.148675498274866]
We propose schemes for incorporating explicit word-level gender inflection tags into Neural Machine Translation.
We find that simple existing approaches can over-generalize a gender-feature to multiple entities in a sentence.
We also propose an extension to assess translations of gender-neutral entities from English given a corresponding linguistic convention.
arXiv Detail & Related papers (2020-10-11T20:05:42Z)
- An exploration of the encoding of grammatical gender in word embeddings [0.6461556265872973]
The study of grammatical gender based on word embeddings can give insight into discussions on how grammatical genders are determined.
It is found that there is an overlap in how grammatical gender is encoded in Swedish, Danish, and Dutch embeddings.
arXiv Detail & Related papers (2020-08-05T06:01:46Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)