Quantifying Gender Bias Towards Politicians in Cross-Lingual Language
Models
- URL: http://arxiv.org/abs/2104.07505v2
- Date: Thu, 9 Nov 2023 16:15:40 GMT
- Title: Quantifying Gender Bias Towards Politicians in Cross-Lingual Language
Models
- Authors: Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan
Cotterell, Isabelle Augenstein
- Abstract summary: We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians.
- Score: 104.41668491794974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has demonstrated that large pre-trained language models
reflect societal biases expressed in natural language. The present paper
introduces a simple method for probing language models to conduct a
multilingual study of gender bias towards politicians. We quantify the usage of
adjectives and verbs generated by language models surrounding the names of
politicians as a function of their gender. To this end, we curate a dataset of
250k politicians worldwide, including their names and gender. Our study is
conducted in seven languages across six different language modeling
architectures. The results demonstrate that pre-trained language models' stance
towards politicians varies strongly across analyzed languages. We find that
while some words, such as dead and designated, are associated with both male and
female politicians, a few specific words, such as beautiful and divorced, are
predominantly associated with female politicians. Finally, and contrary to
previous findings, our study suggests that larger language models do not tend
to be significantly more gender-biased than smaller ones.
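Below is a minimal sketch of how this kind of probing could be set up with an off-the-shelf masked language model. The template string, the tiny example name list, and the choice of mBERT are illustrative assumptions, not the paper's actual 250k-politician dataset, templates, or model suite.

```python
# Minimal sketch of gender-bias probing with a masked language model.
# Assumptions (not from the paper): the probing template, the tiny example
# name list, and the use of mBERT's fill-mask pipeline.
from collections import Counter

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Hypothetical mini-dataset of (politician name, gender) pairs.
politicians = [
    ("Angela Merkel", "female"),
    ("Barack Obama", "male"),
    ("Jacinda Ardern", "female"),
    ("Emmanuel Macron", "male"),
]

TEMPLATE = "{name} is a very [MASK] politician."  # hypothetical template

word_scores = {"female": Counter(), "male": Counter()}

for name, gender in politicians:
    # Ask the model for its top completions of the masked adjective slot
    # and accumulate their scores per gender.
    for prediction in fill_mask(TEMPLATE.format(name=name), top_k=20):
        word_scores[gender][prediction["token_str"]] += prediction["score"]

# Words weighted much more heavily for one gender hint at a biased association.
for gender, counts in word_scores.items():
    print(gender, counts.most_common(10))
```

In the study itself, the adjectives and verbs generated around politician names are compared statistically across genders, languages, and model architectures; the sketch above only illustrates the per-name probing step.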
Related papers
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z) - How Gender Interacts with Political Values: A Case Study on Czech BERT Models [0.0]
This case study focuses on the political biases of pre-trained encoders in Czech.
Because Czech is a gendered language, we measure how grammatical gender coincides with responses to men and women in the survey.
We find that the models do not assign statement probabilities in a way that follows value-driven reasoning.
arXiv Detail & Related papers (2024-03-20T11:30:45Z) - Gender Bias in Large Language Models across Multiple Languages [10.068466432117113]
We examine gender bias in text generated by large language models (LLMs) in different languages.
We use three measurements: 1) gender bias in selecting descriptive words given a gender-related context;
2) gender bias in selecting gender-related pronouns (she/he) given the descriptive words.
arXiv Detail & Related papers (2024-03-01T04:47:16Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - Measuring Gender Bias in West Slavic Language Models [41.49834421110596]
We introduce the first template-based dataset in Czech, Polish, and Slovak for measuring gender bias towards male, female and non-binary subjects.
We measure gender bias encoded in West Slavic language models by quantifying the toxicity and genderness of the generated words.
We find that these language models produce hurtful completions that depend on the subject's gender.
arXiv Detail & Related papers (2023-04-12T11:49:43Z) - Efficient Gender Debiasing of Pre-trained Indic Language Models [0.0]
The gender bias present in the data on which language models are pre-trained gets reflected in the systems that use these models.
In our paper, we measure gender bias associated with occupations in Hindi language models.
Our results show that the bias is reduced after our proposed mitigation techniques are applied.
arXiv Detail & Related papers (2022-09-08T09:15:58Z) - Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)