EuroGEST: Investigating gender stereotypes in multilingual language models
- URL: http://arxiv.org/abs/2506.03867v1
- Date: Wed, 04 Jun 2025 11:58:18 GMT
- Authors: Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou, Shannon Vallor, Alexandra Birch
- Abstract summary: Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages. EuroGEST builds on an existing expert-informed benchmark covering 16 gender stereotypes, expanded in this work using translation tools, quality estimation metrics, and morphological heuristics. Human evaluations confirm that our data generation method results in high accuracy of both translations and gender labels across languages. We use EuroGEST to evaluate 24 multilingual language models from six model families, demonstrating that the strongest stereotypes in all models across all languages are that women are \textit{beautiful}, \textit{empathetic} and \textit{neat} and men are \textit{leaders}, \textit{strong, tough} and \textit{professional}. We also show that larger models encode gendered stereotypes more strongly and that instruction finetuning does not consistently reduce gendered stereotypes. Our work highlights the need for more multilingual studies of fairness in LLMs and offers scalable methods and resources to audit gender bias across languages.
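The abstract does not specify scoring code, but a GEST-style evaluation of the kind EuroGEST builds on can be sketched as follows: for each stereotype sentence, compare the model's score for a male-framed variant against a female-framed variant, then aggregate the male-preference rate per stereotype. This is a minimal illustration with made-up log-probability scores; the actual EuroGEST templates, models, and aggregation protocol differ.

```python
from collections import defaultdict

def masculine_rate(samples):
    """For each stereotype, compute the fraction of samples where the model
    assigns a higher log-probability to the male-framed variant.

    samples: iterable of (stereotype, logp_male, logp_female) tuples.
    """
    prefer_male = defaultdict(int)
    total = defaultdict(int)
    for stereotype, logp_male, logp_female in samples:
        total[stereotype] += 1
        if logp_male > logp_female:
            prefer_male[stereotype] += 1
    return {s: prefer_male[s] / total[s] for s in total}

# Hypothetical scores: (stereotype, log P(male-framed), log P(female-framed)).
# In practice these would come from a language model scoring each variant.
samples = [
    ("beautiful", -12.3, -10.1),  # model prefers the female framing
    ("beautiful", -11.8, -9.9),
    ("leaders",   -10.2, -11.7),  # model prefers the male framing
    ("leaders",   -9.8,  -12.0),
]
print(masculine_rate(samples))  # {'beautiful': 0.0, 'leaders': 1.0}
```

A rate near 1.0 indicates the model associates the stereotype with men, near 0.0 with women, and near 0.5 indicates no gendered preference.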
Related papers
- Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models [14.632649933582648]
We present the first systematic audit of three public multilingual CLIP checkpoints -- M-CLIP, NLLB-CLIP, and CAPIVARA-CLIP. We quantify race and gender bias and measure stereotype amplification. Highly gendered languages consistently magnify all measured bias types, but even gender-neutral languages remain vulnerable when cross-lingual weight sharing imports foreign stereotypes.
arXiv Detail & Related papers (2025-05-20T10:14:00Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Gender bias in text corpora can lead to perpetuation and amplification of societal inequalities.
Existing methods to measure gender representation bias in text corpora have mainly been proposed for English.
This paper introduces a novel methodology to quantitatively measure gender representation bias in Spanish corpora.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- UnMASKed: Quantifying Gender Biases in Masked Language Models through Linguistically Informed Job Market Prompts [0.0]
This research delves into the inherent biases present in masked language models (MLMs).
This study evaluated six prominent models: BERT, RoBERTa, DistilBERT, BERT-multilingual, XLM-RoBERTa, and DistilBERT-multilingual.
The analysis reveals stereotypical gender alignment of all models, with multilingual variants showing comparatively reduced biases.
arXiv Detail & Related papers (2024-01-28T23:00:40Z)
- Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling [0.3374875022248866]
GEST is a new dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems.
GEST contains samples for 16 gender stereotypes about men and women compatible with the English language and 9 Slavic languages.
We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems.
arXiv Detail & Related papers (2023-11-30T17:06:00Z)
- Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation [28.471506840241602]
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose Gender-Aware Contrastive Learning (GACL), a bias mitigation method that encodes contextual gender information into the representations of non-explicit gender words.
arXiv Detail & Related papers (2023-05-23T12:53:39Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.