Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling
- URL: http://arxiv.org/abs/2311.18711v3
- Date: Mon, 30 Sep 2024 20:34:19 GMT
- Title: Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling
- Authors: Matúš Pikuliak, Andrea Hrckova, Stefan Oresko, Marián Šimko
- Abstract summary: GEST is a new dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems.
GEST contains samples for 16 gender stereotypes about men and women compatible with the English language and 9 Slavic languages.
We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems.
- Score: 0.3374875022248866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present GEST -- a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. GEST contains samples for 16 gender stereotypes about men and women (e.g., Women are beautiful, Men are leaders) that are compatible with the English language and 9 Slavic languages. The definition of said stereotypes was informed by gender experts. We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems. We discovered significant and consistent amounts of gender-stereotypical reasoning in almost all the evaluated models and languages. Our experiments confirm the previously postulated hypothesis that the larger the model, the more stereotypical it usually is.
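For the masked LMs, this kind of evaluation can be sketched as a simple pronoun probe: embed a stereotype-flavoured statement in a reporting template and compare the probabilities the model assigns to male and female pronouns in the masked slot. The template, model choice, and sample sentence below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a masked-LM gender probe in the spirit of GEST.
# The template, model, and sample sentence are assumptions for illustration;
# the paper's actual templates and samples differ.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

sample = "I am beautiful"  # a stereotype-flavoured first-person statement
template = f'"{sample}," [MASK] said.'

# Restrict predictions to the two pronouns and compare their scores;
# a consistently larger score for one pronoun across many samples of the
# same stereotype suggests gender-stereotypical reasoning.
results = fill(template, targets=["he", "she"])
scores = {r["token_str"]: r["score"] for r in results}
print(scores)
```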
Related papers
- EuroGEST: Investigating gender stereotypes in multilingual language models [53.88459905621724]
Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric.
We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages.
arXiv Detail & Related papers (2025-06-04T11:58:18Z)
- Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation [4.881426374773398]
We propose a novel evaluation metric called Minimal Pair Accuracy (MPA).
MPA focuses on whether models adapt to gender cues in minimal pairs.
MPA shows that, in anti-stereotypical cases, NMT models tend to take masculine gender cues into account more consistently.
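Read literally, the metric can be sketched as the fraction of minimal pairs in which the translation's gender follows the gender cue in both variants; this is our reading of the summary above, not the authors' reference implementation.

```python
# Rough sketch of a minimal-pair accuracy computation (our reading of MPA,
# not the authors' reference implementation). Each pair records the gender
# realised in the translation of the masculine-cued and feminine-cued variant.
def minimal_pair_accuracy(pairs):
    if not pairs:
        return 0.0
    correct = sum(
        1
        for p in pairs
        # The model "adapts to the cue" only if both variants follow it.
        if p["masc_cued"] == "masculine" and p["fem_cued"] == "feminine"
    )
    return correct / len(pairs)

# Hypothetical toy data: the second pair ignores the feminine cue.
pairs = [
    {"masc_cued": "masculine", "fem_cued": "feminine"},
    {"masc_cued": "masculine", "fem_cued": "masculine"},
]
print(minimal_pair_accuracy(pairs))  # 0.5
```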
arXiv Detail & Related papers (2025-05-13T13:17:23Z)
- Are All Spanish Doctors Male? Evaluating Gender Bias in German Machine Translation [0.0]
WinoMTDE is a new gender bias evaluation test set designed to assess occupational stereotyping and underrepresentation in German machine translation systems.
The dataset comprises 288 German sentences balanced with regard to gender and stereotype, with stereotypes annotated using German labor statistics.
arXiv Detail & Related papers (2025-02-26T12:46:59Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models [9.734705470760511]
We use GlobalBias to study a broad set of stereotypes from around the world.
We generate character profiles based on given names and evaluate the prevalence of stereotypes in model outputs.
arXiv Detail & Related papers (2024-07-09T14:52:52Z)
- Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German [17.924716793621627]
We study gender-fair language in English-to-German machine translation (MT).
We conduct the first benchmark study involving two commercial systems and six neural MT models.
Our findings show that most systems produce mainly masculine forms and rarely gender-neutral variants.
arXiv Detail & Related papers (2024-06-10T09:39:19Z)
- Are Models Biased on Text without Gender-related Language? [14.931375031931386]
We introduce UnStereoEval (USE), a novel framework for investigating gender bias in stereotype-free scenarios.
USE defines a sentence-level score based on pretraining data statistics to determine whether a sentence contains minimal word-gender associations.
We find low fairness across all 28 tested models, suggesting that bias does not solely stem from the presence of gender-related words.
arXiv Detail & Related papers (2024-05-01T15:51:15Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts [87.62403265382734]
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- Measuring Gender Bias in West Slavic Language Models [41.49834421110596]
We introduce the first template-based dataset in Czech, Polish, and Slovak for measuring gender bias towards male, female and non-binary subjects.
We measure gender bias encoded in West Slavic language models by quantifying the toxicity and genderness of the generated words.
We find that these language models produce hurtful completions that depend on the subject's gender.
arXiv Detail & Related papers (2023-04-12T11:49:43Z)
- Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation [10.542861450223128]
We find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments in corpora from three domains.
We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models.
arXiv Detail & Related papers (2021-09-08T18:14:11Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians.
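The counting step can be illustrated with a short POS-tagging sketch: tally adjectives and verbs in model-generated text about politicians, split by the politician's gender. Generation itself is omitted, and the toy texts and spaCy model choice below are hypothetical.

```python
# Sketch of the association-counting step: POS-tag generated text about
# politicians and tally adjectives and verbs per gender. The texts and the
# spaCy model choice are illustrative assumptions; generation is omitted.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

generations = {
    "female": ["She is a beautiful senator who recently divorced."],
    "male": ["He was designated party leader before he died."],
}

counts = {gender: Counter() for gender in generations}
for gender, texts in generations.items():
    for doc in nlp.pipe(texts):
        for tok in doc:
            if tok.pos_ in {"ADJ", "VERB"}:
                counts[gender][tok.lemma_] += 1

# Words whose counts differ strongly between the two genders point to
# gendered associations such as the beautiful/divorced pattern above.
print(counts["female"].most_common(5))
print(counts["male"].most_common(5))
```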
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender-biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.