FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation by Overcoming Gender Binarity
- URL: http://arxiv.org/abs/2504.15941v1
- Date: Tue, 22 Apr 2025 14:35:16 GMT
- Title: FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation by Overcoming Gender Binarity
- Authors: Fanny Jourdan, Yannick Chevalier, Cécile Favre
- Abstract summary: Large Language Models (LLMs) are increasingly leveraged for translation tasks but often fall short when translating inclusive language. This paper presents a novel, fully human-annotated dataset designed to evaluate non-binary gender biases in machine translation systems from English to French.
- Score: 0.6827423171182154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly leveraged for translation tasks but often fall short when translating inclusive language -- such as texts containing the singular 'they' pronoun or otherwise reflecting fair linguistic protocols. Because these challenges span both computational and societal domains, it is imperative to critically evaluate how well LLMs handle inclusive translation with a well-founded framework. This paper presents FairTranslate, a novel, fully human-annotated dataset designed to evaluate non-binary gender biases in machine translation systems from English to French. FairTranslate includes 2418 English-French sentence pairs related to occupations, annotated with rich metadata such as the stereotypical alignment of the occupation, grammatical gender indicator ambiguity, and the ground-truth gender label (male, female, or inclusive). We evaluate four leading LLMs (Gemma2-2B, Mistral-7B, Llama3.1-8B, Llama3.3-70B) on this dataset under different prompting procedures. Our results reveal substantial biases in gender representation across LLMs, highlighting persistent challenges in achieving equitable outcomes in machine translation. These findings underscore the need for focused strategies and interventions aimed at ensuring fair and inclusive language usage in LLM-based translation systems. We make the FairTranslate dataset publicly available on Hugging Face, and disclose the code for all experiments on GitHub.
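As a concrete illustration of how the dataset's annotations could be consumed, the sketch below loads FairTranslate with the Hugging Face datasets library and inspects a few records. This is a minimal sketch, not the authors' evaluation code: the repository id, split name, and field names are assumptions made for illustration, and the official dataset card on Hugging Face defines the actual schema.

```python
# Minimal sketch: load FairTranslate and inspect its annotations.
# The repository id, split, and field names below are ASSUMPTIONS for
# illustration; consult the official Hugging Face dataset card.
from datasets import load_dataset

ds = load_dataset("FairTranslate/FairTranslate", split="train")  # hypothetical repo id

for example in ds.select(range(3)):
    print(example["source"])      # assumed: English source sentence
    print(example["target"])      # assumed: French reference translation
    print(example["gender"])      # assumed: "male", "female", or "inclusive"
    print(example["stereotype"])  # assumed: stereotypical alignment of the occupation
```

Records like these would then be fed to each of the four evaluated LLMs under the paper's prompting procedures, with the gender inflection of the predicted French translation compared against the ground-truth label.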
Related papers
- The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights into the effects on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Fine-grained Gender Control in Machine Translation with Large Language Models [15.63784352130237]
We tackle controlled translation in a more realistic setting of inputs with multiple entities.
Our proposed method instructs the model with fine-grained entity-level gender information to translate with correct gender inflections.
We discover an emergence of gender interference phenomenon when controlling the gender of multiple entities.
arXiv Detail & Related papers (2024-07-21T13:15:00Z) - GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models (LVLMs).
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Investigating Markers and Drivers of Gender Bias in Machine Translations [0.0]
Implicit gender bias in large language models (LLMs) is a well-documented problem.
We use the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks.
We find that some languages display similar patterns of pronoun use, falling into three loose groups.
We identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations.
arXiv Detail & Related papers (2024-03-18T15:54:46Z) - Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z) - A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation [35.44115368160656]
We investigate whether and to what extent machine translation models exhibit gender bias.
We find that IFT models default to male-inflected translations, even disregarding female occupational stereotypes.
We propose an easy-to-implement and effective bias mitigation solution.
arXiv Detail & Related papers (2023-10-18T17:36:55Z) - Gender-specific Machine Translation with Large Language Models [39.49764957694078]
Decoder-only Large Language Models (LLMs) have demonstrated potential in machine translation.
LLMs offer the ability to control the properties of the output through prompts.
Our results indicate that LLaMA can generate gender-specific translations with translation accuracy and gender bias comparable to a state-of-the-art multilingual NMT system.
arXiv Detail & Related papers (2023-09-06T17:24:06Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)