GFG -- Gender-Fair Generation: A CALAMITA Challenge
- URL: http://arxiv.org/abs/2412.19168v2
- Date: Mon, 30 Dec 2024 10:44:39 GMT
- Title: GFG -- Gender-Fair Generation: A CALAMITA Challenge
- Authors: Simona Frenda, Andrea Piergentili, Beatrice Savoldi, Marco Madeddu, Martina Rosola, Silvia Casola, Chiara Ferrando, Viviana Patti, Matteo Negri, Luisa Bentivogli
- Abstract summary: Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities. The Gender-Fair Generation challenge intends to help shift toward gender-fair language in written communication.
- Score: 15.399739689743935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities and avoid reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-marked languages, such as Italian. To address this, the Gender-Fair Generation challenge intends to help shift toward gender-fair language in written communication. The challenge, designed to assess and monitor the recognition and generation of gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of gender-fair language in automatic translation from English to Italian. The challenge relies on three different annotated datasets: the GFL-it corpus, which contains Italian texts extracted from administrative documents provided by the University of Brescia; GeNTE, a bilingual test set for gender-neutral rewriting and translation built upon a subset of the Europarl dataset; and Neo-GATE, a bilingual test set designed to assess the use of non-binary neomorphemes in Italian for both fair formulation and translation tasks. Finally, each task is evaluated with specific metrics: for task 1, the F1-score obtained via BERTScore, computed on each dataset entry and averaged; for tasks 2 and 3, an accuracy measured with a gender-neutral classifier and a coverage-weighted accuracy.
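For readers who want to prototype the scoring locally, the sketch below illustrates the metrics as the abstract describes them: per-entry BERTScore F1 averaged over the dataset for task 1, and a coverage-weighted accuracy for tasks 2 and 3. This is a minimal sketch, not the official challenge scorer; the example strings and the exact reading of coverage-weighted accuracy (accuracy over covered entries scaled by the covered fraction) are assumptions for illustration.

```python
# Minimal sketch of the metrics described in the abstract (not the official
# challenge scorer). Requires the `bert-score` package.
from bert_score import score


def average_bertscore_f1(predictions, references, lang="it"):
    """Task 1 (hedged sketch): BERTScore F1 computed per entry, then averaged."""
    _, _, f1 = score(predictions, references, lang=lang)
    return f1.mean().item()


def coverage_weighted_accuracy(accuracy, coverage):
    """Tasks 2-3 (assumed reading): accuracy on covered items scaled by the
    fraction of items the system actually covers."""
    return accuracy * coverage


# Hypothetical usage with made-up detections vs. gold annotations.
preds = ["i dipendenti", "la direttrice"]
golds = ["il personale dipendente", "la direttrice"]
print(f"Task 1 (avg BERTScore F1): {average_bertscore_f1(preds, golds):.3f}")
print(f"Tasks 2-3 (CWA): {coverage_weighted_accuracy(accuracy=0.8, coverage=0.5):.3f}")
```

Note that the neutrality judgment for tasks 2 and 3 comes from the gender-neutral classifier mentioned in the abstract, which this sketch does not reproduce.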
Related papers
- FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation by Overcoming Gender Binarity [0.6827423171182154]
Large Language Models (LLMs) are increasingly leveraged for translation tasks but often fall short when translating inclusive language.
This paper presents a novel, fully human-annotated dataset designed to evaluate non-binary gender biases in machine translation systems from English to French.
arXiv Detail & Related papers (2025-04-22T14:35:16Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with Ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Gender bias in text corpora can lead to perpetuation and amplification of societal inequalities.
Existing methods to measure gender representation bias in text corpora have mainly been proposed for English.
This paper introduces a novel methodology to quantitatively measure gender representation bias in Spanish corpora.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- Easy Adaptation to Mitigate Gender Bias in Multilingual Text Classification [8.137681060429527]
We treat gender as a domain and present a standard domain adaptation model to reduce gender bias.
We evaluate our approach on two text classification tasks, hate speech detection and rating prediction, and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-04-12T01:15:36Z)
- gENder-IT: An Annotated English-Italian Parallel Challenge Set for Cross-Linguistic Natural Gender Phenomena [2.4366811507669124]
gENder-IT is an English-Italian challenge set focusing on the resolution of natural gender phenomena.
It provides word-level gender tags on the English source side and multiple gender alternative translations, where needed, on the Italian target side.
arXiv Detail & Related papers (2021-08-05T21:08:45Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with a 1% word error rate and no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender-biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences of its use.