Overview of the NLPCC 2025 Shared Task: Gender Bias Mitigation Challenge
- URL: http://arxiv.org/abs/2506.12574v1
- Date: Sat, 14 Jun 2025 17:06:04 GMT
- Title: Overview of the NLPCC 2025 Shared Task: Gender Bias Mitigation Challenge
- Authors: Yizhi Li, Ge Zhang, Hanhua Hong, Yiwen Wang, Chenghua Lin,
- Abstract summary: We propose a Chinese cOrpus foR Gender bIas Probing and Mitigation (CORGI-PM). It contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Notably, CORGI-PM contains 5.2k gender-biased sentences along with the corresponding bias-eliminated versions rewritten by human annotators.
- Score: 16.204471028423917
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As natural language processing for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques, such as pre-trained language models, suffer from biased corpora. This problem is even more pronounced for languages with fewer fairness-related computational linguistic resources, such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation (CORGI-PM), which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Notably, CORGI-PM contains 5.2k gender-biased sentences along with the corresponding bias-eliminated versions rewritten by human annotators. We pose three challenges as a shared task to automate the mitigation of textual gender bias, requiring models to detect, classify, and mitigate textual gender bias. In this overview, we present the results and analysis for the teams that participated in this shared task at NLPCC 2025.
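The three subtasks described in the abstract (detection, classification, mitigation) can be framed as a simple record-plus-interface structure. Below is a minimal Python sketch of that framing; the dataclass fields, the bias subtype label, and the toy sentence pair are illustrative assumptions, not the official CORGI-PM schema or annotation scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Illustrative record structure for a CORGI-PM-style example.
# Field names and subtype labels are assumptions made for this sketch.
@dataclass
class CorgiExample:
    sentence: str                                         # original Chinese sentence
    is_biased: bool                                        # subtask 1: bias detection label
    bias_types: List[str] = field(default_factory=list)   # subtask 2: bias subtype labels
    rewritten: Optional[str] = None                        # subtask 3: human bias-eliminated rewrite
                                                           # (available for 5.2k biased sentences)


# The three shared-task challenges, expressed as plain interfaces.
def detect(sentence: str) -> bool:
    """Subtask 1: decide whether the sentence contains gender bias."""
    raise NotImplementedError


def classify(sentence: str) -> List[str]:
    """Subtask 2: assign one or more bias subtype labels to a biased sentence."""
    raise NotImplementedError


def mitigate(sentence: str) -> str:
    """Subtask 3: rewrite the sentence to remove gender bias while preserving meaning."""
    raise NotImplementedError


# Toy record; the labels and rewrite are invented for illustration only.
example = CorgiExample(
    sentence="女性不适合做程序员。",              # "Women are not suited to be programmers."
    is_biased=True,
    bias_types=["stereotype"],
    rewritten="是否适合做程序员与性别无关。",      # "Suitability for programming is unrelated to gender."
)
```

Participating systems would then implement the three interfaces with whatever models they choose; the dataset's rewritten sentences serve as references for the mitigation subtask.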
Related papers
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words)
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Large language models (LLMs) often inherit and amplify social biases embedded in their training data. Gender bias is the association of specific roles or traits with a particular gender. Gender representation bias is the unequal frequency of references to individuals of different genders (see the counting sketch after this list).
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English [0.0]
We introduce GATE X-E, an extension to the GATE corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English.
The dataset features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena.
We present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it.
arXiv Detail & Related papers (2023-11-15T10:25:14Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
- CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation [28.38578407487603]
We propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels.
We address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias.
CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate whether state-of-the-art NLI models exhibit gender stereotypes associated with occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
arXiv Detail & Related papers (2021-05-12T09:41:51Z)
- Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias [5.239305978984572]
We show that for languages with type B reflexivization, we can construct multi-task challenge datasets for detecting gender bias.
In these languages, the direct translation of 'the doctor removed his mask' is not ambiguous between a coreferential reading and a disjoint reading.
We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks.
arXiv Detail & Related papers (2020-09-24T23:47:18Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
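To make the representation-bias definition quoted in the "Leveraging Large Language Models..." entry above concrete, here is a minimal, lexicon-based counting sketch. It is a deliberate simplification for illustration: the word lists and the masculine/feminine ratio are assumptions of this sketch, and the cited paper itself measures representation bias with LLMs on gendered-language corpora rather than with word counts.

```python
import re
from collections import Counter

# Small English word lists chosen only for illustration; not from the paper.
MASCULINE = {"he", "him", "his", "man", "men", "boy", "father"}
FEMININE = {"she", "her", "hers", "woman", "women", "girl", "mother"}


def representation_counts(corpus: list) -> Counter:
    """Count masculine vs. feminine references across a corpus of sentences."""
    counts = Counter(masculine=0, feminine=0)
    for sentence in corpus:
        for token in re.findall(r"[a-z']+", sentence.lower()):
            if token in MASCULINE:
                counts["masculine"] += 1
            elif token in FEMININE:
                counts["feminine"] += 1
    return counts


def representation_ratio(corpus: list) -> float:
    """Masculine/feminine reference ratio; 1.0 indicates balanced representation."""
    counts = representation_counts(corpus)
    return counts["masculine"] / max(counts["feminine"], 1)


# Toy usage: a ratio far from 1.0 signals unequal representation of genders.
docs = ["He fixed the engine.", "The manager said he was late.", "She wrote the report."]
print(representation_counts(docs), representation_ratio(docs))
```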