Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
- URL: http://arxiv.org/abs/2505.17217v2
- Date: Fri, 01 Aug 2025 16:37:16 GMT
- Title: Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
- Authors: Kangda Wei, Hasnat Md Abdullah, Ruihong Huang
- Abstract summary: Large Language Models (LLMs) often exhibit gender bias, resulting in unequal treatment of male and female subjects. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios. When inconsistencies arise, the model is guided to produce balanced, gender-neutral judgments.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) often exhibit gender bias, resulting in unequal treatment of male and female subjects across different contexts. To address this issue, we propose a novel data generation framework that fosters exploratory thinking in LLMs. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios, then elicits and compares their moral judgments. When inconsistencies arise, the model is guided to produce balanced, gender-neutral judgments. These story-judgment pairs are used to fine-tune or optimize the models via Direct Preference Optimization (DPO). Experimental results show that our method significantly reduces gender bias while preserving or even enhancing general model capabilities. We release the code and generated data at: https://github.com/WeiKangda/LLMs-Exploratory-Bias-Mitigation/tree/main.
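To make the pipeline concrete, below is a minimal Python sketch of the data-generation loop the abstract describes. This is not the authors' released code: `generate` stands in for any LLM call, the prompt templates are illustrative, and the exact-string inconsistency check is a simplification of whatever comparison the paper actually uses.

```python
# Minimal sketch of the story-pair / judgment-comparison pipeline.
# `generate` is any text-in, text-out LLM call; prompts are illustrative.

from typing import Callable, Dict, List

STORY_PROMPT = (
    "Write a short, morally ambiguous story whose protagonist is {gender}. "
    "Scenario: {scenario}"
)
JUDGMENT_PROMPT = (
    "Story: {story}\n"
    "Is the protagonist's action morally acceptable? Answer briefly."
)
NEUTRAL_PROMPT = (
    "The two judgments below were given to stories identical except for the "
    "protagonist's gender, yet they disagree:\n{a}\n{b}\n"
    "Give a single balanced, gender-neutral judgment that applies to both."
)

def build_dpo_pairs(generate: Callable[[str], str],
                    scenarios: List[str]) -> List[Dict[str, str]]:
    """Create (prompt, chosen, rejected) records for DPO fine-tuning."""
    records = []
    for scenario in scenarios:
        # Structurally identical stories, differing only in protagonist gender.
        stories = {g: generate(STORY_PROMPT.format(gender=g, scenario=scenario))
                   for g in ("male", "female")}
        judgments = {g: generate(JUDGMENT_PROMPT.format(story=s))
                     for g, s in stories.items()}
        # Crude inconsistency check; a real pipeline would compare semantically.
        if judgments["male"].strip() != judgments["female"].strip():
            neutral = generate(NEUTRAL_PROMPT.format(a=judgments["male"],
                                                     b=judgments["female"]))
            for g in ("male", "female"):
                records.append({
                    "prompt": JUDGMENT_PROMPT.format(story=stories[g]),
                    "chosen": neutral,         # balanced judgment is preferred
                    "rejected": judgments[g],  # original, inconsistent judgment
                })
    return records
```

Records in this (prompt, chosen, rejected) shape can feed a preference-optimization trainer such as TRL's `DPOTrainer`, which consumes datasets with exactly these column names.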
Related papers
- Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs). Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. To eliminate the gender bias in these models, we find that fine-tuning-based debiasing methods achieve the best trade-off between debiasing and retaining performance.
arXiv Detail & Related papers (2024-10-25T05:59:44Z) - GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - PopAlign: Population-Level Alignment for Fair Text-to-Image Generation [26.457571615782985]
We introduce PopAlign, a novel approach for population-level preference optimization.
We show that PopAlign significantly mitigates the bias of pretrained T2I models while largely preserving the generation quality.
arXiv Detail & Related papers (2024-06-28T05:38:32Z) - MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias [23.10522891268232]
We introduce a Mixture-of-Experts approach to mitigate gender bias in text-to-image models.
We show that our approach successfully mitigates gender bias while maintaining image quality.
arXiv Detail & Related papers (2024-06-25T14:59:31Z) - GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models [20.98831667981121]
Large Language Models (LLMs) are prone to generating content that exhibits gender biases. The GenderAlign dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response. Compared to the "rejected" responses, the "chosen" responses demonstrate lower levels of gender bias and higher quality.
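As an illustration only, a single preference record in this style might look like the sketch below; the field names and text are invented for exposition, not the actual GenderAlign schema or data.

```python
# Hypothetical shape of one GenderAlign-style preference record.
example = {
    "prompt": "Why are there so few women in engineering?",
    "chosen": "Representation gaps stem from structural and social factors, "
              "such as access to mentorship and workplace culture, rather "
              "than any difference in ability.",
    "rejected": "Women tend to be less interested in technical fields.",
}
# Pairs like this can train a reward model or drive preference optimization:
# the "chosen" response exhibits less gender bias than the "rejected" one.
```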
arXiv Detail & Related papers (2024-06-20T01:45:44Z) - Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Exploring Gender Bias in Retrieval Models [2.594412743115663]
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained IR models do not perform well on zero-shot retrieval tasks when a large pre-trained BERT encoder is fully fine-tuned.
We also show that these pre-trained models carry gender biases that cause the retrieved articles to be more often male-oriented than female-oriented.
arXiv Detail & Related papers (2022-08-02T21:12:05Z) - Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z) - Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models [2.6329024988388925]
We propose a method to automatically generate implicit gender bias samples at the sentence level, along with a metric to measure gender bias.
The metric guides the generation of examples from pre-trained models; those examples can then be used to mount attacks on the pre-trained models, as sketched below.
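A minimal sketch of such a metric-guided generation loop follows, assuming hypothetical `generate_candidates` and `bias_metric` helpers; the paper's actual components differ in detail.

```python
# Metric-guided mining of sentence-level adversarial examples (illustrative).
from typing import Callable, Iterable, List, Tuple

def mine_adversarial_examples(
    generate_candidates: Callable[[str], Iterable[str]],
    bias_metric: Callable[[str], float],
    seed_sentences: Iterable[str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep generated sentences whose measured gender bias exceeds a
    threshold; these serve as attack inputs against pre-trained models."""
    adversarial = []
    for seed in seed_sentences:
        for candidate in generate_candidates(seed):
            score = bias_metric(candidate)  # higher = more implicit bias
            if score > threshold:
                adversarial.append((candidate, score))
    return adversarial
```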
arXiv Detail & Related papers (2021-10-03T20:22:54Z) - Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation [10.542861450223128]
We find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments in corpora from three domains.
We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models.
arXiv Detail & Related papers (2021-09-08T18:14:11Z) - First the worst: Finding better gender translations during beam search [19.921216907778447]
We focus on gender bias resulting from systematic errors in grammatical gender translation.
We experiment with reranking n-best lists using gender features obtained automatically from the source sentence.
We find that a combination of these techniques allows large gains in WinoMT accuracy without requiring additional bilingual data or an additional NMT model.
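A minimal sketch of gender-feature reranking over an n-best list; the additive scoring rule and the `gender_of` feature extractor are illustrative assumptions, not the paper's exact method.

```python
# Rerank beam-search candidates using source-side gender agreement (illustrative).
from typing import Callable, List, Tuple

def rerank_nbest(
    nbest: List[Tuple[str, float]],          # (translation, model_score) pairs
    source_gender: str,                       # gender inferred from the source
    gender_of: Callable[[str], str],          # gender marked in a translation
    weight: float = 1.0,
) -> Tuple[str, float]:
    """Pick the candidate maximizing model score plus an agreement bonus
    awarded when the translation's grammatical gender matches the source."""
    def score(item: Tuple[str, float]) -> float:
        translation, model_score = item
        agreement = 1.0 if gender_of(translation) == source_gender else 0.0
        return model_score + weight * agreement
    return max(nbest, key=score)
```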
arXiv Detail & Related papers (2021-04-15T12:53:30Z) - Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)