Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
- URL: http://arxiv.org/abs/2405.16860v1
- Date: Mon, 27 May 2024 06:20:58 GMT
- Title: Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
- Authors: Yunqi Zhang, Songda Li, Chunyuan Deng, Luyi Wang, Hui Zhao
- Abstract summary: Gender bias in vision-language models (VLMs) can reinforce harmful stereotypes and discrimination.
We propose GAMA, a task-agnostic generation framework to mitigate gender bias.
During narrative generation, GAMA yields all-sided but gender-obfuscated narratives.
During answer inference, GAMA integrates the image, generated narrative, and a task-specific question prompt to infer answers for different vision-language tasks.
- Score: 5.123567809055078
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Gender bias in vision-language models (VLMs) can reinforce harmful stereotypes and discrimination. In this paper, we focus on mitigating gender bias towards vision-language tasks. We identify object hallucination as the essence of gender bias in VLMs. Existing VLMs tend to focus on salient or familiar attributes in images but ignore contextualized nuances. Moreover, most VLMs rely on the co-occurrence between specific objects and gender attributes to infer the ignored features, ultimately resulting in gender bias. We propose GAMA, a task-agnostic generation framework to mitigate gender bias. GAMA consists of two stages: narrative generation and answer inference. During narrative generation, GAMA yields all-sided but gender-obfuscated narratives, which prevents premature concentration on localized image features, especially gender attributes. During answer inference, GAMA integrates the image, generated narrative, and a task-specific question prompt to infer answers for different vision-language tasks. This approach allows the model to rethink gender attributes and answers. We conduct extensive experiments on GAMA, demonstrating its debiasing and generalization ability.
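The two-stage design described in the abstract can be pictured with a short sketch. The backend interface, prompts, and function names below are illustrative assumptions based only on the abstract, not the authors' released implementation.

```python
# Illustrative sketch of a GAMA-style two-stage pipeline: (1) generate an
# all-sided but gender-obfuscated narrative, (2) answer a task-specific
# question from the image plus that narrative. All names and prompts here
# are hypothetical placeholders, not the paper's actual code.

from typing import Protocol


class VLMBackend(Protocol):
    """Any vision-language model exposing an (image, prompt) -> text call."""

    def generate(self, image: bytes, prompt: str) -> str: ...


NARRATIVE_PROMPT = (
    "Describe all people, objects, and activities in the image in detail, "
    "referring to people only with gender-neutral terms such as 'person' or 'they'."
)


def narrative_generation(vlm: VLMBackend, image: bytes) -> str:
    # Stage 1: a gender-obfuscated narrative, so the model does not fixate
    # early on salient gender cues or familiar object-gender co-occurrences.
    return vlm.generate(image, NARRATIVE_PROMPT)


def answer_inference(vlm: VLMBackend, image: bytes, narrative: str, question: str) -> str:
    # Stage 2: re-read the image together with the narrative and the
    # task-specific question, rethinking gender attributes before answering.
    prompt = (
        f"Narrative: {narrative}\n"
        f"Question: {question}\n"
        "Answer using both the image and the narrative; do not infer gender "
        "from co-occurring objects."
    )
    return vlm.generate(image, prompt)


def two_stage_answer(vlm: VLMBackend, image: bytes, question: str) -> str:
    narrative = narrative_generation(vlm, image)
    return answer_inference(vlm, image, narrative, question)
```

Because the question prompt in stage 2 is task-specific, the same sketch covers different vision-language tasks (e.g., captioning or VQA) without changing the debiasing stage.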
Related papers
- Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs).
Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances.
Among approaches for reducing gender bias in these models, finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks.
arXiv Detail & Related papers (2024-10-25T05:59:44Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z) - Probing Explicit and Implicit Gender Bias through LLM Conditional Text
Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Model-Agnostic Gender Debiased Image Captioning [29.640940966944697]
Image captioning models are known to perpetuate and amplify harmful societal bias in the training set.
We propose a framework, called LIBRA, that learns from synthetically biased samples to decrease both types of biases.
arXiv Detail & Related papers (2023-04-07T15:30:49Z) - Type B Reflexivization as an Unambiguous Testbed for Multilingual
Multi-Task Gender Bias [5.239305978984572]
We show that for languages with type B reflexivization, we can construct multi-task challenge datasets for detecting gender bias.
In these languages, the direct translation of 'the doctor removed his mask' is not ambiguous between a coreferential reading and a disjoint reading.
We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks.
arXiv Detail & Related papers (2020-09-24T23:47:18Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)