Discovering and Mitigating Visual Biases through Keyword Explanation
- URL: http://arxiv.org/abs/2301.11104v4
- Date: Wed, 27 Mar 2024 03:47:20 GMT
- Title: Discovering and Mitigating Visual Biases through Keyword Explanation
- Authors: Younghyun Kim, Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jaeho Lee, Jinwoo Shin
- Abstract summary: We propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords.
B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C.
B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet.
- Score: 66.71792624377069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Addressing biases in computer vision models is crucial for real-world AI deployments. However, mitigating visual biases is challenging due to their unexplainable nature, often identified indirectly through visualization or sample statistics, which necessitates additional human supervision for interpretation. To tackle this issue, we propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords. Specifically, we extract common keywords from the captions of mispredicted images to identify potential biases in the model. We then validate these keywords by measuring their similarity to the mispredicted images using a vision-language scoring model. The keyword explanation form of visual bias offers several advantages, such as clear group naming for bias discovery and a natural extension to debiasing using these group names. Our experiments demonstrate that B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C. Additionally, B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet. For example, we discovered a contextual bias between "bee" and "flower" in ImageNet. We also highlight various applications of B2T keywords, including debiased training, CLIP prompting, and model comparison.
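A minimal sketch of the pipeline the abstract describes, assuming the captions of mispredicted images come from an off-the-shelf captioning model and using a Hugging Face CLIP checkpoint as the vision-language scoring model; the helper names, stop-word list, and the score (difference of mean keyword-image similarities between mispredicted and correctly predicted images) are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of keyword-based bias discovery in the spirit of B2T.
# Assumptions (not the authors' code): captions of mispredicted images come
# from any off-the-shelf captioning model; CLIP serves as the vision-language
# scoring model; the keyword score is the difference of mean keyword-image
# similarities on mispredicted vs. correctly predicted images.
from collections import Counter

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

STOPWORDS = {"a", "an", "the", "of", "and", "with", "in", "on", "is", "to", "photo"}

def candidate_keywords(captions, top_k=20):
    """Frequent words in captions of mispredicted images serve as bias candidates."""
    words = [w.strip(".,").lower() for c in captions for w in c.split()]
    words = [w for w in words if len(w) > 2 and w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(top_k)]

@torch.no_grad()
def mean_similarity(keyword, images):
    """Mean cosine similarity between a keyword prompt and a list of PIL images."""
    text_in = processor(text=[f"a photo of a {keyword}"], return_tensors="pt", padding=True)
    img_in = processor(images=images, return_tensors="pt")
    t = model.get_text_features(**text_in)
    v = model.get_image_features(**img_in)
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    return (v @ t.T).mean().item()

def bias_keywords(captions_wrong, images_wrong, images_correct):
    """Rank candidates: a high score means the keyword co-occurs with errors."""
    scores = {
        kw: mean_similarity(kw, images_wrong) - mean_similarity(kw, images_correct)
        for kw in candidate_keywords(captions_wrong)
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

The top-ranked keywords then name error-prone groups directly, which is what enables the downstream uses the abstract lists, such as upweighting those groups during debiased training or building group-specific CLIP prompts.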
Related papers
- Identifying Implicit Social Biases in Vision-Language Models [34.53206726136747]
We conduct a systematic analysis of the social biases that are present in vision-language models.
We find that CLIP frequently displays undesirable associations between harmful words and specific demographic groups.
Our findings highlight the importance of evaluating and addressing bias in vision-language models.
arXiv Detail & Related papers (2024-11-01T19:41:28Z) - Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention [9.859335795616028]
We propose a novel causal intervention training scheme named CIBi to eliminate language bias from a finer-grained perspective.
We employ causal intervention and contrastive learning to eliminate context bias and improve the multi-modal representation.
We design a new question-only branch based on counterfactual generation to distill and eliminate keyword bias.
arXiv Detail & Related papers (2024-10-14T06:09:16Z) - GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [75.04426753720553]
We propose a framework to identify, quantify, and explain biases in an open set setting.
This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions.
We show two variations of this framework: OpenBias and GradBias.
arXiv Detail & Related papers (2024-08-29T16:51:07Z) - OpenBias: Open-set Bias Detection in Text-to-Image Generative Models [108.2219657433884]
We tackle the challenge of open-set bias detection in text-to-image generative models presenting OpenBias.
OpenBias identifies and quantifies the severity of biases agnostically, without access to any precompiled set.
We study the behavior of Stable Diffusion 1.5, 2, and XL, emphasizing new biases never investigated before.
arXiv Detail & Related papers (2024-04-11T17:59:56Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z) - Mitigating Test-Time Bias for Fair Image Retrieval [18.349154934096784]
We address the challenge of generating fair and unbiased image retrieval results given neutral textual queries.
We introduce a straightforward technique, Post-hoc Bias Mitigation, that post-processes the outputs from the pre-trained vision-language model.
Our approach achieves the lowest bias among various existing bias-mitigation methods in text-based image retrieval results.
arXiv Detail & Related papers (2023-05-23T21:31:16Z) - To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo [53.370023611101175]
We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task first proposed by Cui et al.
Given an image and a caption, PCVG requires pairing up a person's name mentioned in a caption with a bounding box that points to the person in the image.
We find that the original Who's Waldo dataset contains a large number of biased samples that are solvable simply by heuristic methods.
arXiv Detail & Related papers (2022-03-30T21:35:53Z) - Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings [9.379650501033465]
We train two word embedding models, one on texts from left-wing news outlets and the other on texts from right-wing news outlets.
Our hypothesis is that a word's representations in both word embedding spaces are more similar for non-biased words than biased words.
This paper presents the first in-depth look at the context of bias words measured by word embeddings.
arXiv Detail & Related papers (2021-12-14T13:23:49Z)
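For the outlet-specific embedding comparison in the entry above, one possible toy realization (assumed here, not the paper's actual pipeline) is to train two word2vec models and measure how much a word's nearest-neighbor set differs between the two spaces; the placeholder corpora, hyperparameters, and neighbor-overlap metric below are illustrative assumptions only.

```python
# Toy illustration of comparing a word across two outlet-specific embedding
# spaces; corpora, hyperparameters, and the neighbor-overlap metric are
# placeholder assumptions, not the paper's method.
from gensim.models import Word2Vec

def neighbor_overlap(word, model_a, model_b, topn=10):
    """Jaccard overlap of a word's nearest neighbors in two embedding spaces.
    Lower overlap hints that the word is used in different (possibly biased) contexts."""
    if word not in model_a.wv or word not in model_b.wv:
        return None
    near_a = {w for w, _ in model_a.wv.most_similar(word, topn=topn)}
    near_b = {w for w, _ in model_b.wv.most_similar(word, topn=topn)}
    return len(near_a & near_b) / len(near_a | near_b)

# Placeholder corpora standing in for tokenized left-wing / right-wing articles.
left_sentences = [["the", "new", "policy", "protects", "workers", "today"]] * 50
right_sentences = [["the", "new", "policy", "hurts", "small", "business"]] * 50
left_model = Word2Vec(left_sentences, vector_size=50, window=3, min_count=1, seed=0)
right_model = Word2Vec(right_sentences, vector_size=50, window=3, min_count=1, seed=0)
print(neighbor_overlap("policy", left_model, right_model, topn=3))
```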