Mitigating Test-Time Bias for Fair Image Retrieval
- URL: http://arxiv.org/abs/2305.19329v1
- Date: Tue, 23 May 2023 21:31:16 GMT
- Title: Mitigating Test-Time Bias for Fair Image Retrieval
- Authors: Fanjie Kong, Shuai Yuan, Weituo Hao, Ricardo Henao
- Abstract summary: We address the challenge of generating fair and unbiased image retrieval results given neutral textual queries.
We introduce a straightforward technique, Post-hoc Bias Mitigation, that post-processes the outputs from the pre-trained vision-language model.
Our approach achieves the lowest bias among various existing bias-mitigation methods in text-based image retrieval results.
- Score: 18.349154934096784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenge of generating fair and unbiased image retrieval
results given neutral textual queries (with no explicit gender or race
connotations), while maintaining the utility (performance) of the underlying
vision-language (VL) model. Previous methods aim to disentangle learned
representations of images and text queries from gender and racial
characteristics. However, we show these are inadequate at alleviating bias for
the desired equal representation result, as there usually exists test-time bias
in the target retrieval set. So motivated, we introduce a straightforward
technique, Post-hoc Bias Mitigation (PBM), that post-processes the outputs from
the pre-trained vision-language model. We evaluate our algorithm on real-world
image search datasets, Occupation 1 and 2, as well as two large-scale
image-text datasets, MS-COCO and Flickr30k. Our approach achieves the lowest
bias, compared with various existing bias-mitigation methods, in text-based
image retrieval results while maintaining satisfactory retrieval performance.
The source code is publicly available at
\url{https://anonymous.4open.science/r/Fair_Text_based_Image_Retrieval-D8B2}.
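To make the post-hoc idea concrete, below is a minimal Python sketch of equal-representation re-ranking. It is not the authors' exact PBM algorithm: it assumes each retrieved image already carries a predicted attribute label (in the paper, gender or race inferred from the image or its metadata) and greedily fills an equal per-group quota while preserving the similarity ordering.

```python
from collections import defaultdict

def rebalance_topk(ranked_ids, attr_of, k):
    """Pick k items from a relevance-ranked list so that each predicted
    attribute group (e.g., gender) gets an equal share of the results.

    ranked_ids : image ids sorted by similarity to the query, best first
    attr_of    : dict mapping image id -> predicted attribute label
    k          : number of results to return
    """
    groups = sorted({attr_of[i] for i in ranked_ids})
    quota = {g: k // len(groups) for g in groups}   # equal share per group
    for g in groups[: k % len(groups)]:             # distribute any remainder
        quota[g] += 1

    picked, taken = [], defaultdict(int)
    for i in ranked_ids:                            # walk the ranking, best first
        g = attr_of[i]
        if taken[g] < quota[g]:
            picked.append(i)
            taken[g] += 1
        if len(picked) == k:
            break
    return picked

# Toy usage: a ranking skewed toward group "m" gets balanced in the top 4.
ranked = ["a", "b", "c", "d", "e", "f"]
attrs = {"a": "m", "b": "m", "c": "m", "d": "f", "e": "f", "f": "m"}
print(rebalance_topk(ranked, attrs, 4))             # ['a', 'b', 'd', 'e']
```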
Related papers
- Debiasing Vison-Language Models with Text-Only Training [15.069736314663352]
To address the limitations of prior work, we propose a Text-Only Debiasing framework called TOD, leveraging a text-as-image training paradigm to mitigate visual biases.
arXiv Detail & Related papers (2024-10-12T04:34:46Z) - GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [75.04426753720553]
We propose a framework to identify, quantify, and explain biases in an open set setting.
This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions.
We show two variations of this framework: OpenBias and GradBias.
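A toy sketch of the audit loop this summary describes, with hypothetical stubs: `propose_biases` stands in for the LLM that proposes bias axes from captions, and `generate_and_classify` for the text-to-image model plus an attribute classifier; neither reflects the papers' actual components.

```python
from collections import Counter

def propose_biases(captions):
    # An LLM would propose open-set bias axes from the captions; stubbed here.
    return ["gender", "age"]

def generate_and_classify(caption, bias_axis):
    # A T2I model would render the caption and a classifier would label the
    # attribute in the generated image; stubbed with a fixed answer.
    return "male" if bias_axis == "gender" else "young"

def audit(captions):
    report = {}
    for axis in propose_biases(captions):
        labels = Counter(generate_and_classify(c, axis) for c in captions)
        report[axis] = labels   # a heavily skewed count flags a candidate bias
    return report

print(audit(["a photo of a doctor", "a chef plating food"]))
```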
arXiv Detail & Related papers (2024-08-29T16:51:07Z) - When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection [35.09035417676343]
We show that the embeddings of text inputs unexpectedly cluster tightly together, far from image embeddings, contrary to the model's contrastive training objective.
We propose a novel methodology called BLISS which directly accounts for this similarity bias through the use of an auxiliary, external set of text inputs.
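One plausible reading of that correction, sketched below (not necessarily BLISS's exact formula): subtract from each raw image-text similarity the image's average similarity to an auxiliary set of external texts, removing the constant offset caused by the text-embedding cluster.

```python
import numpy as np

def bias_corrected_scores(img_emb, query_emb, aux_text_emb):
    """Similarity scores with the per-image text-cluster offset removed.

    img_emb      : (n_images, d) L2-normalized image embeddings
    query_emb    : (d,)          L2-normalized query text embedding
    aux_text_emb : (n_aux, d)    L2-normalized auxiliary text embeddings
    """
    raw = img_emb @ query_emb                     # raw cosine similarities
    offset = (img_emb @ aux_text_emb.T).mean(1)   # avg similarity to aux texts
    return raw - offset

# Toy usage with random unit vectors.
rng = np.random.default_rng(0)
unit = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
imgs = unit(rng.normal(size=(5, 8)))
query = unit(rng.normal(size=8))
aux = unit(rng.normal(size=(3, 8)))
print(bias_corrected_scores(imgs, query, aux))
```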
arXiv Detail & Related papers (2024-07-24T08:20:02Z) - Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance in image classification by promoting fairness to some degree.
arXiv Detail & Related papers (2024-02-28T07:54:50Z) - Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z) - If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
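A minimal version of that generate-then-select loop, using CLIP image-text similarity as a generic stand-in for the paper's automatic scoring system (which may differ):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def pick_most_faithful(prompt: str, candidates: list) -> int:
    """Return the index of the candidate image that best matches the prompt,
    scored by CLIP similarity (a stand-in for the paper's scorer)."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=[prompt], images=candidates,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_image.squeeze(1)   # one similarity per candidate
    return int(scores.argmax())
```

In practice the candidates would be several images sampled from the same diffusion model with different seeds, so the loop trades extra generation compute for faithfulness.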
arXiv Detail & Related papers (2023-05-22T17:59:41Z) - DeAR: Debiasing Vision-Language Models with Additive Residuals [5.672132510411465]
Large pre-trained vision-language models (VLMs) provide rich, adaptable image and text representations.
These models suffer from societal biases owing to the skewed distribution of various identity groups in the training data.
We present DeAR, a novel debiasing method that learns additive residual image representations to offset the original representations.
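The additive-residual idea can be sketched in a few lines of PyTorch; the adversarial probe objective below is one naive way to remove attribute information, whereas DeAR's actual losses and training recipe differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveResidual(nn.Module):
    """Adds a learned residual to a frozen VLM image embedding."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z + self.residual(z)   # debiased representation

dim, n_groups = 512, 2
debias = AdditiveResidual(dim)
probe = nn.Linear(dim, n_groups)      # adversarial protected-attribute probe
opt = torch.optim.Adam(debias.parameters(), lr=1e-4)

z = torch.randn(32, dim)              # stand-in for frozen CLIP embeddings
y = torch.randint(0, n_groups, (32,)) # protected-attribute labels
opt.zero_grad()
loss = -F.cross_entropy(probe(debias(z)), y)  # push the probe toward chance
loss.backward()
opt.step()
```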
arXiv Detail & Related papers (2023-03-18T14:57:43Z) - Discovering and Mitigating Visual Biases through Keyword Explanation [66.71792624377069]
We propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords.
B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C.
B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet.
arXiv Detail & Related papers (2023-01-26T13:58:46Z) - To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo [53.370023611101175]
We present a debiased dataset for the Person-centric Visual Grounding task first proposed by Cui et al.
Given an image and a caption, PCVG requires pairing up a person's name mentioned in the caption with a bounding box that points to that person in the image.
We find that the original Who's Waldo dataset contains a large number of biased samples that are solvable by simple heuristics, without recourse to the contextual cues the task is meant to test.
arXiv Detail & Related papers (2022-03-30T21:35:53Z)