DeAR: Debiasing Vision-Language Models with Additive Residuals
- URL: http://arxiv.org/abs/2303.10431v1
- Date: Sat, 18 Mar 2023 14:57:43 GMT
- Title: DeAR: Debiasing Vision-Language Models with Additive Residuals
- Authors: Ashish Seth, Mayur Hemani, Chirag Agarwal
- Abstract summary: Large pre-trained vision-language models (VLMs) provide rich, adaptable image and text representations.
These models suffer from societal biases owing to the skewed distribution of various identity groups in the training data.
We present DeAR, a novel debiasing method that learns additive residual image representations to offset the original representations.
- Score: 5.672132510411465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained vision-language models (VLMs) reduce the time for
developing predictive models for various vision-grounded language downstream
tasks by providing rich, adaptable image and text representations. However,
these models suffer from societal biases owing to the skewed distribution of
various identity groups in the training data. These biases manifest as the
skewed similarity between the representations for specific text concepts and
images of people of different identity groups and, therefore, limit the
usefulness of such models in real-world high-stakes applications. In this work,
we present DeAR (Debiasing with Additive Residuals), a novel debiasing method
that learns additive residual image representations to offset the original
representations, ensuring fair output representations. In doing so, it reduces
the ability of the representations to distinguish between the different
identity groups. Further, we observe that the current fairness tests are
performed on limited face image datasets that fail to indicate why a specific
text concept should/should not apply to them. To bridge this gap and better
evaluate DeAR, we introduce the Protected Attribute Tag Association (PATA)
dataset - a new context-based bias benchmarking dataset for evaluating the
fairness of large pre-trained VLMs. Additionally, PATA provides visual context
for a diverse human population in different scenarios with both positive and
negative connotations. Experimental results for fairness and zero-shot
performance preservation using multiple datasets demonstrate the efficacy of
our framework.
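As a concrete illustration of the additive-residual idea, here is a minimal PyTorch sketch: a small residual network offsets frozen image embeddings so an adversary can no longer predict the protected attribute, while a proximity term keeps the debiased embedding close to the original. The architecture, losses, and trade-off weight are illustrative assumptions, not the authors' exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 512  # assumed CLIP-style embedding width

# Residual generator: produces an additive offset for each frozen image embedding.
residual_net = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
# Adversary: tries to recover a protected attribute from the debiased embedding.
adversary = nn.Linear(DIM, 2)

opt_r = torch.optim.Adam(residual_net.parameters(), lr=1e-4)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-4)

def debias(z):
    """Core additive-residual operation: offset the original embedding."""
    return z + residual_net(z)

z = torch.randn(32, DIM)        # stands in for frozen VLM image embeddings
y = torch.randint(0, 2, (32,))  # protected-attribute labels (illustrative)

# (1) Adversary learns to predict the attribute from debiased embeddings.
loss_a = F.cross_entropy(adversary(debias(z).detach()), y)
opt_a.zero_grad(); loss_a.backward(); opt_a.step()

# (2) Residual net pushes the adversary toward chance (fairness) while
#     staying close to the original embedding (utility preservation).
logits = adversary(debias(z))
uniform = torch.full_like(logits, 1.0 / logits.size(1))
loss_fair = F.kl_div(F.log_softmax(logits, dim=1), uniform, reduction="batchmean")
loss_keep = (debias(z) - z).pow(2).mean()
loss_r = loss_fair + 0.1 * loss_keep  # trade-off weight assumed
opt_r.zero_grad(); loss_r.backward(); opt_r.step()
```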
Related papers
- Debiasing Vision-Language Models with Text-Only Training [15.069736314663352]
We propose a Text-Only Debiasing framework called TOD, leveraging a text-as-image training paradigm to mitigate visual biases.
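A rough sketch of what a text-as-image paradigm can mean, assuming a CLIP-style shared embedding space: attribute-annotated captions stand in for photos, so a debiasing adapter can be trained without any images. The encoder stub, captions, and adapter here are illustrative, not the paper's components.

```python
import torch
import torch.nn as nn

DIM = 512
# Frozen CLIP-style text tower (stubbed here with a random projection).
text_encoder = nn.Linear(300, DIM)
debias_adapter = nn.Linear(DIM, DIM)

# Because a contrastively trained VLM aligns text and images in one space,
# captions annotated with the protected attribute can stand in for photos.
captions = ["a photo of a female doctor", "a photo of a male doctor"]
caption_feats = torch.randn(len(captions), 300)  # toy caption featurization
t = text_encoder(caption_feats)                  # proxy "image" embeddings

# The adapter is trained on these text proxies and later applied to real
# image embeddings at inference time (training loss omitted in this sketch).
debiased = t + debias_adapter(t)
print(debiased.shape)  # torch.Size([2, 512])
```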
arXiv Detail & Related papers (2024-10-12T04:34:46Z) - Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification [49.41632476658246]
We discuss the extension of Data-Free Knowledge Distillation (DFKD) to Vision-Language Foundation Models without access to the billion-level image-text datasets.
The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts.
We propose three novel Prompt Diversification methods to encourage image synthesis with diverse styles.
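The summary does not spell out the three methods; the simplest, template-based flavor of prompt diversification might look like this sketch (all templates and style suffixes are made up):

```python
import itertools
import random

categories = ["dog", "car"]                          # the given category concepts
templates = ["a photo of a {}", "a sketch of a {}"]  # assumed prompt templates
styles = ["", ", watercolor style", ", at night"]    # assumed style suffixes

# Cross every concept with every template and style to diversify the
# prompts that drive image synthesis.
prompts = [t.format(c) + s
           for c, t, s in itertools.product(categories, templates, styles)]
print(random.sample(prompts, 3))
```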
arXiv Detail & Related papers (2024-07-21T13:26:30Z) - Leveraging vision-language models for fair facial attribute classification [19.93324644519412]
A general-purpose vision-language model (VLM) is a rich knowledge source for common sensitive attributes.
We analyze the correspondence between VLM-predicted and human-defined sensitive attribute distributions.
Experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines.
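Read as code, the idea might look like the following sketch using OpenAI's CLIP package: zero-shot prompts predict a sensitive attribute per image, and aggregating the predictions over a dataset yields the VLM-side distribution to compare against human labels. The prompt templates and the file name are illustrative.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompts = ["a photo of a man", "a photo of a woman"]  # illustrative templates
text = clip.tokenize(prompts).to(device)

# "face.jpg" is a hypothetical input image.
image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

# Aggregating `probs` across a dataset gives the VLM-predicted attribute
# distribution that the paper compares to human-defined labels.
print(dict(zip(prompts, probs[0].tolist())))
```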
arXiv Detail & Related papers (2024-03-15T18:37:15Z) - Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision [25.414154497482162]
We demonstrate that diffusion models can be leveraged to create such a fairness-benchmarking dataset.
We benchmark several vision-language models on a multi-class occupation classification task.
We find that images generated with non-Caucasian labels have a significantly higher occupation misclassification rate than images generated with Caucasian labels.
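The headline finding boils down to a per-group error-rate comparison; a minimal sketch of that measurement follows, with the record fields assumed.

```python
from collections import defaultdict

# Each record: the race label used in the generation prompt, the true
# occupation, and the classifier's prediction (all fields assumed).
records = [
    {"race": "Caucasian", "occupation": "nurse", "pred": "nurse"},
    {"race": "Black", "occupation": "nurse", "pred": "maid"},
    # ... one record per generated image
]

errors, totals = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["race"]] += 1
    errors[r["race"]] += int(r["pred"] != r["occupation"])

for race in totals:
    print(race, errors[race] / totals[race])  # misclassification rate per group
```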
arXiv Detail & Related papers (2023-11-25T19:40:13Z) - Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
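For the retrieval application, one concrete probe consistent with such a taxonomy is retrieval skew: how unevenly the top-k results for a neutral query distribute across identity groups. A toy sketch with random tensors standing in for CLIP embeddings:

```python
import torch

# Toy stand-ins: unit-norm image embeddings with group labels, and one
# text query embedding (e.g., for "a photo of a doctor"). All assumed.
img_emb = torch.nn.functional.normalize(torch.randn(1000, 512), dim=1)
groups = torch.randint(0, 2, (1000,))           # 0/1 protected attribute
query = torch.nn.functional.normalize(torch.randn(512), dim=0)

top = torch.topk(img_emb @ query, k=100).indices
share = groups[top].float().mean().item()       # fraction from group 1
print(f"group-1 share of top-100 retrievals: {share:.2f} (parity would be ~0.50)")
```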
arXiv Detail & Related papers (2023-10-18T10:32:39Z) - Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners [88.07317175639226]
We propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners.
Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information.
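A toy rendering of the cross-attention scoring idea; the actual method reads these maps out of a pre-trained Stable Diffusion U-Net, so the shapes and the aggregation below are illustrative only.

```python
import math
import torch

d = 64
img_q = torch.randn(256, d)  # queries from U-Net spatial positions (toy)
txt_k = torch.randn(77, d)   # keys from the text-encoder tokens (toy)

attn = torch.softmax(img_q @ txt_k.T / math.sqrt(d), dim=-1)  # (256, 77)
# Idea: a prompt that matches the image content attracts more attention
# mass, so an aggregate of these scores can rank candidate labels.
score = attn.max(dim=-1).values.mean().item()
print(f"match score for this prompt: {score:.3f}")
```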
arXiv Detail & Related papers (2023-05-18T05:41:36Z) - DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes two fairness criteria: group fairness and counterfactual fairness.
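A schematic of optimizing the two criteria jointly; the encoder, the first-moment group-fairness surrogate, and the loss weights are assumptions, not the paper's exact objectives.

```python
import torch
import torch.nn as nn

enc = nn.Linear(128, 64)          # stand-in encoder
x = torch.randn(32, 128)          # a batch of inputs
x_cf = torch.randn(32, 128)       # counterfactual twins (attribute flipped)
a = torch.randint(0, 2, (32,))    # sensitive attribute

z, z_cf = enc(x), enc(x_cf)

# Counterfactual fairness: a sample and its attribute-flipped twin
# should map to nearby representations.
loss_cf = (z - z_cf).pow(2).mean()

# Group fairness: match the mean representation of each group
# (a simple first-moment surrogate).
loss_grp = (z[a == 0].mean(0) - z[a == 1].mean(0)).pow(2).mean()

loss = loss_cf + 0.5 * loss_grp   # weights assumed
loss.backward()
```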
arXiv Detail & Related papers (2023-03-15T07:13:54Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
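The projection step itself is simple enough to sketch directly; here the biased directions are assumed to come from an orthonormal basis (in practice, e.g., differences between embeddings of contrastive prompt pairs such as "a photo of a man" / "a photo of a woman"), and the calibration the paper adds is omitted.

```python
import torch

D = 512
# Columns of V span the biased subspace (toy orthonormal basis here).
V = torch.linalg.qr(torch.randn(D, 2)).Q

P = torch.eye(D) - V @ V.T        # projection onto the unbiased complement

text_emb = torch.randn(10, D)     # stand-in CLIP text embeddings
debiased = text_emb @ P.T         # biased directions removed

# Sanity check: debiased embeddings have a near-zero component along V.
print((debiased @ V).abs().max())
```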
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Through a fair looking-glass: mitigating bias in image datasets [1.0323063834827415]
We present a fast and effective model to de-bias an image dataset through reconstruction and by minimizing the statistical dependence between the intended and protected variables.
We evaluate our proposed model on the CelebA dataset, compare the results with a state-of-the-art de-biasing method, and show that the model achieves a promising fairness-accuracy combination.
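A minimal sketch of the reconstruction-plus-independence recipe, using a cross-covariance penalty as a stand-in for whatever dependence measure the paper actually uses; all dimensions and weights are assumed.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 32)
dec = nn.Linear(32, 784)

x = torch.randn(64, 784)                 # flattened images (toy)
a = torch.randint(0, 2, (64,)).float()   # protected attribute

z = enc(x)
recon = dec(z)

loss_rec = (recon - x).pow(2).mean()
# Dependence surrogate: penalize cross-covariance between the latent
# code and the protected attribute.
zc = z - z.mean(0)
ac = a - a.mean()
loss_dep = (zc * ac.unsqueeze(1)).mean(0).pow(2).sum()

loss = loss_rec + 10.0 * loss_dep        # weight assumed
loss.backward()
```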
arXiv Detail & Related papers (2022-09-18T20:28:36Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
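Cross-modality region contrastive learning reduces to an InfoNCE objective over matched region-text pairs; a toy sketch with random features (the temperature and pooling are assumptions):

```python
import torch
import torch.nn.functional as F

B, D = 16, 256
# Pooled region features and their matching text features (toy tensors).
regions = F.normalize(torch.randn(B, D, requires_grad=True), dim=1)
words = F.normalize(torch.randn(B, D), dim=1)

logits = regions @ words.T / 0.07        # temperature assumed
labels = torch.arange(B)
# Matched region-text pairs are positives; all other pairs are negatives.
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
loss.backward()
```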
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Visual Recognition with Deep Learning from Biased Image Datasets [6.10183951877597]
We show how biasing models can be applied to remedy problems in the context of visual recognition.
Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations.
We propose to use a low dimensional image representation, shared across the image databases.
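With an (approximately) known biasing mechanism, reweighting reduces to importance weighting; a minimal sketch where the assumed bias is a known class over-sampling ratio:

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 2)
x = torch.randn(128, 64)
y = torch.randint(0, 2, (128,))

# Assumed known biasing mechanism: class 1 is over-sampled 3:1 relative
# to the target distribution, so its examples are down-weighted.
p_target = torch.tensor([0.5, 0.5])
p_biased = torch.tensor([0.25, 0.75])
w = (p_target / p_biased)[y]   # importance weight per example

loss = (w * nn.functional.cross_entropy(model(x), y, reduction="none")).mean()
loss.backward()
```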
arXiv Detail & Related papers (2021-09-06T10:56:58Z)