DeAR: Debiasing Vision-Language Models with Additive Residuals
- URL: http://arxiv.org/abs/2303.10431v1
- Date: Sat, 18 Mar 2023 14:57:43 GMT
- Title: DeAR: Debiasing Vision-Language Models with Additive Residuals
- Authors: Ashish Seth, Mayur Hemani, Chirag Agarwal
- Abstract summary: Large pre-trained vision-language models (VLMs) provide rich, adaptable image and text representations.
These models suffer from societal biases owing to the skewed distribution of various identity groups in the training data.
We present DeAR, a novel debiasing method that learns additive residual image representations to offset the original representations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained vision-language models (VLMs) reduce the time for
developing predictive models for various vision-grounded language downstream
tasks by providing rich, adaptable image and text representations. However,
these models suffer from societal biases owing to the skewed distribution of
various identity groups in the training data. These biases manifest as the
skewed similarity between the representations for specific text concepts and
images of people of different identity groups and, therefore, limit the
usefulness of such models in real-world high-stakes applications. In this work,
we present DeAR (Debiasing with Additive Residuals), a novel debiasing method
that learns additive residual image representations to offset the original
representations, ensuring fair output representations. In doing so, it reduces
the ability of the representations to distinguish between the different
identity groups. Further, we observe that the current fairness tests are
performed on limited face image datasets that fail to indicate why a specific
text concept should/should not apply to them. To bridge this gap and better
evaluate DeAR, we introduce the Protected Attribute Tag Association (PATA)
dataset - a new context-based bias benchmarking dataset for evaluating the
fairness of large pre-trained VLMs. Additionally, PATA provides visual context
for a diverse human population in different scenarios with both positive and
negative connotations. Experimental results for fairness and zero-shot
performance preservation using multiple datasets demonstrate the efficacy of
our framework.
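
The abstract describes the core mechanism at a high level: a small learned network produces an additive residual that is added to the frozen image representation, trained so that the resulting features no longer separate identity groups. Below is a minimal PyTorch sketch of that idea. The class names and the 512-dimensional feature size (typical of CLIP ViT-B/32) are assumptions, and the adversarial probe is one plausible way to operationalize "reducing the ability to distinguish between identity groups", not necessarily the paper's exact training objective.

```python
import torch
import torch.nn as nn


class AdditiveResidualDebiaser(nn.Module):
    """Hypothetical sketch of DeAR-style debiasing: a learned network
    produces a residual that is ADDED to frozen VLM image features,
    offsetting the original representation."""

    def __init__(self, dim: int = 512, hidden: int = 512):
        super().__init__()
        # Maps the original feature vector to an additive offset.
        self.residual = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Debiased representation = original + learned additive residual.
        return image_features + self.residual(image_features)


class ProtectedAttributeProbe(nn.Module):
    """Adversary that tries to recover a protected attribute (e.g. an
    identity group) from the debiased features. Training the residual
    so this probe fails is one plausible objective consistent with the
    abstract, not the paper's stated loss."""

    def __init__(self, dim: int = 512, num_groups: int = 2):
        super().__init__()
        self.head = nn.Linear(dim, num_groups)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.head(z)


# Toy forward pass with random stand-ins for frozen image features.
debiaser = AdditiveResidualDebiaser(dim=512)
probe = ProtectedAttributeProbe(dim=512, num_groups=2)
feats = torch.randn(8, 512)
debiased = debiaser(feats)
logits = probe(debiased)
print(debiased.shape, logits.shape)  # torch.Size([8, 512]) torch.Size([8, 2])
```

Note that only the residual network is trained in this sketch; the backbone encoder stays frozen, which is what lets the method preserve zero-shot performance while shifting the output representations.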