Debiasing Vision-Language Models via Biased Prompts
- URL: http://arxiv.org/abs/2302.00070v2
- Date: Mon, 15 May 2023 07:51:14 GMT
- Title: Debiasing Vision-Language Models via Biased Prompts
- Authors: Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba,
Stefanie Jegelka
- Abstract summary: We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
- Score: 79.04467131711775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models have been shown to inherit biases from their training
datasets. This can be particularly problematic for vision-language foundation
models trained on uncurated datasets scraped from the internet. The biases can
be amplified and propagated to downstream applications like zero-shot
classifiers and text-to-image generative models. In this study, we propose a
general approach for debiasing vision-language foundation models by projecting
out biased directions in the text embedding. In particular, we show that
debiasing only the text embedding with a calibrated projection matrix suffices
to yield robust classifiers and fair generative models. The proposed
closed-form solution enables easy integration into large-scale pipelines, and
empirical results demonstrate that our approach effectively reduces social bias
and spurious correlation in both discriminative and generative vision-language
models without the need for additional data or training.
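The projection idea is simple enough to sketch. Below is a minimal, hedged illustration in numpy: it uses a plain orthogonal projection P = I - A^T (A A^T)^{-1} A rather than the paper's calibrated closed-form matrix, and random placeholder vectors stand in for embeddings that a real pipeline would take from a frozen text encoder such as CLIP's; the prompt roles and dimensions are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of projection-based debiasing (simplified; the paper adds a
# calibration step to obtain its closed-form projection matrix).
# In practice the vectors below would be embeddings from a frozen text encoder
# such as CLIP's, e.g. of prompts like "a photo of a man" / "a photo of a woman";
# random placeholders keep this example self-contained.

rng = np.random.default_rng(0)
d = 512                                    # embedding dimension (e.g. CLIP ViT-B/32)
biased_prompts = rng.normal(size=(2, d))   # stand-ins for biased-prompt embeddings
class_prompts = rng.normal(size=(10, d))   # stand-ins for class-prompt embeddings

# Rows of A span the biased directions.
A = biased_prompts / np.linalg.norm(biased_prompts, axis=1, keepdims=True)

# Orthogonal projection onto the complement of the biased subspace:
# P = I - A^T (A A^T)^{-1} A
P = np.eye(d) - A.T @ np.linalg.solve(A @ A.T, A)

# Debias the class embeddings and re-normalize for cosine-similarity scoring.
debiased = class_prompts @ P
debiased /= np.linalg.norm(debiased, axis=1, keepdims=True)
```

In a zero-shot classifier, the debiased class embeddings would simply replace the original prompt embeddings when computing cosine similarity with image features; as the abstract notes, only the text embedding is modified and the image encoder is left untouched.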
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI [17.101569078791492]
We study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases.
We show for the first time that the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model.
arXiv Detail & Related papers (2024-08-04T08:26:58Z)
- Addressing Bias Through Ensemble Learning and Regularized Fine-Tuning [0.2812395851874055]
This paper proposes a comprehensive approach using multiple methods to remove bias in AI models.
We train multiple models to counter the bias of the pre-trained model through data splitting, local training, and regularized fine-tuning.
We conclude our solution with knowledge distillation that results in a single unbiased neural network.
arXiv Detail & Related papers (2024-02-01T09:24:36Z)
- Current Topological and Machine Learning Applications for Bias Detection in Text [4.799066966918178]
This study utilizes the RedditBias database to analyze textual biases.
Four transformer models, including BERT and RoBERTa variants, were explored.
Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
arXiv Detail & Related papers (2023-11-22T16:12:42Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- A Generative Approach for Mitigating Structural Biases in Natural Language Inference [24.44419010439227]
In this work, we reformulate the NLI task as a generative task, where a model is conditioned on the biased subset of the input and the label.
We show that this approach is highly robust to large amounts of bias.
We find that generative models are difficult to train and they generally perform worse than discriminative baselines.
arXiv Detail & Related papers (2021-08-31T17:59:45Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features (one common way to realize this idea is sketched after this list).
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
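The bag-of-words idea in the last entry lends itself to a brief sketch. A common way to keep a main model from relying on features a frozen bias model already captures is product-of-experts (PoE) training; this is an assumption for illustration, not necessarily that paper's exact mechanism, and the modules, sizes, and batch below are toy stand-ins.

```python
import torch
import torch.nn.functional as F

# Hedged sketch: product-of-experts (PoE) training with a frozen bag-of-words
# bias model; one generic way to discourage the main model from learning
# features the bias model already explains. All modules and data are toy stand-ins.

num_labels, vocab, hidden = 3, 1000, 64
bow_model = torch.nn.Linear(vocab, num_labels)   # bag-of-words bias model (frozen)
main_model = torch.nn.Sequential(                # stand-in for the full NLI model
    torch.nn.Linear(vocab, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, num_labels)
)
opt = torch.optim.Adam(main_model.parameters(), lr=1e-3)

x = torch.rand(8, vocab)                 # toy batch of bag-of-words features
y = torch.randint(0, num_labels, (8,))   # toy labels

with torch.no_grad():                    # bias model is trained separately and frozen
    bias_logp = F.log_softmax(bow_model(x), dim=-1)

# PoE: combine log-probabilities, so gradients reward the main model only for
# explaining what the biased model cannot.
main_logp = F.log_softmax(main_model(x), dim=-1)
loss = F.nll_loss(F.log_softmax(main_logp + bias_logp, dim=-1), y)
opt.zero_grad()
loss.backward()
opt.step()
```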