FairCLIP: Social Bias Elimination based on Attribute Prototype Learning and Representation Neutralization
- URL: http://arxiv.org/abs/2210.14562v2
- Date: Thu, 30 May 2024 08:38:50 GMT
- Title: FairCLIP: Social Bias Elimination based on Attribute Prototype Learning and Representation Neutralization
- Authors: Junyang Wang, Yi Zhang, Jitao Sang
- Abstract summary: We propose FairCLIP to eliminate the social bias in CLIP-based image retrieval.
FairCLIP neutralizes the representation that is common to all CLIP downstream tasks.
- Score: 19.105267891045532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language Pre-training (VLP) models such as CLIP have gained popularity in recent years. However, many works have found that the social biases hidden in CLIP easily manifest in downstream tasks, especially in image retrieval, which can have harmful effects on society. In this work, we propose FairCLIP to eliminate the social bias in CLIP-based image retrieval without damaging retrieval performance, achieving compatibility between the debiasing effect and the retrieval performance. FairCLIP consists of two steps: Attribute Prototype Learning (APL) and Representation Neutralization (RN). In the first step, we extract the concepts needed for debiasing from CLIP, using a query with learnable word-vector prefixes as the extraction structure. In the second step, we first divide the attributes into target attributes and bias attributes. Through analysis, we find that both kinds of attributes have an impact on the bias. We therefore eliminate the bias by using a Re-Representation Matrix (RRM) to neutralize the representation. We compare the debiasing effect and retrieval performance against other methods, and experiments demonstrate that FairCLIP achieves the best compatibility. Although FairCLIP is applied to eliminate bias in image retrieval, it neutralizes the representation that is shared by all CLIP downstream tasks, which means FairCLIP can serve as a general debiasing method for other fairness issues related to CLIP.
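A minimal PyTorch sketch may help make the two steps concrete. This is not the authors' released code: the prefix length, embedding width, toy text encoder, and the variance-based neutralization loss are illustrative assumptions; only the overall structure (learnable word-vector prefixes to extract attribute prototypes, then a Re-Representation Matrix applied to CLIP features) follows the abstract.

```python
# Sketch of FairCLIP's two steps (illustrative; dimensions and losses are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, PREFIX = 512, 4  # CLIP-like embedding width and number of learnable prefix vectors (assumed)

class AttributePrototypeLearner(nn.Module):
    """Step 1 (APL): a query with learnable word-vector prefixes extracts one prototype per attribute."""
    def __init__(self, text_encoder, attr_token_embs):
        super().__init__()
        self.text_encoder = text_encoder              # stand-in for CLIP's frozen text tower
        self.attr_tokens = attr_token_embs            # (A, EMB) word embeddings of the attribute names
        self.prefix = nn.Parameter(torch.randn(PREFIX, EMB) * 0.02)  # learnable prefix vectors

    def forward(self):
        prefix = self.prefix.unsqueeze(0).expand(self.attr_tokens.size(0), -1, -1)
        query = torch.cat([prefix, self.attr_tokens.unsqueeze(1)], dim=1)  # (A, PREFIX+1, EMB)
        return F.normalize(self.text_encoder(query), dim=-1)              # (A, EMB) attribute prototypes

class RepresentationNeutralizer(nn.Module):
    """Step 2 (RN): a Re-Representation Matrix (RRM) re-maps CLIP features."""
    def __init__(self, dim=EMB):
        super().__init__()
        self.rrm = nn.Linear(dim, dim, bias=False)    # the RRM

    def forward(self, feats):
        return F.normalize(self.rrm(feats), dim=-1)

def neutralization_loss(re_rep_feats, bias_prototypes):
    """One plausible reading of 'neutralization': re-represented features should be
    equally similar to every bias-attribute prototype (zero variance across them)."""
    sims = re_rep_feats @ bias_prototypes.t()         # (N, A_bias)
    return sims.var(dim=1).mean()

# Toy end-to-end run with random stand-ins for CLIP features and attribute tokens.
toy_encoder = lambda seq: seq.mean(dim=1)             # replace with CLIP's text encoder in practice
apl = AttributePrototypeLearner(toy_encoder, torch.randn(2, EMB))  # e.g. two bias-attribute tokens
rn = RepresentationNeutralizer()
img_feats = F.normalize(torch.randn(8, EMB), dim=-1)  # toy CLIP image features
loss = neutralization_loss(rn(img_feats), apl())
loss.backward()                                       # gradients reach the prefix and the RRM
```

In the actual method the encoder is CLIP itself and the objective must also preserve target-attribute and retrieval-relevant information; the stand-ins above only keep the snippet self-contained.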
Related papers
- ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation [6.012828781329036]
We propose to explicitly model and rectify the bias existing in CLIP to facilitate the unsupervised semantic segmentation task.
We use a learnable "Reference" prompt to encode class-preference bias and a projection of the positional embedding in the vision transformer to encode space-preference bias.
To make the bias modeling and rectification process meaningful and effective, a contrastive loss based on masked visual features and the text features of different classes is imposed.
arXiv Detail & Related papers (2024-08-13T09:10:48Z)
- Refining Skewed Perceptions in Vision-Language Models through Visual Representations [0.033483662989441935]
Large vision-language models (VLMs) have become foundational, demonstrating remarkable success across a variety of downstream tasks.
Despite their advantages, these models inherit biases from the disproportionate distribution of real-world data, leading to misconceptions about the actual environment.
This study presents an investigation into how a simple linear probe can effectively distill task-specific core features from CLIP's embedding for downstream applications.
arXiv Detail & Related papers (2024-05-22T22:03:11Z)
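The linear-probe idea in the entry above amounts to training a single linear layer on frozen CLIP image embeddings. A minimal sketch, with random tensors standing in for precomputed CLIP features and labels (the dimensions are assumptions):

```python
# Linear probe over frozen CLIP image embeddings (random stand-ins for real features).
import torch
import torch.nn as nn

EMB, NUM_CLASSES = 512, 10
feats = torch.randn(1024, EMB)                     # precomputed, frozen CLIP image embeddings
labels = torch.randint(0, NUM_CLASSES, (1024,))    # task labels

probe = nn.Linear(EMB, NUM_CLASSES)                # the only trainable component
opt = torch.optim.AdamW(probe.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):                            # full-batch updates, enough for a toy example
    opt.zero_grad()
    loss = loss_fn(probe(feats), labels)
    loss.backward()
    opt.step()

print("train accuracy:", (probe(feats).argmax(dim=1) == labels).float().mean().item())
```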
- AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning [50.78033979438031]
We first introduce a unified formulation to analyze CLIP-based few-shot learning methods from a perspective of logit bias.
Based on analysis of key components, this paper proposes a novel AMU-Tuning method to learn effective logit bias for CLIP-based few-shot classification.
arXiv Detail & Related papers (2024-04-13T10:46:11Z)
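The logit-bias formulation referenced above can be written as: final logits = CLIP's zero-shot logits + a bias term learned from the few-shot data. The sketch below illustrates that decomposition with a Tip-Adapter-style cache as the bias term; it is not the AMU-Tuning method itself, and the class count, dimensions, and hyperparameters are toy assumptions.

```python
# "Zero-shot logits + logit bias" decomposition for CLIP few-shot classification (illustrative).
import torch
import torch.nn.functional as F

C, D, SHOTS = 10, 512, 16                                        # classes, feature dim, shots (toy)
text_protos = F.normalize(torch.randn(C, D), dim=-1)             # CLIP text embeddings of class prompts
support_feats = F.normalize(torch.randn(C * SHOTS, D), dim=-1)   # few-shot image features
support_labels = torch.arange(C).repeat_interleave(SHOTS)

def logits_with_bias(img_feats, alpha=1.0, beta=5.0):
    zero_shot = 100.0 * img_feats @ text_protos.t()              # CLIP's usual scaled cosine logits
    affinity = torch.exp(-beta * (1.0 - img_feats @ support_feats.t()))  # cache affinities
    bias = affinity @ F.one_hot(support_labels, C).float()       # (N, C) logit bias from the cache
    return zero_shot + alpha * bias

query = F.normalize(torch.randn(4, D), dim=-1)
print(logits_with_bias(query).argmax(dim=1))
```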
- FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs [24.991684983495542]
This paper proposes FairerCLIP, a general approach for making zero-shot predictions of CLIP more fair and robust to spurious correlations.
We formulate the problem of jointly debiasing CLIP's image and text representations in reproducing kernel Hilbert spaces (RKHSs).
arXiv Detail & Related papers (2024-03-22T19:41:26Z)
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In the scenario of long-tailed recognition, where the number of samples in each class is imbalanced, treating two types of positive samples equally leads to the biased optimization for intra-category distance.
We propose patch-based self-distillation to transfer knowledge from head to tail classes, relieving the under-representation of tail classes.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
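To see where the coupling criticized in the entry above arises, here is the standard supervised contrastive (SupCon) loss written out: augmented views and other same-class samples end up in one positive set and are averaged with equal weight, which under class imbalance skews intra-class distances. This is the baseline loss, not the paper's decoupled variant.

```python
# Standard SupCon loss; the equal averaging over all positives is the coupled behaviour.
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, temperature=0.1):
    """feats: (N, D) L2-normalized embeddings of all views in a batch; labels: (N,)."""
    n = feats.size(0)
    sim = feats @ feats.t() / temperature
    not_self = ~torch.eye(n, dtype=torch.bool, device=feats.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self  # other views AND same-class samples
    sim = sim.masked_fill(~not_self, float("-inf"))                # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_cnt = pos.sum(dim=1).clamp(min=1)
    # Equal-weight average over *all* positives, regardless of whether they are
    # augmented views or same-class samples -- the source of the bias under imbalance.
    mean_log_prob_pos = log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_cnt
    return -(mean_log_prob_pos[pos.sum(dim=1) > 0]).mean()

feats = F.normalize(torch.randn(32, 128), dim=-1)   # toy batch of embeddings
labels = torch.randint(0, 4, (32,))                 # an imbalanced label distribution sharpens the effect
print(supcon_loss(feats, labels))
```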
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pre-training (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
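As one concrete, simplified example of what such a bias evaluation computes, the sketch below reports the largest zero-shot accuracy gap across demographic groups. It is a generic disparity metric chosen for illustration, not the paper's taxonomy; the predictions and group annotations are random stand-ins.

```python
# Largest accuracy gap across demographic groups (one generic disparity metric; illustrative).
import torch

def accuracy_gap(preds, labels, groups):
    """preds/labels: (N,) predicted and true class ids; groups: (N,) protected-group ids."""
    accs = torch.stack([(preds[groups == g] == labels[groups == g]).float().mean()
                        for g in groups.unique()])
    return (accs.max() - accs.min()).item()

preds = torch.randint(0, 5, (1000,))     # e.g. zero-shot CLIP predictions (toy)
labels = torch.randint(0, 5, (1000,))
groups = torch.randint(0, 2, (1000,))    # e.g. a binary protected attribute
print("accuracy gap across groups:", accuracy_gap(preds, labels, groups))
```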
- Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
- Fair Contrastive Learning for Facial Attribute Classification [25.436462696033846]
We propose a new Fair Supervised Contrastive Loss (FSCL) for fair visual representation learning.
In this paper, we analyze, for the first time, the unfairness caused by supervised contrastive learning.
Our method is robust to the intensity of data bias and effectively works in incomplete supervised settings.
arXiv Detail & Related papers (2022-03-30T11:16:18Z)
- DenseCLIP: Extract Free Dense Labels from CLIP [130.3830819077699]
Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
DenseCLIP+ surpasses SOTA transductive zero-shot semantic segmentation methods by large margins.
Our finding suggests that DenseCLIP can serve as a new reliable source of supervision for dense prediction tasks.
arXiv Detail & Related papers (2021-12-02T09:23:01Z)
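The "free dense labels" idea above can be sketched without CLIP's internals: given a dense visual feature map and the text embeddings of the class names, per-pixel cosine similarity followed by an argmax yields a pseudo-label map. The feature map below is a random stand-in (extracting usable dense features from CLIP needs the modifications the paper describes), and all sizes are toy assumptions.

```python
# Per-pixel pseudo-labels from dense features vs. class text embeddings (illustrative).
import torch
import torch.nn.functional as F

C, D, H, W = 5, 512, 32, 32                                  # classes, feature dim, spatial size (toy)
dense_feats = F.normalize(torch.randn(D, H, W), dim=0)       # stand-in for CLIP's dense visual features
text_embs = F.normalize(torch.randn(C, D), dim=-1)           # CLIP text embeddings of the class names

sim = (text_embs @ dense_feats.view(D, -1)).view(C, H, W)    # cosine similarity per pixel
pseudo_labels = sim.argmax(dim=0)                            # (H, W) map usable as free supervision
print(pseudo_labels.shape, pseudo_labels.unique())
```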
- Pose-guided Visible Part Matching for Occluded Person ReID [80.81748252960843]
We propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility.
Experimental results on three reported occluded benchmarks show that the proposed method achieves competitive performance to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-01T04:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.