Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
- URL: http://arxiv.org/abs/2502.15809v1
- Date: Wed, 19 Feb 2025 12:05:33 GMT
- Title: Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
- Authors: Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, Jing Zhang,
- Abstract summary: Few-shot adaptation for Vision-Language Models (VLMs) presents a dilemma: balancing in-distribution accuracy with out-of-distribution generalization.<n>Recent research has utilized low-level concepts such as visual attributes to enhance generalization.<n>This study reveals that VLMs overly rely on a small subset of attributes on decision-making, which co-occur with the category but are not inherently part of it, spuriously correlated attributes.
- Score: 8.950906917573986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot adaptation for Vision-Language Models (VLMs) presents a dilemma: balancing in-distribution accuracy with out-of-distribution generalization. Recent research has utilized low-level concepts such as visual attributes to enhance generalization. However, this study reveals that VLMs overly rely on a small subset of attributes on decision-making, which co-occur with the category but are not inherently part of it, termed spuriously correlated attributes. This biased nature of VLMs results in poor generalization. To address this, 1) we first propose Spurious Attribute Probing (SAP), identifying and filtering out these problematic attributes to significantly enhance the generalization of existing attribute-based methods; 2) We introduce Spurious Attribute Shielding (SAS), a plug-and-play module that mitigates the influence of these attributes on prediction, seamlessly integrating into various Parameter-Efficient Fine-Tuning (PEFT) methods. In experiments, SAP and SAS significantly enhance accuracy on distribution shifts across 11 datasets and 3 generalization tasks without compromising downstream performance, establishing a new state-of-the-art benchmark.
Related papers
- Debiased Prompt Tuning in Vision-Language Model without Annotations [14.811475313694041]
Vision-Language Models (VLMs) may suffer from the problem of spurious correlations.
By leveraging pseudo-spurious attribute annotations, we propose a method to automatically adjust the training weights of different groups.
Our approach efficiently improves the worst-group accuracy on CelebA, Waterbirds, and MetaShift datasets.
arXiv Detail & Related papers (2025-03-11T12:24:54Z) - Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Leveraging vision-language models for fair facial attribute classification [19.93324644519412]
General-purpose vision-language model (VLM) is a rich knowledge source for common sensitive attributes.
We analyze the correspondence between VLM predicted and human defined sensitive attribute distribution.
Experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines.
arXiv Detail & Related papers (2024-03-15T18:37:15Z) - Distributionally Generative Augmentation for Fair Facial Attribute Classification [69.97710556164698]
Facial Attribute Classification (FAC) holds substantial promise in widespread applications.
FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations.
This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.
arXiv Detail & Related papers (2024-03-11T10:50:53Z) - SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable
Pedestrian Attribute Recognition [23.55622798950833]
A novel Scale and Spatial Priors Guided Network (SSPNet) is proposed for Pedestrian Attribute Recognition (PAR) models.
SSPNet learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps.
A novel IoU based attribute localization metric is proposed for Weakly-supervised Pedestrian Attribute localization (WPAL) based on the improved Grad-CAM for attribute response mask.
arXiv Detail & Related papers (2023-12-11T00:41:40Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot
Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object)
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual
Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods can actually suffer in generalizing such fitted attributes interdependencies onto scenes or identities off the dataset distribution.
To render models robust in realistic scenes, we propose the attributes-disentangled feature learning to ensure the recognition of an attribute not inferring on the existence of others.
arXiv Detail & Related papers (2023-07-28T01:34:55Z) - SFP: Spurious Feature-targeted Pruning for Out-of-Distribution
Generalization [38.37530720506389]
We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure-based OOD generalization SOTAs, with accuracy improvement up to 4.72% and 23.35%, respectively.
arXiv Detail & Related papers (2023-05-19T11:46:36Z) - Calibrated Feature Decomposition for Generalizable Person
Re-Identification [82.64133819313186]
Calibrated Feature Decomposition (CFD) module focuses on improving the generalization capacity for person re-identification.
A calibrated-and-standardized Batch normalization (CSBN) is designed to learn calibrated person representation.
arXiv Detail & Related papers (2021-11-27T17:12:43Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.