Mitigating Spurious Correlations in Multi-modal Models during
Fine-tuning
- URL: http://arxiv.org/abs/2304.03916v2
- Date: Tue, 30 May 2023 21:19:20 GMT
- Title: Mitigating Spurious Correlations in Multi-modal Models during
Fine-tuning
- Authors: Yu Yang, Besmira Nushi, Hamid Palangi, Baharan Mirzasoleiman
- Abstract summary: Spurious correlations that degrade model generalization or lead the model to be right for the wrong reasons are one of the main robustness concerns for real-world deployments.
This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest.
- Score: 18.45898471459533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spurious correlations that degrade model generalization or lead the model to
be right for the wrong reasons are one of the main robustness concerns for
real-world deployments. However, mitigating these correlations during
pre-training for large-scale models can be costly and impractical, particularly
for those without access to high-performance computing resources. This paper
proposes a novel approach to address spurious correlations during fine-tuning
for a given domain of interest. With a focus on multi-modal models (e.g.,
CLIP), the proposed method leverages different modalities in these models to
detect and explicitly set apart spurious attributes from the affected class,
achieved through a multi-modal contrastive loss function that expresses
spurious relationships through language. Our experimental results and in-depth
visualizations on CLIP show that such an intervention can effectively i)
improve the model's accuracy when spurious attributes are not present, and ii)
directs the model's activation maps towards the actual class rather than the
spurious attribute when present. In particular, on the Waterbirds dataset, our
algorithm achieved a worst-group accuracy 23% higher than ERM on CLIP with a
ResNet-50 backbone, and 32% higher on CLIP with a ViT backbone, while
maintaining the same average accuracy as ERM.
Related papers
- Towards Robust Text Classification: Mitigating Spurious Correlations with Causal Learning [2.7813683000222653]
We propose the Causally Calibrated Robust ( CCR) to reduce models' reliance on spurious correlations.
CCR integrates a causal feature selection method based on counterfactual reasoning, along with an inverse propensity weighting (IPW) loss function.
We show that CCR state-of-the-art performance among methods without group labels, and in some cases, it can compete with the models that utilize group labels.
arXiv Detail & Related papers (2024-11-01T21:29:07Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - Autoencoder based approach for the mitigation of spurious correlations [2.7624021966289605]
Spurious correlations refer to erroneous associations in data that do not reflect true underlying relationships.
These correlations can lead deep neural networks (DNNs) to learn patterns that are not robust across diverse datasets or real-world scenarios.
We propose an autoencoder-based approach to analyze the nature of spurious correlations that exist in the Global Wheat Head Detection (GWHD) 2021 dataset.
arXiv Detail & Related papers (2024-06-27T05:28:44Z) - Spuriousness-Aware Meta-Learning for Learning Robust Classifiers [26.544938760265136]
Spurious correlations are brittle associations between certain attributes of inputs and target variables.
Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold.
Mitigating the impact of spurious correlations is crucial towards robust model generalization, but it often requires annotations of the spurious correlations in data.
arXiv Detail & Related papers (2024-06-15T21:41:25Z) - Dealing with negative samples with multi-task learning on span-based
joint entity-relation extraction [0.7252027234425334]
Recent span-based joint extraction models have demonstrated significant advantages in both entity recognition and relation extraction.
This paper introduces a span-based multitask entity-relation joint extraction model.
arXiv Detail & Related papers (2023-09-18T12:28:46Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Precision-Recall Divergence Optimization for Generative Modeling with
GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the textitPR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue
Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation model.
arXiv Detail & Related papers (2023-03-02T06:33:48Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs)
We present Efficient Ensemble of Experts (E$3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Beyond Marginal Uncertainty: How Accurately can Bayesian Regression
Models Estimate Posterior Predictive Correlations? [13.127549105535623]
It is often more useful to estimate predictive correlations between the function values at different input locations.
We first consider a downstream task which depends on posterior predictive correlations: transductive active learning (TAL)
Since TAL is too expensive and indirect to guide development of algorithms, we introduce two metrics which more directly evaluate the predictive correlations.
arXiv Detail & Related papers (2020-11-06T03:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.