Elastic Representation: Mitigating Spurious Correlations for Group Robustness
- URL: http://arxiv.org/abs/2502.09850v1
- Date: Fri, 14 Feb 2025 01:25:27 GMT
- Title: Elastic Representation: Mitigating Spurious Correlations for Group Robustness
- Authors: Tao Wen, Zihan Wang, Quan Zhang, Qi Lei
- Abstract summary: Deep learning models can suffer from severe performance degradation when relying on spurious correlations between input features and labels.
We propose Elastic Representation (ElRep) to learn features by imposing Nuclear- and Frobenius-norm penalties on the representation from the last layer of a neural network.
- Score: 24.087096334524077
- Abstract: Deep learning models can suffer from severe performance degradation when relying on spurious correlations between input features and labels, making them perform well on training data but poorly on minority groups. This problem arises especially when training data are limited or imbalanced. While most prior work focuses on learning invariant features (features with consistent correlations to the label y), it overlooks the potential harm of spurious correlations between features. We hereby propose Elastic Representation (ElRep) to learn features by imposing Nuclear- and Frobenius-norm penalties on the representation from the last layer of a neural network. Similar to the elastic net, ElRep enjoys the benefits of learning important features without losing feature diversity. The proposed method is simple yet effective. It can be integrated into many deep learning approaches to mitigate spurious correlations and improve group robustness. Moreover, we theoretically show that ElRep has minimal negative impact on in-distribution predictions. This is a remarkable advantage over approaches that prioritize minority groups at the cost of overall performance.
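As a concrete illustration, here is a minimal PyTorch sketch of a loss with nuclear- and Frobenius-norm penalties on the last-layer representation, as the abstract describes. The penalty weights `lam_nuc` and `lam_fro` (names and values) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def elrep_loss(logits, targets, features, lam_nuc=0.01, lam_fro=0.01):
    """Cross-entropy plus nuclear- and Frobenius-norm penalties on the
    (batch_size x feature_dim) last-layer representation matrix."""
    ce = F.cross_entropy(logits, targets)
    nuc = torch.linalg.matrix_norm(features, ord="nuc")  # sum of singular values
    fro = torch.linalg.matrix_norm(features, ord="fro")  # sqrt of sum of squares
    return ce + lam_nuc * nuc + lam_fro * fro

# Usage sketch: features = backbone(x); logits = head(features)
# loss = elrep_loss(logits, y, features); loss.backward()
```

As with the elastic net, the nuclear norm promotes a low-rank (sparse-spectrum) representation while the Frobenius norm shrinks all directions evenly, which is what lets the penalty select important features without collapsing feature diversity.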
Related papers
- Out of spuriousity: Improving robustness to spurious correlations without group annotations [2.592470112714595]
We propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations.
The improvement in worst-group performance from our approach strengthens the hypothesis that a fully trained dense network contains a subnetwork that does not rely on spurious correlations.
arXiv Detail & Related papers (2024-07-20T20:24:14Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
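The summary does not spell out the algorithm; a rough sketch, assuming progressive data expansion means warming up on a group-balanced subset and then gradually adding the remaining examples, might look like the following. The warm-up length and expansion step size are hypothetical choices, not values from the paper.

```python
import random

def progressive_expansion_schedule(balanced_idx, remaining_idx,
                                   warmup_epochs=5, expand_per_epoch=100):
    """Yield the list of training indices to use at each epoch: start from a
    group-balanced subset, then add a fixed number of held-out examples per
    epoch once the warm-up phase ends. Runs as an infinite generator."""
    pool = list(remaining_idx)
    random.shuffle(pool)
    current = list(balanced_idx)
    epoch = 0
    while True:
        if epoch >= warmup_epochs and pool:
            current.extend(pool[:expand_per_epoch])
            pool = pool[expand_per_epoch:]
        yield list(current)
        epoch += 1
```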
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
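A hedged sketch of what such a reweighting optimization could look like: learn simplex-constrained instance weights that drive the weighted covariance between each lexical feature and the label toward zero. The squared-covariance objective and optimizer settings are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def fit_debiasing_weights(X, y, steps=500, lr=0.1):
    """X: (n, d) float feature-occurrence matrix; y: (n,) float +/-1 labels.
    Returns per-example weights summing to 1 that shrink feature-label
    covariances."""
    logits = torch.zeros(X.shape[0], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)             # weights on the simplex
        y_bar = (w * y).sum()                        # weighted label mean
        X_bar = (w[:, None] * X).sum(dim=0)          # weighted feature means
        cov = (w[:, None] * (X - X_bar) * (y - y_bar)[:, None]).sum(dim=0)
        loss = (cov ** 2).sum()                      # drive covariances to zero
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()
```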
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias [25.559684790787866]
We show that examples with spurious features are provably separable based on the model's output early in training.
We propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect.
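A rough sketch of this idea, assuming the early-training separability is exploited by clustering early-epoch model outputs within each class and sampling examples inversely to their cluster's size; two k-means clusters per class is an illustrative choice, not necessarily the paper's.

```python
import torch
from sklearn.cluster import KMeans

def early_spurious_weights(outputs, labels, num_classes):
    """outputs: (n, c) logits from a briefly trained model; labels: (n,) ints.
    Returns per-example sampling weights that downweight large clusters,
    which tend to align with spurious features."""
    weights = torch.ones(len(labels))
    for c in range(num_classes):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if len(idx) < 2:
            continue
        clusters = KMeans(n_clusters=2, n_init=10).fit_predict(
            outputs[idx].detach().cpu().numpy())
        for k in (0, 1):
            members = idx[torch.from_numpy(clusters == k)]
            if len(members) > 0:
                weights[members] = 1.0 / len(members)  # inverse cluster size
    return weights / weights.sum()
```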
arXiv Detail & Related papers (2023-05-30T05:51:36Z)
- Inducing Neural Collapse in Deep Long-tailed Learning [13.242721780822848]
We propose two explicit feature regularization terms to learn high-quality representation for class-imbalanced data.
With the proposed regularization, Neural Collapse phenomena will appear under the class-imbalanced distribution.
Our method is easily implemented, highly effective, and can be plugged into most existing methods.
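The summary does not name the two regularization terms; the following is an illustrative sketch, not the paper's exact formulation, of explicit feature regularizers that encourage neural-collapse-like geometry: within-class compactness plus between-class mean separation.

```python
import torch

def collapse_regularizers(features, labels, num_classes, eps=1e-8):
    """features: (n, d); labels: (n,) ints. Returns (within, between)
    penalties; assumes at least one class is present in the batch."""
    means = []
    within = features.new_zeros(())
    for c in range(num_classes):
        fc = features[labels == c]
        if len(fc) == 0:
            continue
        mu = fc.mean(dim=0)
        means.append(mu)
        within = within + ((fc - mu) ** 2).sum(dim=1).mean()
    M = torch.stack(means)                       # (k, d) class means
    M = M / (M.norm(dim=1, keepdim=True) + eps)  # normalize for cosine
    cos = M @ M.T                                # pairwise cosine similarities
    k = len(means)
    between = (cos.sum() - cos.diag().sum()) / max(k * (k - 1), 1)
    return within, between  # add lam_w*within + lam_b*between to the loss
```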
arXiv Detail & Related papers (2023-02-24T05:07:05Z)
- On Feature Learning in the Presence of Spurious Correlations [45.86963293019703]
We show that the quality of learned feature representations is greatly affected by design decisions beyond the method itself.
We significantly improve upon the best results reported in the literature on the popular Waterbirds, CelebA hair color prediction and WILDS-FMOW problems.
arXiv Detail & Related papers (2022-10-20T16:10:28Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
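A hedged sketch of randomly eliminating class information each iteration, assuming it is done by masking a random subset of classes out of the segmentation loss; the masking scheme and drop probability here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def random_class_dropout_loss(logits, targets, drop_prob=0.2, ignore_index=-100):
    """logits: (n, c, h, w) segmentation scores; targets: (n, h, w) ints.
    Pixels of randomly dropped classes are excluded from the loss."""
    num_classes = logits.shape[1]
    keep = torch.rand(num_classes) >= drop_prob        # classes to keep
    masked_targets = targets.clone()
    for c in torch.nonzero(~keep).flatten().tolist():
        masked_targets[targets == c] = ignore_index    # drop pixels of class c
    return F.cross_entropy(logits, masked_targets, ignore_index=ignore_index)
```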
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution (i.e., covariate shift), models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
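For reference, a minimal sketch of the group DRO objective under discussion: minimize the worst per-group average loss rather than the overall average. This hard-max variant is a simplification; the original algorithm maintains exponentiated-gradient weights over groups.

```python
import torch
import torch.nn.functional as F

def group_dro_loss(logits, targets, group_ids, num_groups):
    """Worst-group cross-entropy over a batch. group_ids: (n,) ints
    assigning each example to a group."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    group_losses = []
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_example[mask].mean())
    return torch.stack(group_losses).max()  # optimize the worst group
```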
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation [109.11580756757611]
Deep ensembles perform better than a single network thanks to the diversity among their members.
Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances.
We introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features.
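The actual DICE criterion relies on adversarial estimation of conditional mutual information; the following is only a simplified stand-in that decorrelates two members' features, intended to convey the diversity idea rather than reproduce the method.

```python
import torch

def feature_decorrelation(f1, f2, eps=1e-8):
    """f1, f2: (n, d) features from two ensemble members on the same batch.
    Penalizes the cross-correlation between the two feature sets."""
    f1 = (f1 - f1.mean(0)) / (f1.std(0) + eps)   # standardize per dimension
    f2 = (f2 - f2.mean(0)) / (f2.std(0) + eps)
    corr = (f1.T @ f2) / f1.shape[0]             # (d, d) cross-correlation
    return (corr ** 2).mean()                    # drive correlations to zero
```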
arXiv Detail & Related papers (2021-01-14T10:53:26Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
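A hedged sketch of such an auxiliary objective, assuming gradient supervision means aligning the input gradient of the loss at x with the direction toward its counterfactual pair x'; the cosine-based formulation below is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, y, x_cf):
    """x, x_cf: paired inputs that differ minimally but have different
    labels; y: labels for x. Returns an auxiliary term to add to the
    main classification loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    direction = (x_cf - x).detach()              # toward the counterfactual
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return (1 - cos).mean()                      # reward aligned gradients
```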
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.