Examining and Combating Spurious Features under Distribution Shift
- URL: http://arxiv.org/abs/2106.07171v1
- Date: Mon, 14 Jun 2021 05:39:09 GMT
- Title: Examining and Combating Spurious Features under Distribution Shift
- Authors: Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig
- Abstract summary: We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
- Score: 94.31956965507085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central goal of machine learning is to learn robust representations that
capture the causal relationship between input features and output labels.
However, minimizing empirical risk over finite or biased datasets often results
in models latching on to spurious correlations between the training
input/output pairs that are not fundamental to the problem at hand. In this
paper, we define and analyze robust and spurious representations using the
information-theoretic concept of minimal sufficient statistics. We prove that
even when there is only bias in the input distribution (i.e., covariate shift),
models can still pick up spurious features from their training data. Group
distributionally robust optimization (DRO) provides an effective tool to
alleviate covariate shift by minimizing the worst-case training loss over a set
of pre-defined groups. Inspired by our analysis, we demonstrate that group DRO
can fail when groups do not directly account for various spurious correlations
that occur in the data. To address this, we further propose to minimize the
worst-case losses over a more flexible set of distributions that are defined on
the joint distribution of groups and instances, instead of treating each group
as a whole at optimization time. Through extensive experiments on one image and
two language tasks, we show that our model is significantly more robust than
comparable baselines under various partitions. Our code is available at
https://github.com/violet-zct/group-conditional-DRO.
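To make the objectives above concrete, the following is a minimal sketch of the group DRO loss the paper starts from: a weight is maintained for every pre-defined group, the currently worst-off groups are up-weighted, and the model is trained on the resulting re-weighted loss. The exponentiated-gradient update, the hyperparameter names (eta, n_groups), and the class layout are illustrative assumptions in PyTorch style, not the authors' implementation; the group-conditional DRO proposed in the paper additionally re-weights over the joint distribution of groups and instances, which this sketch does not attempt (see the linked repository for the actual code).

```python
# Illustrative sketch of group DRO (worst-case loss over pre-defined groups).
# Not the authors' implementation; names and hyperparameters are assumptions.
import torch


class GroupDROLoss:
    """Worst-case group loss via exponentiated-gradient group weights."""

    def __init__(self, n_groups: int, eta: float = 0.1):
        self.q = torch.ones(n_groups) / n_groups  # current distribution over groups
        self.eta = eta                            # step size for the group-weight update

    def __call__(self, per_sample_loss: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
        self.q = self.q.to(per_sample_loss.device)

        # Average loss of each group represented in the batch.
        group_losses = []
        for g in range(len(self.q)):
            mask = group_ids == g
            if mask.any():
                group_losses.append(per_sample_loss[mask].mean())
            else:
                group_losses.append(per_sample_loss.new_zeros(()))
        group_losses = torch.stack(group_losses)

        # Exponentiated-gradient step: up-weight the currently hardest groups,
        # so the re-weighted loss tracks the worst-case group loss.
        self.q = self.q * torch.exp(self.eta * group_losses.detach())
        self.q = self.q / self.q.sum()

        # Model parameters are updated on the group-reweighted loss.
        return (self.q * group_losses).sum()


# Illustrative usage: per_sample_loss would come from, e.g.,
# torch.nn.functional.cross_entropy(logits, labels, reduction="none"),
# and group_ids holds one pre-defined group index per example in the batch.
```

In this sketch each group is still treated as a whole at optimization time; the paper's point is precisely that such pre-defined groups may not align with the spurious correlations in the data, which motivates moving the worst-case set to distributions defined jointly over groups and instances.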
Related papers
- Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation [3.894771553698554]
Empirical Risk Minimization (ERM) models tend to rely on attributes that have high spurious correlation with the target.
This can degrade the performance on underrepresented (or 'minority') groups that lack these attributes.
We propose Environment-based Validation and Loss-based Sampling (EVaLS) to enhance robustness to spurious correlation.
arXiv Detail & Related papers (2024-10-07T08:17:44Z)
- Efficient Bias Mitigation Without Privileged Information [14.21628601482357]
Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups.
Existing bias mitigation methods that aim to address this issue often rely on group labels for training or validation.
We propose Targeted Augmentations for Bias Mitigation (TAB), a framework that leverages the entire training history of a helper model to identify spurious samples.
We show that TAB improves worst-group performance without any group information or model selection, outperforming existing methods while maintaining overall accuracy.
arXiv Detail & Related papers (2024-09-26T09:56:13Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups.
We reformulate the group DRO framework by proposing Q-Diversity.
Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z)
- Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts [122.08782633878788]
Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points.
Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative.
We learn a model that maintains high accuracy on simple group functions realized by low-bitrate features.
arXiv Detail & Related papers (2023-02-06T17:07:16Z)
- Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts.
We propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model (a minimal sketch of this grouping step appears after this list).
Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
arXiv Detail & Related papers (2022-08-26T12:34:55Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
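As referenced in the "Take One Gram of Neural Features" entry above, the following is a hedged sketch of the Gram-matrix grouping idea: per-example features from an auxiliary "identification" model are summarized by their Gram matrices and clustered into pseudo-groups, which a group-robust objective (such as the group DRO sketch above) can then consume in place of human group annotations. The feature shape, the use of k-means, and the function names are assumptions made for illustration, not that paper's exact procedure.

```python
# Hedged sketch: derive pseudo-groups from Gram matrices of intermediate
# features (illustrative choices throughout; not the paper's code).
import numpy as np
from sklearn.cluster import KMeans


def gram_descriptor(feature_map: np.ndarray) -> np.ndarray:
    """Gram matrix of one example's feature map, shape (C, H, W) -> flat (C*C,)."""
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    gram = flat @ flat.T / (h * w)   # channel-by-channel correlations
    return gram.reshape(-1)


def assign_pseudo_groups(feature_maps: np.ndarray, n_groups: int = 4) -> np.ndarray:
    """Cluster per-example Gram descriptors into pseudo-group indices."""
    descriptors = np.stack([gram_descriptor(f) for f in feature_maps])
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(descriptors)


# Illustrative usage on random stand-in features (N examples, C channels, H x W maps):
features = np.random.randn(32, 8, 4, 4)
groups = assign_pseudo_groups(features, n_groups=4)  # one pseudo-group id per example
```

The resulting pseudo-group ids play the same role as the group_ids argument in the earlier group DRO sketch, so the two pieces can be composed when no human group annotations are available.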