Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group
Shifts
- URL: http://arxiv.org/abs/2302.02931v2
- Date: Thu, 12 Oct 2023 03:47:00 GMT
- Title: Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group
Shifts
- Authors: Amrith Setlur, Don Dennis, Benjamin Eysenbach, Aditi Raghunathan,
Chelsea Finn, Virginia Smith, Sergey Levine
- Abstract summary: Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points.
Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative.
We learn a model that maintains high accuracy on simple group functions realized by low bitrate features.
- Score: 122.08782633878788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training machine learning models robust to distribution shifts is critical
for real-world applications. Some robust training algorithms (e.g., Group DRO)
specialize to group shifts and require group information on all training
points. Other methods (e.g., CVaR DRO) that do not need group annotations can
be overly conservative, since they naively upweight high loss points which may
form a contrived set that does not correspond to any meaningful group in the
real world (e.g., when the high loss points are randomly mislabeled training
points). In this work, we address limitations in prior approaches by assuming a
more nuanced form of group shift: conditioned on the label, we assume that the
true group function (indicator over group) is simple. For example, we may
expect that group shifts occur along low bitrate features (e.g., image
background, lighting). Thus, we aim to learn a model that maintains high
accuracy on simple group functions realized by these low bitrate features, that
need not spend valuable model capacity achieving high accuracy on contrived
groups of examples. Based on this, we consider the two-player game formulation
of DRO where the adversary's capacity is bitrate-constrained. Our resulting
practical algorithm, Bitrate-Constrained DRO (BR-DRO), does not require group
information on training samples yet matches the performance of Group DRO on
datasets that have training group annotations and that of CVaR DRO on
long-tailed distributions. Our theoretical analysis reveals that in some
settings the BR-DRO objective can provably yield statistically efficient and less
conservative solutions than unconstrained CVaR DRO.
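The contrast drawn in the abstract can be made concrete with a toy calculation. The sketch below is illustrative only and is not the paper's algorithm: the function names, the toy losses, and the use of a few binary features as a stand-in for a capacity-limited ("simple") adversary are all assumptions made for this example. It compares a CVaR-style objective, a Group DRO-style worst-group objective, and an adversary restricted to groups defined by a single binary feature.

```python
# Illustrative sketch only. Names and numbers are hypothetical,
# and the "restricted adversary" is a crude stand-in, not BR-DRO itself.

def cvar_loss(losses, alpha):
    """Mean of the worst alpha-fraction of per-example losses."""
    k = max(1, int(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

def worst_group_loss(losses, groups):
    """Group DRO style objective: largest per-group mean loss (needs group labels)."""
    totals, counts = {}, {}
    for loss, g in zip(losses, groups):
        totals[g] = totals.get(g, 0.0) + loss
        counts[g] = counts.get(g, 0) + 1
    return max(totals[g] / counts[g] for g in totals)

def restricted_adversary_loss(losses, binary_features):
    """Worst mean loss over groups of the form {feature == value}.

    Assumption: restricting the adversary to single-feature groups
    mimics, very roughly, a constraint on adversary capacity.
    """
    return max(worst_group_loss(losses, column) for column in binary_features)

# Toy data: the two losses of 5.0 stand in for randomly mislabeled points.
losses = [0.1, 0.2, 5.0, 0.1, 0.8, 0.9, 5.0, 0.7]
background = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical simple feature
lighting = [0, 1, 1, 0, 0, 1, 0, 1]    # hypothetical simple feature

cvar = cvar_loss(losses, alpha=0.25)
group_dro = worst_group_loss(losses, background)
restricted = restricted_adversary_loss(losses, [background, lighting])

print(cvar, group_dro, restricted)
```

In this toy setting the CVaR-style objective is determined entirely by the two mislabeled points, while the restricted adversary cannot isolate them because they do not line up with any single simple feature, so its worst case is much smaller.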
Related papers
- Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation [3.894771553698554]
Empirical Risk Minimization (ERM) models tend to rely on attributes that have high spurious correlation with the target.
This can degrade performance on underrepresented (or 'minority') groups that lack these attributes.
We propose Environment-based Validation and Loss-based Sampling (EVaLS) to enhance robustness to spurious correlation.
arXiv Detail & Related papers (2024-10-07T08:17:44Z)
- Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups.
We reformulate the group DRO framework by proposing Q-Diversity.
Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z)
- Outlier-Robust Group Inference via Gradient Space Clustering [50.87474101594732]
Existing methods can improve the worst-group performance, but they require group annotations, which are often expensive and sometimes infeasible to obtain.
We address the problem of learning group annotations in the presence of outliers by clustering the data in the space of gradients of the model parameters.
We show that data in the gradient space has a simpler structure while preserving information about minority groups and outliers, making it suitable for standard clustering methods like DBSCAN.
arXiv Detail & Related papers (2022-10-13T06:04:43Z)
- Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts.
We propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model.
Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
arXiv Detail & Related papers (2022-08-26T12:34:55Z)
- Improved Group Robustness via Classifier Retraining on Independent Splits [6.930560177764658]
Group distributionally robust optimization is a widely used baseline for learning models with strong worst-group performance.
This paper designs a simple method based on the idea of retraining on independent splits of the training data.
We find that using a novel sample-splitting procedure achieves robust worst-group performance in the fine-tuning step.
arXiv Detail & Related papers (2022-04-20T16:22:27Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Focus on the Common Good: Group Distributional Robustness Follows [47.62596240492509]
This paper proposes a new and simple algorithm that explicitly encourages learning of features that are shared across various groups.
While Group-DRO focuses on the groups with the worst regularized loss, focusing instead on groups that enable better performance even on other groups could lead to learning of shared/common features.
arXiv Detail & Related papers (2021-10-06T09:47:41Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.