Focus on the Common Good: Group Distributional Robustness Follows
- URL: http://arxiv.org/abs/2110.02619v1
- Date: Wed, 6 Oct 2021 09:47:41 GMT
- Title: Focus on the Common Good: Group Distributional Robustness Follows
- Authors: Vihari Piratla, Praneeth Netrapalli, Sunita Sarawagi
- Abstract summary: This paper proposes a new and simple algorithm that explicitly encourages learning of features that are shared across various groups.
While Group-DRO focuses on groups with the worst regularized loss, focusing instead on groups that enable better performance even on other groups could lead to learning of shared/common features.
- Score: 47.62596240492509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of training a classification model with group
annotated training data. Recent work has established that, if there is
distribution shift across different groups, models trained using the standard
empirical risk minimization (ERM) objective suffer from poor performance on
minority groups and that group distributionally robust optimization (Group-DRO)
objective is a better alternative. The starting point of this paper is the
observation that though Group-DRO performs better than ERM on minority groups
for some benchmark datasets, there are several other datasets where it performs
much worse than ERM. Inspired by ideas from the closely related problem of
domain generalization, this paper proposes a new and simple algorithm that
explicitly encourages learning of features that are shared across various
groups. The key insight behind our proposed algorithm is that while Group-DRO
focuses on groups with the worst regularized loss, focusing instead on groups
that enable better performance even on other groups could lead to learning of
shared/common features, thereby enhancing minority performance beyond what is
achieved by Group-DRO. Empirically, we show that our proposed algorithm matches
or achieves better performance compared to strong contemporary baselines
including ERM and Group-DRO on standard benchmarks on both minority groups and
across all groups. Theoretically, we show that the proposed algorithm is a
descent method and finds first order stationary points of smooth nonconvex
functions.
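The contrast between the two weighting rules can be made concrete. Below is a minimal NumPy sketch: the Group-DRO update is the standard exponentiated-gradient step on group weights, while the "common good" update is only an illustration of the stated insight (gradient inner products as a proxy for cross-group benefit), not the authors' exact algorithm.

```python
import numpy as np

def group_dro_step(w, group_losses, lr=0.1):
    # Group-DRO: exponentiated-gradient ascent on the group weights,
    # so the groups with the worst current loss get upweighted.
    w = w * np.exp(lr * group_losses)
    return w / w.sum()

def common_good_step(w, group_grads, lr=0.1):
    # Illustrative "common good" rule (an assumption, not the paper's
    # exact method): upweight a group in proportion to how well a step
    # along its gradient also decreases the other groups' losses,
    # measured by summed gradient inner products.
    transfer = group_grads @ group_grads.sum(axis=0)
    w = w * np.exp(lr * transfer)
    return w / w.sum()

rng = np.random.default_rng(0)
losses = rng.uniform(0.5, 2.0, size=4)   # per-group training losses
grads = rng.normal(size=(4, 8))          # per-group loss gradients
w0 = np.full(4, 0.25)

print("Group-DRO weights:  ", group_dro_step(w0, losses))
print("Common-good weights:", common_good_step(w0, grads))
# In both cases the model parameters are then updated on the
# weighted loss sum_g w_g * L_g.
```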
Related papers
- Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups.
We reformulate the group DRO framework by proposing Q-Diversity.
Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z)
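As a rough illustration of the "direct parameterization" described in the entry above: the sketch below uses a hypothetical linear assigner that produces soft group memberships in place of human annotations; Q-Diversity's interactive min-max training loop itself is not reproduced here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))        # per-example features
per_example_loss = rng.uniform(size=100)  # current classifier losses
A = rng.normal(size=(16, 4))              # assigner parameters; learned
                                          # adversarially rather than annotated

q = softmax(feats @ A)                    # soft membership q(g|x), shape (100, 4)
group_loss = (q * per_example_loss[:, None]).sum(0) / q.sum(0)
worst_case = group_loss.max()             # the quantity a min-max game pushes on
print(group_loss, worst_case)
```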
- Ranking & Reweighting Improves Group Distributional Robustness [14.021069321266516]
We propose a ranking-based training method called Discounted Rank Upweighting (DRU) to learn models that exhibit strong OOD performance on the test data.
Results on several synthetic and real-world datasets highlight the superior ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.
arXiv Detail & Related papers (2023-05-09T20:37:16Z)
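The abstract above does not spell out the weighting scheme; one plausible reading of "discounted rank upweighting" (an assumption, not the paper's exact formulation) is to rank groups by loss and discount weights geometrically by rank, which interpolates between worst-group-only and near-uniform weighting:

```python
import numpy as np

def discounted_rank_weights(group_losses, gamma=0.5):
    # Hypothetical discounted rank upweighting: the worst group gets
    # weight 1, the next gamma, then gamma**2, ...  As gamma -> 0 this
    # recovers hard worst-group (minimax) weighting; as gamma -> 1 it
    # approaches uniform (ERM-like) weighting, hence "soft-minimax".
    order = np.argsort(group_losses)[::-1]   # worst group first
    w = np.empty_like(group_losses, dtype=float)
    w[order] = gamma ** np.arange(len(group_losses))
    return w / w.sum()

print(discounted_rank_weights(np.array([2.0, 0.3, 1.1, 0.7])))
```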
- Distributionally Robust Optimization with Probabilistic Group [24.22720998340643]
We propose a novel framework PG-DRO for distributionally robust optimization.
Key to our framework is soft group membership instead of hard group annotations.
Our framework accommodates samples with group membership ambiguity, offering stronger flexibility and generality than the prior art.
arXiv Detail & Related papers (2023-03-10T09:31:44Z)
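A minimal sketch of how the soft group membership described above can replace hard annotations when forming group losses; the membership probabilities here are made up, standing in for whatever estimator PG-DRO actually uses:

```python
import numpy as np

# Soft membership: row i gives P(group = g | example i), so an
# ambiguous example contributes partially to several groups instead
# of being forced into exactly one.
membership = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.5, 0.5]])   # a genuinely ambiguous example
losses = np.array([0.4, 1.2, 0.8])   # per-example losses

# Membership-weighted group losses, then the worst-case objective.
group_losses = (membership * losses[:, None]).sum(0) / membership.sum(0)
print(group_losses, group_losses.max())
```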
- AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization [109.91265884632239]
Group distributionally robust optimization (G-DRO) minimizes the worst-case loss over a set of groups pre-defined on the training data.
We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization.
AGRO results in 8% higher model performance on average on known worst-groups, compared to prior group discovery approaches.
arXiv Detail & Related papers (2022-12-02T00:57:03Z) - Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts.
We propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model.
Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
arXiv Detail & Related papers (2022-08-26T12:34:55Z)
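A sketch of the grouping step described above, with random activations standing in for the feature maps of the "identification" model; the layer choice, the flattened-Gram descriptor, and k-means clustering are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Stand-in for feature maps of N inputs from an "identification"
# model: N x C channels x S spatial positions.
feature_maps = rng.normal(size=(200, 8, 49))

# Per-example Gram matrix of channel correlations (style-like
# statistics), flattened into a vector descriptor.
grams = np.einsum('ncs,nds->ncd', feature_maps, feature_maps)
descriptors = grams.reshape(len(grams), -1)

# Cluster descriptors into pseudo-groups, usable by a group-robust
# training objective in place of human group annotations.
pseudo_groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(descriptors)
print(np.bincount(pseudo_groups))
```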
arXiv Detail & Related papers (2022-08-26T12:34:55Z) - Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z) - Just Train Twice: Improving Group Robustness without Training Group
Information [101.84574184298006]
Standard training via empirical risk minimization can produce models that achieve high accuracy on average but low accuracy on certain groups.
Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO) require expensive group annotations for each training point.
We propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified.
arXiv Detail & Related papers (2021-07-19T17:52:32Z)
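The two stages described above map directly onto code. A minimal sketch with a scikit-learn logistic regression standing in for the neural ERM model; the upweighting factor is a hyperparameter, and the model choice here is only for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def jtt(X, y, upweight=20):
    # Stage 1: plain ERM model (stand-in for "a few epochs" of training).
    erm = LogisticRegression(max_iter=200).fit(X, y)
    wrong = erm.predict(X) != y

    # Stage 2: retrain with the error set upweighted; the examples the
    # ERM model got wrong are often those lacking the spurious shortcut.
    weights = np.where(wrong, float(upweight), 1.0)
    return LogisticRegression(max_iter=200).fit(X, y, sample_weight=weights)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
model = jtt(X, y)
```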
arXiv Detail & Related papers (2021-07-19T17:52:32Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)