DORO: Distributional and Outlier Robust Optimization
- URL: http://arxiv.org/abs/2106.06142v1
- Date: Fri, 11 Jun 2021 02:59:54 GMT
- Title: DORO: Distributional and Outlier Robust Optimization
- Authors: Runtian Zhai, Chen Dan, J. Zico Kolter, Pradeep Ravikumar
- Abstract summary: We propose the framework of DORO, for Distributional and Outlier Robust Optimization.
At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.
We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many machine learning tasks involve subpopulation shift where the testing
data distribution is a subpopulation of the training distribution. For such
settings, a line of recent work has proposed the use of a variant of empirical
risk minimization (ERM) known as distributionally robust optimization (DRO). In
this work, we apply DRO to real, large-scale tasks with subpopulation shift,
and observe that DRO performs relatively poorly, and moreover has severe
instability. We identify one direct cause of this phenomenon: sensitivity of
DRO to outliers in the datasets. To resolve this issue, we propose the
framework of DORO, for Distributional and Outlier Robust Optimization. At the
core of this approach is a refined risk function which prevents DRO from
overfitting to potential outliers. We instantiate DORO for the Cressie-Read
family of Rényi divergence, and delve into two specific instances of this
family: CVaR and $\chi^2$-DRO. We theoretically prove the effectiveness of the
proposed method, and empirically show that DORO improves the performance and
stability of DRO with experiments on large modern datasets, thereby positively
addressing the open question raised by Hashimoto et al., 2018.
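The CVaR instance of the refined DORO risk admits a compact per-batch computation. The sketch below is a minimal illustration (the function name, defaults, and array handling are assumptions, not the authors' reference implementation): discard the eps fraction of samples with the largest losses as suspected outliers, then average the worst alpha fraction of the remaining losses (the CVaR).

```python
import numpy as np

def cvar_doro_risk(losses, alpha=0.2, eps=0.01):
    """CVaR-DORO risk sketch: drop the eps fraction of largest losses
    (treated as potential outliers), then take the CVaR at level alpha
    over what remains (mean of the top alpha fraction of losses)."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    n_out = int(eps * len(losses))   # number of suspected outliers
    kept = losses[n_out:]            # discard the largest-loss samples
    k = max(1, int(alpha * len(kept)))
    return kept[:k].mean()           # CVaR: mean of the worst alpha fraction

# Toy example: a single corrupted sample with a huge loss is
# discarded before the CVaR average, so it cannot dominate the risk.
losses = [0.1] * 95 + [0.5] * 4 + [1000.0]
risk = cvar_doro_risk(losses, alpha=0.2, eps=0.01)
```

Plain CVaR-DRO on the same batch would place the 1000.0 loss squarely inside the worst-alpha average, which is the outlier sensitivity the paper identifies.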
Related papers
- Distributionally Robust Optimization [8.750805813120898]
DRO studies decision problems under uncertainty where the probability distribution governing the uncertain problem parameters is itself uncertain.
DRO seeks decisions that perform best under the worst distribution in the ambiguity set.
Recent research has uncovered its deep connections to regularization techniques and adversarial training in machine learning.
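In standard notation (the symbols below are generic, not taken from this particular survey), the worst-case decision problem that DRO solves can be written as:

```latex
\min_{\theta} \; \sup_{Q \in \mathcal{U}(P)} \; \mathbb{E}_{Z \sim Q}\left[\ell(\theta; Z)\right]
```

where $\mathcal{U}(P)$ is the ambiguity set of distributions around the nominal distribution $P$, and $\ell(\theta; Z)$ is the loss of decision $\theta$ on data point $Z$.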
arXiv Detail & Related papers (2024-11-04T19:32:24Z) - Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy
Implications [31.3535638804615]
Machine learning algorithms that minimize average risk are susceptible to distributional shifts.
Distributionally robust optimization (DRO) addresses this issue by optimizing the worst-case risk within an uncertainty set.
However, DRO suffers from over-pessimism, leading to low-confidence predictions, poor parameter estimation, and poor generalization.
In this work, we conduct a theoretical analysis of a probable root cause of over-pessimism: excessive focus on noisy samples.
arXiv Detail & Related papers (2023-11-08T23:33:39Z) - Smoothed $f$-Divergence Distributionally Robust Optimization [5.50764401597583]
We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages.
It uses an ambiguity set based on a Kullback-Leibler (KL) divergence smoothed by the Wasserstein or Lévy-Prokhorov (LP) distance.
arXiv Detail & Related papers (2023-06-24T19:22:01Z) - Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group
Shifts [122.08782633878788]
Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points.
Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative.
We learn a model that maintains high accuracy on simple group functions realized by low-bitrate features.
arXiv Detail & Related papers (2023-02-06T17:07:16Z) - Optimal algorithms for group distributionally robust optimization and
beyond [48.693477387133484]
We devise algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk.
Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings.
Empirically, too, our algorithms outperform known methods.
arXiv Detail & Related papers (2022-12-28T02:45:46Z) - AGRO: Adversarial Discovery of Error-prone groups for Robust
Optimization [109.91265884632239]
Group distributionally robust optimization (G-DRO) can minimize the worst-case loss over a set of pre-defined groups in the training data.
We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization.
AGRO results in 8% higher model performance on average on known worst-groups, compared to prior group discovery approaches.
arXiv Detail & Related papers (2022-12-02T00:57:03Z) - Distributionally Robust Bayesian Optimization with $\varphi$-divergences [45.48814080654241]
We consider robustness against data shift in $\varphi$-divergences, which subsume many popular choices, such as the total variation and the extant Kullback-Leibler (KL) divergence.
We show that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds.
arXiv Detail & Related papers (2022-03-04T04:34:52Z) - Algorithmic Bias and Data Bias: Understanding the Relation between
Distributionally Robust Optimization and Data Curation [1.370633147306388]
Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data.
In social and economic applications, where data represent people, this can lead to discrimination against underrepresented gender and ethnic groups.
arXiv Detail & Related papers (2021-06-17T13:18:03Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - An Online Method for A Class of Distributionally Robust Optimization
with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.