Principled learning method for Wasserstein distributionally robust
optimization with local perturbations
- URL: http://arxiv.org/abs/2006.03333v2
- Date: Mon, 22 Jun 2020 16:57:28 GMT
- Title: Principled learning method for Wasserstein distributionally robust
optimization with local perturbations
- Authors: Yongchan Kwon, Wonyoung Kim, Joong-Ho Won, Myunghee Cho Paik
- Abstract summary: Wasserstein distributionally robust optimization (WDRO) attempts to learn a model that minimizes the local worst-case risk in the vicinity of the empirical data distribution.
We propose a minimizer based on a novel approximation theorem and provide the corresponding risk consistency results.
Our results show that the proposed method achieves significantly higher accuracy than baseline models on noisy datasets.
- Score: 21.611525306059985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wasserstein distributionally robust optimization (WDRO) attempts to learn a
model that minimizes the local worst-case risk in the vicinity of the empirical
data distribution defined by a Wasserstein ball. While WDRO has received
attention as a promising tool for inference since its introduction, its
theoretical understanding has not fully matured. Gao et al. (2017)
proposed a minimizer based on a tractable approximation of the local worst-case
risk, but without showing risk consistency. In this paper, we propose a
minimizer based on a novel approximation theorem and provide the corresponding
risk consistency results. Furthermore, we develop WDRO inference for locally
perturbed data that include Mixup (Zhang et al., 2017) as a special case.
We show that our approximation and risk consistency results naturally extend to
the cases when data are locally perturbed. Numerical experiments demonstrate
robustness of the proposed method using image classification datasets. Our
results show that the proposed method achieves significantly higher accuracy
than baseline models on noisy datasets.
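For readers unfamiliar with the setup, the following is a minimal sketch of the objective described above, written in standard WDRO notation; the symbols (Wasserstein radius delta, order p, loss ell, empirical distribution \widehat{P}_n) are generic labels rather than necessarily the paper's own:
\[
  \min_{\theta}\ \sup_{Q:\; W_p(Q,\widehat{P}_n)\le \delta}\ \mathbb{E}_{Z\sim Q}\big[\ell(\theta; Z)\big],
\]
i.e., the model is trained against the worst-case distribution Q inside a Wasserstein ball of radius delta around the empirical distribution. The local perturbations the abstract refers to cover Mixup (Zhang et al., 2017), which replaces each training pair with a convex combination of two samples:
\[
  \tilde{x} = \lambda x_i + (1-\lambda)x_j,\qquad \tilde{y} = \lambda y_i + (1-\lambda)y_j,\qquad \lambda \sim \mathrm{Beta}(\alpha,\alpha).
\]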
Related papers
- Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples [58.68233326265417]
Non-image data are prevalent in real applications and tend to be noisy.
Risk-sensitive SDE is a type of stochastic differential equation (SDE) parameterized by the risk vector.
We conduct systematic studies for both Gaussian and non-Gaussian noise distributions.
arXiv Detail & Related papers (2024-02-03T08:41:51Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Distributionally Robust Skeleton Learning of Discrete Bayesian Networks [9.46389554092506]
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data.
We propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution.
We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach.
arXiv Detail & Related papers (2023-11-10T15:33:19Z) - Outlier-Robust Wasserstein DRO [19.355450629316486]
Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty.
We propose a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (TV) contamination.
We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem.
arXiv Detail & Related papers (2023-11-09T18:32:00Z) - Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy
Implications [31.3535638804615]
Machine learning algorithms that minimize average risk are susceptible to distributional shifts.
Distributionally robust optimization (DRO) addresses this issue by optimizing the worst-case risk within an uncertainty set.
However, DRO can suffer from over-pessimism, leading to low-confidence predictions, poor parameter estimation, and poor generalization.
In this work, we conduct a theoretical analysis of a probable root cause of over-pessimism: excessive focus on noisy samples.
arXiv Detail & Related papers (2023-11-08T23:33:39Z) - On the Variance, Admissibility, and Stability of Empirical Risk
Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z) - STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Approximate Regions of Attraction in Learning with Decision-Dependent
Distributions [11.304363655760513]
We analyze repeated risk minimization as the trajectories of the gradient flows of performative risk minimization.
We provide conditions to characterize the region of attraction for the various equilibria in this setting.
We introduce the notion of performative alignment, which provides a geometric condition on the convergence of repeated risk minimization to performative risk minimizers.
arXiv Detail & Related papers (2021-06-30T18:38:08Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Distributionally Robust Local Non-parametric Conditional Estimation [22.423052432220235]
We propose a new distributionally robust estimator that generates non-parametric local estimates.
We show that despite being generally intractable, the local estimator can be efficiently found via convex optimization.
Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.
arXiv Detail & Related papers (2020-10-12T00:11:17Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.