Finite-Sample Guarantees for Wasserstein Distributionally Robust
Optimization: Breaking the Curse of Dimensionality
- URL: http://arxiv.org/abs/2009.04382v3
- Date: Sat, 30 Apr 2022 22:44:11 GMT
- Title: Finite-Sample Guarantees for Wasserstein Distributionally Robust
Optimization: Breaking the Curse of Dimensionality
- Authors: Rui Gao
- Abstract summary: Wasserstein distributionally robust optimization aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance.
Existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality, or plausible only in large sample asymptotics.
We develop a non-asymptotic framework for analyzing the out-of-sample performance for Wasserstein robust learning and the generalization bound for its related Lipschitz and gradient regularization problems.
- Score: 5.888646114353371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wasserstein distributionally robust optimization (DRO) aims to find robust
and generalizable solutions by hedging against data perturbations in
Wasserstein distance. Despite its recent empirical success in operations
research and machine learning, existing performance guarantees for generic loss
functions are either overly conservative due to the curse of dimensionality, or
plausible only in large sample asymptotics. In this paper, we develop a
non-asymptotic framework for analyzing the out-of-sample performance for
Wasserstein robust learning and the generalization bound for its related
Lipschitz and gradient regularization problems. To the best of our knowledge,
this gives the first finite-sample guarantee for generic Wasserstein DRO
problems without suffering from the curse of dimensionality. Our results
highlight that Wasserstein DRO, with a properly chosen radius, balances between
the empirical mean of the loss and the variation of the loss, measured by the
Lipschitz norm or the gradient norm of the loss. Our analysis is based on two
novel methodological developments that are of independent interest: 1) a new
concentration inequality controlling the decay rate of large deviation
probabilities by the variation of the loss and, 2) a localized Rademacher
complexity theory based on the variation of the loss.
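As a rough illustration of the balance described above (notation mine; an illustrative special case under standard conditions such as an unconstrained support and a type-1 Wasserstein ball, not the paper's exact statement):

```latex
% Worst-case expected loss over a type-1 Wasserstein ball of radius rho around
% the empirical distribution \widehat{P}_n (illustrative special case):
\[
  \sup_{Q:\, W_1(Q,\widehat{P}_n)\le \rho} \mathbb{E}_{Q}\big[\ell(\theta,\xi)\big]
  \;=\;
  \underbrace{\mathbb{E}_{\widehat{P}_n}\big[\ell(\theta,\xi)\big]}_{\text{empirical mean of the loss}}
  \;+\;
  \rho\,\underbrace{\mathrm{Lip}\big(\ell(\theta,\cdot)\big)}_{\text{variation of the loss}} ,
\]
% so the radius rho governs the trade-off between fitting the data and
% penalizing how fast the loss can vary.
```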
Related papers
- Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity [54.145730036889496]
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity.
A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages robust aggregation and allows flexible selection of the number of local update rounds.
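As a generic illustration of the robust-aggregation idea mentioned in this entry (a standard geometric-median rule computed by Weiszfeld iterations; this is not the paper's RAGA algorithm):

```python
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-8):
    """Weiszfeld iterations for the geometric median of client updates.

    Illustrative only: a generic robust-aggregation rule of the kind
    Byzantine-resilient FL methods build on, not the paper's RAGA.
    """
    points = np.asarray(points, dtype=float)    # shape (n_clients, dim)
    median = points.mean(axis=0)                # start from the plain average
    for _ in range(n_iter):
        dists = np.linalg.norm(points - median, axis=1)
        weights = 1.0 / np.maximum(dists, eps)  # down-weight far-away (possibly Byzantine) updates
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < eps:
            break
        median = new_median
    return median

# Honest clients send similar updates; one Byzantine client sends garbage.
updates = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
           np.array([0.9, 1.1]), np.array([100.0, -100.0])]
print(geometric_median(updates))  # stays close to (1, 1), unlike the mean
```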
arXiv Detail & Related papers (2024-03-20T08:15:08Z) - Universal generalization guarantees for Wasserstein distributionally robust models [10.036727981085223]
We present a novel proof technique that combines nonsmooth analysis rationale with classical concentration results.
Our approach is general enough to extend to recent versions of Wasserstein/Sinkhorn distributionally robust problems that involve (double) regularization.
arXiv Detail & Related papers (2024-02-19T09:27:47Z) - Outlier-Robust Wasserstein DRO [19.355450629316486]
Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty.
We propose a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (TV) contamination.
We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem.
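For intuition only, one natural way to write an ambiguity set that mixes the two perturbation types; the notation is mine and need not match the paper's exact formulation:

```latex
% Hedge against distributions within Wasserstein radius rho of some
% eps-TV-corruption of the empirical distribution (illustrative notation):
\[
  \inf_{\theta}\ \sup_{Q \,\in\, \mathcal{A}_{\rho,\varepsilon}(\widehat{P}_n)}
  \mathbb{E}_{Q}\big[\ell(\theta,\xi)\big],
  \qquad
  \mathcal{A}_{\rho,\varepsilon}(\widehat{P}_n)
  = \big\{ Q : \exists\, Q' \text{ with } \mathrm{TV}(Q',\widehat{P}_n)\le\varepsilon,\ W_p(Q,Q')\le\rho \big\}.
\]
```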
arXiv Detail & Related papers (2023-11-09T18:32:00Z) - On Generalization and Regularization via Wasserstein Distributionally
Robust Optimization [0.5801044612920815]
Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications.
We show that, by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and an equivalence to regularization.
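A hedged sketch of the kind of equivalence this refers to, in the simplest special case (type-1 Wasserstein ball over the features only, unconstrained support, scalar loss g that is L-Lipschitz; not necessarily the paper's statement):

```latex
% Affine decision rule x -> theta^T x with an L-Lipschitz scalar loss g:
\[
  \sup_{Q:\, W_1(Q,\widehat{P}_n)\le \rho} \mathbb{E}_{Q}\big[g(\theta^{\top}x)\big]
  \;=\;
  \mathbb{E}_{\widehat{P}_n}\big[g(\theta^{\top}x)\big] \;+\; \rho\, L\, \|\theta\|_{*},
\]
% i.e. the robust problem is empirical risk minimization plus a dual-norm
% regularizer on the affine decision variable.
```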
arXiv Detail & Related papers (2022-12-12T05:48:07Z) - Wasserstein Distributionally Robust Estimation in High Dimensions:
Performance Analysis and Optimal Hyperparameter Tuning [0.0]
We propose a Wasserstein distributionally robust estimation framework to estimate an unknown parameter from noisy linear measurements.
We focus on the task of analyzing the squared error performance of such estimators.
We show that the squared error can be recovered as the solution of a convex-concave optimization problem.
arXiv Detail & Related papers (2022-06-27T13:02:59Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
However, a solid theoretical understanding of even the "vanilla" PG has remained elusive.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Complexity-Free Generalization via Distributionally Robust Optimization [4.313143197674466]
We present an alternate route to obtain generalization bounds on the solution from distributionally robust optimization (DRO).
Our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function.
Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies, to the best of our knowledge, the first generalization bound in the literature that depends solely on the true loss function.
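The standard reasoning behind bounds of this type, sketched with generic notation (not this paper's MMD-specific statement): if the ambiguity set is calibrated so that it contains the true distribution with high probability, the robust objective certifies out-of-sample risk.

```latex
\[
  \Pr\!\Big[P_{\mathrm{true}} \in \mathcal{A}(\widehat{P}_n)\Big] \ge 1-\delta
  \;\;\Longrightarrow\;\;
  \Pr\!\Big[\mathbb{E}_{P_{\mathrm{true}}}\big[\ell(\hat{\theta},\xi)\big]
      \le \sup_{Q\in\mathcal{A}(\widehat{P}_n)} \mathbb{E}_{Q}\big[\ell(\hat{\theta},\xi)\big]\Big] \ge 1-\delta .
\]
```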
arXiv Detail & Related papers (2021-06-21T15:19:52Z) - Continuous Wasserstein-2 Barycenter Estimation without Minimax
Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
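To make the notion concrete, a minimal check in the one case with a closed form (1-D Gaussians); this only illustrates what a Wasserstein-2 barycenter is, not the scalable sample-based algorithm of that paper:

```python
import numpy as np

# In 1-D, W2(N(m1, s1^2), N(m2, s2^2))^2 = (m1 - m2)^2 + (s1 - s2)^2, and the
# W2 barycenter of Gaussians is again Gaussian, with the weighted averages of
# the means and of the standard deviations.
means = np.array([0.0, 2.0, 4.0])
stds = np.array([1.0, 0.5, 2.0])
weights = np.array([0.2, 0.5, 0.3])  # barycentric weights, sum to 1

bary_mean = float(weights @ means)
bary_std = float(weights @ stds)
print(f"barycenter: N({bary_mean:.3f}, {bary_std:.3f}^2)")
```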
arXiv Detail & Related papers (2021-02-02T21:01:13Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution.
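A minimal sketch of the objective this entry refers to, fitting a single ReLU under square loss with plain (sub)gradient descent; the data, step size, and loop count are illustrative, and this is not the approximation scheme proposed in that paper:

```python
import numpy as np

# Fit x -> max(0, w.x) to samples (x, y) under square loss.
rng = np.random.default_rng(0)
d, n = 5, 2000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_true, 0.0) + 0.01 * rng.normal(size=n)

w = rng.normal(size=d)
lr = 0.1
for _ in range(500):
    pred = np.maximum(X @ w, 0.0)
    grad = (2.0 / n) * X.T @ ((pred - y) * (X @ w > 0))  # subgradient of the square loss
    w -= lr * grad

print("square loss:", np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```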
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
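The generic mechanism that makes such a lower bound plausible, via Kantorovich-Rubinstein duality (illustrative; not the paper's specific regularizer): if the value function V is L-Lipschitz and the transition kernel is perturbed within Wasserstein-1 radius rho, then

```latex
\[
  \inf_{Q:\, W_1(Q,\,P(\cdot\mid s,a))\le \rho}\;
  \mathbb{E}_{s'\sim Q}\big[V(s')\big]
  \;\ge\;
  \mathbb{E}_{s'\sim P(\cdot\mid s,a)}\big[V(s')\big] \;-\; \rho\, L ,
\]
% so subtracting a Lipschitz-type penalty from the nominal Bellman backup
% lower-bounds its Wasserstein-robust counterpart.
```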
arXiv Detail & Related papers (2020-03-05T19:56:23Z)