On Generalization and Regularization via Wasserstein Distributionally
Robust Optimization
- URL: http://arxiv.org/abs/2212.05716v1
- Date: Mon, 12 Dec 2022 05:48:07 GMT
- Title: On Generalization and Regularization via Wasserstein Distributionally
Robust Optimization
- Authors: Qinyu Wu, Jonathan Yu-Meng Li, Tiantian Mao
- Abstract summary: Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications.
We show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization.
- Score: 0.5801044612920815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wasserstein distributionally robust optimization (DRO) has found success in
operations research and machine learning applications as a powerful means to
obtain solutions with favourable out-of-sample performances. Two compelling
explanations for the success are the generalization bounds derived from
Wasserstein DRO and the equivalency between Wasserstein DRO and the
regularization scheme commonly applied in machine learning. Existing results on
generalization bounds and the equivalency to regularization are largely limited
to the setting where the Wasserstein ball is of a certain type and the decision
criterion takes certain forms of an expected function. In this paper, we show
that by focusing on Wasserstein DRO problems with affine decision rules, it is
possible to obtain generalization bounds and the equivalency to regularization
in a significantly broader setting where the Wasserstein ball can be of a
general type and the decision criterion can be a general measure of risk, i.e.,
nonlinear in distributions. This allows for accommodating many important
classification, regression, and risk minimization applications that have not
been addressed to date using Wasserstein DRO. Our results are strong in that
the generalization bounds do not suffer from the curse of dimensionality and
the equivalency to regularization is exact. As a byproduct, our regularization
results broaden considerably the class of Wasserstein DRO models that can be
solved efficiently via regularization formulations.
Related papers
- Universal generalization guarantees for Wasserstein distributionally robust models [10.036727981085223]
We present a novel proof technique that combines nonsmooth analysis rationale with classical concentration results.
Our approach is general enough to extend to the recent versions Wasserstein/Sinkhorn distributionally robust problems that involve (double) regularizations.
arXiv Detail & Related papers (2024-02-19T09:27:47Z) - Outlier-Robust Wasserstein DRO [19.355450629316486]
Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty.
We propose a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (TV) contamination.
We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem.
arXiv Detail & Related papers (2023-11-09T18:32:00Z) - Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a generalization-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain a that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z) - Understanding Entropic Regularization in GANs [5.448283690603358]
We study the influence of regularization on the learned solution of Wasserstein distance.
We show that entropy regularization promotes the solution sparsification, while replacing the Wasserstein distance with the Sinkhorn divergence recovers the unregularized solution.
We conclude that these regularization techniques can improve the quality of the generator learned from empirical data for a large class of distributions.
arXiv Detail & Related papers (2021-11-02T06:08:16Z) - The Benefits of Implicit Regularization from SGD in Least Squares
Problems [116.85246178212616]
gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We make comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular reinforcement learning (RL) problems.
"vanilla" theoretical understanding of PG trajectory is one of the most popular methods for solving RL problems.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Complexity-Free Generalization via Distributionally Robust Optimization [4.313143197674466]
We present an alternate route to obtain generalization bounds on the solution from distributionally robust optimization (DRO)
Our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function.
Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies, to the best of our knowledge, the first generalization bound in the literature that depends solely on the true loss function.
arXiv Detail & Related papers (2021-06-21T15:19:52Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Finite-Sample Guarantees for Wasserstein Distributionally Robust
Optimization: Breaking the Curse of Dimensionality [5.888646114353371]
Wasserstein distributionally robust optimization aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance.
Existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality, or plausible only in large sample complexitys.
We develop a non-asymptotic framework for analyzing the out-of-sample performance for Wasserstein robust learning and the generalization bound for its related Lipschitz and gradient regularization problems.
arXiv Detail & Related papers (2020-09-09T16:02:57Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $textitexternal uncertainty$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.