Universal generalization guarantees for Wasserstein distributionally robust models
- URL: http://arxiv.org/abs/2402.11981v3
- Date: Mon, 27 Jan 2025 10:31:36 GMT
- Title: Universal generalization guarantees for Wasserstein distributionally robust models
- Authors: Tam Le, Jérôme Malick
- Abstract summary: Distributionally robust optimization has emerged as an attractive way to train robust machine learning models.
Recent statistical analyses have proved that robust models based on the Wasserstein distance enjoy generalization guarantees that do not suffer from the curse of dimensionality.
We establish exact generalization guarantees that cover a wide range of cases, with arbitrary transport costs and parametric loss functions.
- Score: 10.036727981085223
- License:
- Abstract: Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that robust models based on the Wasserstein distance enjoy generalization guarantees that do not suffer from the curse of dimensionality. However, these results are either approximate, obtained in specific cases, or based on assumptions difficult to verify in practice. In contrast, we establish exact generalization guarantees that cover a wide range of cases, with arbitrary transport costs and parametric loss functions, including deep learning objectives with nonsmooth activations. We complete our analysis with an excess bound on the robust objective and an extension to Wasserstein robust models with entropic regularizations.
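The Wasserstein distributionally robust training problem discussed in the abstract can be stated, in generic notation (the symbols below are standard in the DRO literature, not necessarily the paper's exact ones), as a min-max problem over a Wasserstein ball around the empirical distribution:

```latex
\min_{\theta \in \Theta} \;
\sup_{Q \,:\, W_c(Q, \widehat{P}_n) \le \rho}
\mathbb{E}_{\xi \sim Q}\big[\ell(\theta, \xi)\big]
```

Here $\widehat{P}_n$ is the empirical distribution of the training data, $W_c$ is the Wasserstein distance induced by a transport cost $c$, $\rho$ is the radius of the ambiguity set, and $\ell(\theta, \xi)$ is a parametric loss; the paper's guarantees allow $c$ and $\ell$ to be essentially arbitrary within this template.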
Related papers
- Benign Overfitting in Out-of-Distribution Generalization of Linear Models [19.203753135860016]
We take an initial step towards understanding benign overfitting in the Out-of-Distribution (OOD) regime.
We provide non-asymptotic guarantees proving that benign overfitting occurs in standard ridge regression.
We also present theoretical results for a more general family of target covariance matrix.
arXiv Detail & Related papers (2024-12-19T02:47:39Z) - Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Fine-Grained Dynamic Framework for Bias-Variance Joint Optimization on Data Missing Not at Random [2.8165314121189247]
In most practical applications, such as recommendation systems and display advertising, the collected data often contain missing values.
We develop a systematic fine-grained dynamic learning framework to jointly optimize bias and variance.
arXiv Detail & Related papers (2024-05-24T10:07:09Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Exact Generalization Guarantees for (Regularized) Wasserstein
Distributionally Robust Models [0.0]
Wasserstein distributionally robust estimators have emerged as powerful models for prediction and decision-making under uncertainty.
We show that these generalization guarantees actually hold on general classes of models, do not suffer from the curse of dimensionality, and can even cover distribution shifts at testing.
arXiv Detail & Related papers (2023-05-26T16:35:57Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - A Short and General Duality Proof for Wasserstein Distributionally Robust Optimization [11.034091190797671]
We present a general duality result for Wasserstein distributionally robust optimization that holds for any Kantorovich transport cost, measurable loss function, and nominal probability distribution.
We demonstrate that the interchangeability principle holds if and only if certain measurable projection and weak measurable selection conditions are satisfied.
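The duality result summarized above follows the standard strong-duality identity for Wasserstein DRO, which (in generic notation, with the measurability conditions the abstract refers to left implicit) exchanges the worst-case distribution for a scalar multiplier:

```latex
\sup_{Q \,:\, W_c(Q, P) \le \rho} \mathbb{E}_{\xi \sim Q}\big[f(\xi)\big]
= \inf_{\lambda \ge 0} \Big\{ \lambda \rho
+ \mathbb{E}_{\xi \sim P}\Big[ \sup_{\zeta} \big( f(\zeta) - \lambda\, c(\zeta, \xi) \big) \Big] \Big\}
```

The inner supremum over $\zeta$ is the $c$-transform of the loss $f$; the paper's contribution is characterizing exactly when this interchange of expectation and supremum is valid.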
arXiv Detail & Related papers (2022-04-30T22:49:01Z) - Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data
to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
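The squared $\ell_2$ consistency loss this abstract describes can be sketched in a few lines; the function and variable names below are illustrative, not from the paper's code:

```python
import numpy as np

def consistency_loss(z_orig, z_aug):
    """Squared l2 distance between embeddings of original samples and their
    augmented counterparts, averaged over the batch (a generic sketch of the
    consistency regularizer, not the paper's exact implementation)."""
    diff = z_orig - z_aug
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy batch: two embeddings in R^3, with the "augmented" views shifted by 0.1.
z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
z_shifted = z + 0.1
loss = consistency_loss(z, z_shifted)  # 0.1^2 * 3 dims = 0.03 per sample
```

In training, this term would be added to the task loss with a weight hyperparameter, pulling the embeddings of each sample and its augmentation together.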
arXiv Detail & Related papers (2020-11-25T22:40:09Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.