Distributional Robustness Bounds Generalization Errors
- URL: http://arxiv.org/abs/2212.09962v3
- Date: Mon, 25 Mar 2024 16:07:31 GMT
- Title: Distributional Robustness Bounds Generalization Errors
- Authors: Shixiong Wang, Haowei Wang,
- Abstract summary: We suggest a quantitative definition for "distributional robustness"
We show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense.
We show that generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution.
- Score: 2.3940819037450987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian methods, distributionally robust optimization methods, and regularization methods are three pillars of trustworthy machine learning combating distributional uncertainty, e.g., the uncertainty of an empirical distribution compared to the true underlying distribution. This paper investigates the connections among the three frameworks and, in particular, explores why these frameworks tend to have smaller generalization errors. Specifically, first, we suggest a quantitative definition for "distributional robustness", propose the concept of "robustness measure", and formalize several philosophical concepts in distributionally robust optimization. Second, we show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense; in addition, by constructing a Dirichlet-process-like prior in Bayesian nonparametrics, it can be proven that any regularized empirical risk minimization method is equivalent to a Bayesian method. Third, we show that generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution and the robustness measures of these machine learning models, which is a new perspective to bound generalization errors, and therefore, explain the reason why distributionally robust machine learning models, Bayesian models, and regularization models tend to have smaller generalization errors in a unified manner.
Related papers
- Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models [0.0]
We show how diffusion-based models can be repurposed for performing principled, identifiable Bayesian inference.
We show how such maps can be learned via standard DBM training using a novel noise schedule.
The result is a class of highly expressive generative models, uniquely defined on a low-dimensional latent space.
arXiv Detail & Related papers (2024-07-11T19:58:19Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We proposed a method called Symmetric Q-learning, in which the synthetic noise generated from a zero-mean distribution is added to the target values to generate a Gaussian error distribution.
arXiv Detail & Related papers (2024-03-12T14:49:19Z) - Learning Against Distributional Uncertainty: On the Trade-off Between
Robustness and Specificity [24.874664446700272]
This paper studies a new framework that unifies the three approaches and that addresses the two challenges mentioned above.
The properties (e.g., consistency and normalities), non-asymptotic properties (e.g., unbiasedness and error bound), and a Monte-Carlo-based solution method of the proposed model are studied.
arXiv Detail & Related papers (2023-01-31T11:33:18Z) - Scalable Dynamic Mixture Model with Full Covariance for Probabilistic
Traffic Forecasting [16.04029885574568]
We propose a dynamic mixture of zero-mean Gaussian distributions for the time-varying error process.
The proposed method can be seamlessly integrated into existing deep-learning frameworks with only a few additional parameters to be learned.
We evaluate the proposed method on a traffic speed forecasting task and find that our method not only improves model horizons but also provides interpretabletemporal correlation structures.
arXiv Detail & Related papers (2022-12-10T22:50:00Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - A likelihood approach to nonparametric estimation of a singular
distribution using deep generative models [4.329951775163721]
We investigate a likelihood approach to nonparametric estimation of a singular distribution using deep generative models.
We prove that a novel and effective solution exists by perturbing the data with an instance noise.
We also characterize the class of distributions that can be efficiently estimated via deep generative models.
arXiv Detail & Related papers (2021-05-09T23:13:58Z) - Achieving Efficiency in Black Box Simulation of Distribution Tails with
Self-structuring Importance Samplers [1.6114012813668934]
The paper presents a novel Importance Sampling (IS) scheme for estimating distribution of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc.
arXiv Detail & Related papers (2021-02-14T03:37:22Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.