Robust PAC$^m$: Training Ensemble Models Under Misspecification and
Outliers
- URL: http://arxiv.org/abs/2203.01859v3
- Date: Sun, 23 Apr 2023 15:12:44 GMT
- Title: Robust PAC$^m$: Training Ensemble Models Under Misspecification and
Outliers
- Authors: Matteo Zecchin, Sangwoo Park, Osvaldo Simeone, Marios Kountouris,
David Gesbert
- Abstract summary: PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a bound on the generalization error for Gibbs predictors.
This work presents a novel robust free energy criterion that combines the generalized logarithm score function with PAC$^m$ ensemble bounds.
- Score: 46.38465729190199
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Standard Bayesian learning is known to have suboptimal generalization
capabilities under misspecification and in the presence of outliers. PAC-Bayes
theory demonstrates that the free energy criterion minimized by Bayesian
learning is a bound on the generalization error for Gibbs predictors (i.e., for
single models drawn at random from the posterior) under the assumption of
sampling distributions uncontaminated by outliers. This viewpoint provides a
justification for the limitations of Bayesian learning when the model is
misspecified, requiring ensembling, and when data is affected by outliers. In
recent work, PAC-Bayes bounds -- referred to as PAC$^m$ -- were derived to
introduce free energy metrics that account for the performance of ensemble
predictors, obtaining enhanced performance under misspecification. This work
presents a novel robust free energy criterion that combines the generalized
logarithm score function with PAC$^m$ ensemble bounds. The proposed free energy
training criterion produces predictive distributions that are able to
concurrently counteract the detrimental effects of misspecification -- with
respect to both likelihood and prior distribution -- and outliers.
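The two ingredients of the criterion can be illustrated with a minimal numerical sketch (this is not the authors' implementation; the function names and the choice t = 0.9 are hypothetical). The generalized (t-)logarithm $\log_t(x) = (x^{1-t} - 1)/(1 - t)$ recovers the natural logarithm as t → 1, and for t < 1 the resulting negative score is bounded above by $1/(1-t)$, which caps the penalty an outlier with near-zero likelihood can inflict on the m-sample ensemble predictive distribution:

```python
import math

def log_t(x, t):
    """Generalized (t-)logarithm: recovers math.log(x) as t -> 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def pac_m_robust_loss(likelihoods, t):
    """Negative generalized-log score of the ensemble predictive probability.

    likelihoods: per-model likelihoods p(x | theta_j) for one data point,
    drawn from m posterior samples and averaged into the ensemble
    (mixture) predictive probability, as in PAC^m-style objectives.
    """
    ensemble_prob = sum(likelihoods) / len(likelihoods)
    return -log_t(ensemble_prob, t)

# With the standard log score (t = 1) an outlier with tiny likelihood
# contributes an unbounded penalty; with t < 1 the penalty is capped
# at 1 / (1 - t), limiting the outlier's influence on training.
clean = pac_m_robust_loss([0.8, 0.9, 0.85], t=0.9)
outlier = pac_m_robust_loss([1e-6, 1e-6, 1e-6], t=0.9)
```

Here the robustness comes entirely from replacing the logarithm in the free energy's data-fit term; the ensembling (averaging the m per-model likelihoods before scoring) is what targets misspecification.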
Related papers
- Bayesian vs. PAC-Bayesian Deep Neural Network Ensembles [7.883369697332076]
We argue that neither the sampling nor the weighting in a Bayes ensemble is particularly well-suited for increasing generalization performance.
We show that state-of-the-art Bayes ensembles from the literature, despite being computationally demanding, do not improve over simple uniformly weighted deep ensembles.
arXiv Detail & Related papers (2024-06-08T13:19:18Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Score-based generative models are provably robust: an uncertainty quantification perspective [4.396860522241307]
We show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation.
Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem.
We show how errors due to (a) finite sample approximation, (b) early stopping, (c) score-matching objective choice, (d) score function parametrization, and (e) reference distribution choice, impact the quality of the generative model.
arXiv Detail & Related papers (2024-05-24T17:50:17Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
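As a hedged illustration of the general construction (not the specific equation derived in this paper), an uncertainty Bellman equation of the generic form $u = w + \gamma^2 \bar{P} u$, with $w$ a local-uncertainty term and $\bar{P}$ the posterior-mean transition matrix, can be solved by fixed-point iteration; the function name, the toy two-state chain, and the choice of $w$ below are all hypothetical:

```python
import numpy as np

def solve_uncertainty_bellman(local_uncertainty, mean_transition, gamma, iters=1000):
    """Fixed-point iteration for u = w + gamma^2 * P_bar @ u.

    Generic form only: with a suitable local-uncertainty term w, the
    solution u upper-bounds (or, per this paper's refinement, equals)
    the posterior variance over values at each state.
    """
    u = np.zeros_like(local_uncertainty)
    for _ in range(iters):
        u = local_uncertainty + gamma**2 * mean_transition @ u
    return u

# Toy chain: state 0 moves to {0, 1} uniformly; state 1 is absorbing.
P = np.array([[0.5, 0.5], [0.0, 1.0]])
w = np.array([1.0, 0.0])  # epistemic uncertainty only at state 0
u = solve_uncertainty_bellman(w, P, gamma=0.9)
```

Because gamma^2 < 1, the iteration is a contraction and converges to the unique solution; the absorbing state with zero local uncertainty keeps a value-variance of zero.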
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Sequential prediction under log-loss and misspecification [47.66467420098395]
We consider the question of sequential prediction under the log-loss in terms of cumulative regret.
We show that cumulative regrets in the well-specified and misspecified cases coincide asymptotically.
We provide an $o(1)$ characterization of the distribution-free or PAC regret.
arXiv Detail & Related papers (2021-01-29T20:28:23Z) - PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z) - De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and
Non-smooth Predictors [21.59277717031637]
We present a family of de-randomized PAC-Bayes margin bounds for deterministic non-smooth predictors, e.g., ReLU-nets.
We also present empirical results of our bounds over changing training set size and randomness in labels.
arXiv Detail & Related papers (2020-02-23T17:54:07Z) - Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.