Robust PAC$^m$: Training Ensemble Models Under Misspecification and
Outliers
- URL: http://arxiv.org/abs/2203.01859v3
- Date: Sun, 23 Apr 2023 15:12:44 GMT
- Title: Robust PAC$^m$: Training Ensemble Models Under Misspecification and
Outliers
- Authors: Matteo Zecchin, Sangwoo Park, Osvaldo Simeone, Marios Kountouris,
David Gesbert
- Abstract summary: PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a bound on the generalization error for Gibbs predictors.
This work presents a novel robust free energy criterion that combines the generalized logarithm score function with PAC$^m$ ensemble bounds.
- Score: 46.38465729190199
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Standard Bayesian learning is known to have suboptimal generalization
capabilities under misspecification and in the presence of outliers. PAC-Bayes
theory demonstrates that the free energy criterion minimized by Bayesian
learning is a bound on the generalization error for Gibbs predictors (i.e., for
single models drawn at random from the posterior) under the assumption of
sampling distributions uncontaminated by outliers. This viewpoint provides a
justification for the limitations of Bayesian learning when the model is
misspecified, requiring ensembling, and when data is affected by outliers. In
recent work, PAC-Bayes bounds -- referred to as PAC$^m$ -- were derived to
introduce free energy metrics that account for the performance of ensemble
predictors, obtaining enhanced performance under misspecification. This work
presents a novel robust free energy criterion that combines the generalized
logarithm score function with PAC$^m$ ensemble bounds. The proposed free energy
training criterion produces predictive distributions that are able to
concurrently counteract the detrimental effects of misspecification -- with
respect to both likelihood and prior distribution -- and outliers.
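The two ingredients of the criterion can be illustrated with a minimal numerical sketch (this is not the authors' implementation; the function names and the choice t = 0.9 are hypothetical). The generalized (t-)logarithm $\log_t(x) = (x^{1-t} - 1)/(1 - t)$ recovers the natural logarithm as t → 1, and for t < 1 the resulting negative score is bounded above by $1/(1-t)$, which caps the penalty an outlier with near-zero likelihood can inflict on the m-sample ensemble predictive distribution:

```python
import math

def log_t(x, t):
    """Generalized (t-)logarithm: recovers math.log(x) as t -> 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def pac_m_robust_loss(likelihoods, t):
    """Negative generalized-log score of the ensemble predictive probability.

    likelihoods: per-model likelihoods p(x | theta_j) for one data point,
    drawn from m posterior samples and averaged into the ensemble
    (mixture) predictive probability, as in PAC^m-style objectives.
    """
    ensemble_prob = sum(likelihoods) / len(likelihoods)
    return -log_t(ensemble_prob, t)

# With the standard log score (t = 1) an outlier with tiny likelihood
# contributes an unbounded penalty; with t < 1 the penalty is capped
# at 1 / (1 - t), limiting the outlier's influence on training.
clean = pac_m_robust_loss([0.8, 0.9, 0.85], t=0.9)
outlier = pac_m_robust_loss([1e-6, 1e-6, 1e-6], t=0.9)
```

Here the robustness comes entirely from replacing the logarithm in the free energy's data-fit term; the ensembling (averaging the m per-model likelihoods before scoring) is what targets misspecification.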
Related papers
- Bayesian vs. PAC-Bayesian Deep Neural Network Ensembles [7.883369697332076]
We argue that neither the sampling nor the weighting in a Bayes ensemble is particularly well-suited for increasing generalization performance.
We show that state-of-the-art Bayes ensembles from the literature, despite being computationally demanding, do not improve over simple uniformly weighted deep ensembles.
arXiv Detail & Related papers (2024-06-08T13:19:18Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Score-based generative models are provably robust: an uncertainty quantification perspective [4.396860522241307]
We show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation.
Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem.
We show how errors due to (a) finite sample approximation, (b) early stopping, (c) score-matching objective choice, (d) score function parametrization, and (e) reference distribution choice, impact the quality of the generative model.
arXiv Detail & Related papers (2024-05-24T17:50:17Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
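As a hedged illustration of the general construction (not the specific equation derived in this paper), an uncertainty Bellman equation of the generic form $u = w + \gamma^2 \bar{P} u$, with $w$ a local-uncertainty term and $\bar{P}$ the posterior-mean transition matrix, can be solved by fixed-point iteration; the function name, the toy two-state chain, and the choice of $w$ below are all hypothetical:

```python
import numpy as np

def solve_uncertainty_bellman(local_uncertainty, mean_transition, gamma, iters=1000):
    """Fixed-point iteration for u = w + gamma^2 * P_bar @ u.

    Generic form only: with a suitable local-uncertainty term w, the
    solution u upper-bounds (or, per this paper's refinement, equals)
    the posterior variance over values at each state.
    """
    u = np.zeros_like(local_uncertainty)
    for _ in range(iters):
        u = local_uncertainty + gamma**2 * mean_transition @ u
    return u

# Toy chain: state 0 moves to {0, 1} uniformly; state 1 is absorbing.
P = np.array([[0.5, 0.5], [0.0, 1.0]])
w = np.array([1.0, 0.0])  # epistemic uncertainty only at state 0
u = solve_uncertainty_bellman(w, P, gamma=0.9)
```

Because gamma^2 < 1, the iteration is a contraction and converges to the unique solution; the absorbing state with zero local uncertainty keeps a value-variance of zero.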
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Sequential prediction under log-loss and misspecification [47.66467420098395]
We consider the question of sequential prediction under the log-loss in terms of cumulative regret.
We show that cumulative regrets in the well-specified and misspecified cases coincide asymptotically.
We provide an $o(1)$ characterization of the distribution-free or PAC regret.
arXiv Detail & Related papers (2021-01-29T20:28:23Z) - PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z) - De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and
Non-smooth Predictors [21.59277717031637]
We present a family of de-randomized PAC-Bayes margin bounds for deterministic non-smooth predictors, e.g., ReLU-nets.
We also present empirical results of our bounds over changing training set size and randomness in labels.
arXiv Detail & Related papers (2020-02-23T17:54:07Z) - Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.