On the Robustness to Misspecification of $\alpha$-Posteriors and Their
Variational Approximations
- URL: http://arxiv.org/abs/2104.08324v1
- Date: Fri, 16 Apr 2021 19:11:53 GMT
- Title: On the Robustness to Misspecification of $\alpha$-Posteriors and Their
Variational Approximations
- Authors: Marco Avella Medina and José Luis Montiel Olea and Cynthia Rush and
Amilcar Velez
- Abstract summary: $\alpha$-posteriors and their variational approximations distort standard posterior inference by downweighting the likelihood and introducing variational approximation errors.
We show that such distortions, if tuned appropriately, reduce the Kullback-Leibler (KL) divergence from the true, but perhaps infeasible, posterior distribution when there is potential parametric model misspecification.
- Score: 12.52149409594807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $\alpha$-posteriors and their variational approximations distort standard
posterior inference by downweighting the likelihood and introducing variational
approximation errors. We show that such distortions, if tuned appropriately,
reduce the Kullback-Leibler (KL) divergence from the true, but perhaps
infeasible, posterior distribution when there is potential parametric model
misspecification. To make this point, we derive a Bernstein-von Mises theorem
showing convergence in total variation distance of $\alpha$-posteriors and
their variational approximations to limiting Gaussian distributions. We use
these distributions to evaluate the KL divergence between true and reported
posteriors. We show this divergence is minimized by choosing $\alpha$ strictly
smaller than one, assuming there is a vanishingly small probability of model
misspecification. The optimized value becomes smaller as the
misspecification becomes more severe. The optimized KL divergence increases
logarithmically in the degree of misspecification and not linearly as with the
usual posterior.
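For reference, the abstract leaves the central objects implicit; the usual definitions from the tempered-posterior literature are sketched below. The notation is ours and only outlines the setup, not the paper's exact statements.

```latex
% \alpha-posterior: prior \pi, i.i.d. data x_{1:n}, likelihood p(x_i \mid \theta),
% with the likelihood downweighted by an exponent \alpha \in (0, 1]:
\pi_{n,\alpha}(\theta \mid x_{1:n})
  \;\propto\; \pi(\theta)\,\Bigl[\prod_{i=1}^{n} p(x_i \mid \theta)\Bigr]^{\alpha}

% Variational approximation: project \pi_{n,\alpha} onto a tractable family
% \mathcal{Q} (e.g., Gaussian mean-field) by minimizing the KL divergence:
q^{*}_{n,\alpha} = \arg\min_{q \in \mathcal{Q}}
  \mathrm{KL}\bigl(q \,\|\, \pi_{n,\alpha}(\cdot \mid x_{1:n})\bigr)
```

To make the likelihood-downweighting concrete, the toy sketch below (ours, not taken from the paper) uses the conjugate Gaussian location model, where both the exact posterior and the α-posterior are Gaussian, so the KL divergence between them is available in closed form. In this well-specified toy the divergence is trivially minimized at α = 1; the paper's point is that under potential misspecification the optimum moves strictly below one.

```python
import numpy as np

def gaussian_alpha_posterior(x, sigma2, mu0, tau02, alpha):
    """Closed-form alpha-posterior for the Gaussian location model
    x_i ~ N(theta, sigma2) with prior theta ~ N(mu0, tau02).
    Raising the likelihood to the power alpha rescales the data
    precision by alpha (i.e., n effective observations become alpha*n)."""
    n = len(x)
    precision = 1.0 / tau02 + alpha * n / sigma2
    mean = (mu0 / tau02 + alpha * np.sum(x) / sigma2) / precision
    return mean, 1.0 / precision  # posterior mean and variance

def kl_gaussians(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=50)   # data drawn from the model
sigma2, mu0, tau02 = 1.0, 0.0, 1.0            # known variance, N(0, 1) prior

# The exact posterior corresponds to alpha = 1.
m1, v1 = gaussian_alpha_posterior(x, sigma2, mu0, tau02, alpha=1.0)

for alpha in (1.0, 0.8, 0.5, 0.2):
    ma, va = gaussian_alpha_posterior(x, sigma2, mu0, tau02, alpha)
    print(f"alpha={alpha:.1f}  KL(alpha-posterior || exact posterior) = "
          f"{kl_gaussians(ma, va, m1, v1):.4f}")
```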
Related papers
- Predictive variational inference: Learn the predictively optimal posterior distribution [1.7648680700685022]
Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification.
We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density.
This framework applies to both likelihood-exact and likelihood-free models.
arXiv Detail & Related papers (2024-10-18T19:44:57Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - Variance Reduction for the Independent Metropolis Sampler [11.074080383657453]
We prove that if $\pi$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler that uses $q$ as its proposal to obtain samples from $\pi$ achieves smaller variance than i.i.d. sampling from $\pi$.
We propose an adaptive independent Metropolis algorithm that adapts the proposal density so that its KL divergence to the target is reduced.
arXiv Detail & Related papers (2024-06-25T16:38:53Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Variational Prediction [95.00085314353436]
We present a technique for learning a variational approximation to the posterior predictive distribution using a variational bound.
This approach can provide good predictive distributions without test time marginalization costs.
arXiv Detail & Related papers (2023-07-14T18:19:31Z) - Function-space regularized Rényi divergences [6.221019624345409]
We propose a new family of regularized Rényi divergences parametrized by a variational function space.
We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs.
We show that the proposed regularized Rényi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous.
arXiv Detail & Related papers (2022-10-10T19:18:04Z) - On the detrimental effect of invariances in the likelihood for
variational inference [21.912271882110986]
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability.
Prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes.
arXiv Detail & Related papers (2022-09-15T09:13:30Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - On Misspecification in Prediction Problems and Robustness via Improper
Learning [23.64462813525688]
We show that for a broad class of loss functions and parametric families of distributions, the regret of playing a "proper" predictor has a lower bound scaling at least as $\sqrt{\gamma n}$.
We exhibit instances in which this is unimprovable even over the family of all learners that may play distributions in the convex hull of the parametric family.
arXiv Detail & Related papers (2021-01-13T17:54:08Z) - Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples.
arXiv Detail & Related papers (2020-02-26T16:37:37Z) - The k-tied Normal Distribution: A Compact Parameterization of Gaussian
Mean Field Posteriors in Bayesian Neural Networks [46.677567663908185]
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights.
Recent work has explored ever richer parameterizations of the approximate posterior in the hope of improving performance.
We find that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance.
arXiv Detail & Related papers (2020-02-07T07:33:15Z)