Bayesian Deep Learning and a Probabilistic Perspective of Generalization
- URL: http://arxiv.org/abs/2002.08791v4
- Date: Wed, 30 Mar 2022 17:22:29 GMT
- Title: Bayesian Deep Learning and a Probabilistic Perspective of Generalization
- Authors: Andrew Gordon Wilson, Pavel Izmailov
- Abstract summary: We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
- Score: 56.69671152009899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key distinguishing property of a Bayesian approach is marginalization,
rather than using a single setting of weights. Bayesian marginalization can
particularly improve the accuracy and calibration of modern deep neural
networks, which are typically underspecified by the data, and can represent
many compelling but different solutions. We show that deep ensembles provide an
effective mechanism for approximate Bayesian marginalization, and propose a
related approach that further improves the predictive distribution by
marginalizing within basins of attraction, without significant overhead. We
also investigate the prior over functions implied by a vague distribution over
neural network weights, explaining the generalization properties of such models
from a probabilistic perspective. From this perspective, we explain results
that have been presented as mysterious and distinct to neural network
generalization, such as the ability to fit images with random labels, and show
that these results can be reproduced with Gaussian processes. We also show that
Bayesian model averaging alleviates double descent, resulting in monotonic
performance improvements with increased flexibility. Finally, we provide a
Bayesian perspective on tempering for calibrating predictive distributions.
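The mechanism the abstract refers to is the Bayesian model average p(y | x, D) = ∫ p(y | x, w) p(w | D) dw, which a deep ensemble approximates by averaging the predictive distributions of independently trained networks, each drawn from a different basin of attraction; tempering replaces the posterior with p(w | D) ∝ p(D | w)^(1/T) p(w). Below is a minimal sketch of the ensemble average, assuming classification models that return class probabilities. The function and model names are illustrative, and this is not the authors' code: their proposed approach additionally marginalizes within each basin (e.g. MultiSWAG), which this sketch omits.

```python
import numpy as np

def ensemble_bma_predict(member_predict_fns, x):
    """Approximate the Bayesian model average
        p(y | x, D) = ∫ p(y | x, w) p(w | D) dw ≈ (1/M) Σ_m p(y | x, w_m)
    by averaging the predictive class probabilities of M independently
    trained networks (a deep ensemble). Sketch only; marginalizing within
    each basin of attraction is not shown here."""
    probs = [np.asarray(f(x)) for f in member_predict_fns]  # each (n, k)
    return np.mean(probs, axis=0)

if __name__ == "__main__":
    # Toy usage with two stand-in "models" that map features to class probabilities.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    W_a = rng.normal(size=(3, 5))  # weights of ensemble member A
    W_b = rng.normal(size=(3, 5))  # weights of ensemble member B
    model_a = lambda inputs: softmax(inputs @ W_a)
    model_b = lambda inputs: softmax(inputs @ W_b)

    p_bma = ensemble_bma_predict([model_a, model_b], x)
    print(p_bma.sum(axis=-1))  # each row of the averaged predictive sums to 1
```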
Related papers
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z)
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z)
- Bayesian inference with finitely wide neural networks [0.4568777157687961]
We propose a non-Gaussian distribution in differential form to model a finite set of outputs from a random neural network.
We are able to derive the non-Gaussian posterior distribution in a Bayesian regression task.
arXiv Detail & Related papers (2023-03-06T03:25:30Z)
- Flat Seeking Bayesian Neural Networks [32.61417343756841]
We develop the theory, Bayesian setting, and variational inference approach for the sharpness-aware posterior.
Specifically, models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating it, have better flatness.
We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks.
arXiv Detail & Related papers (2023-02-06T11:40:44Z)
- Robust Gaussian Process Regression with Huber Likelihood [2.7184224088243365]
We propose a robust regression model in the Gaussian process framework, with the likelihood of the observed data expressed as the Huber probability distribution.
The proposed model employs weights based on projection statistics to scale residuals and bound the influence of vertical outliers and bad leverage points on the latent function estimates (a minimal sketch of the Huber likelihood idea appears after this list).
arXiv Detail & Related papers (2023-01-19T02:59:33Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and previous state-of-the-art attention mechanisms in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- The Case for Bayesian Deep Learning [41.54360061376725]
The key distinguishing property of a Bayesian approach is marginalization instead of optimization, rather than the prior or Bayes' rule.
Recent advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training.
arXiv Detail & Related papers (2020-01-29T18:08:52Z)
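The Huber-likelihood entry above replaces the Gaussian observation model with a density whose negative log is quadratic for small residuals and linear for large ones, so outliers contribute far less to the fit. Below is a minimal sketch of that negative log-density (the Huber loss), assuming a threshold delta and unit noise scale; the function name is illustrative, and this shows only the likelihood shape, not the paper's projection-statistics weighting of residuals.

```python
import numpy as np

def huber_negative_log_likelihood(residuals, delta=1.0):
    """Negative log of a Huber-type observation density (up to a constant):
    quadratic for |r| <= delta, linear beyond, so large residuals
    (outliers) are penalised far less than under a Gaussian likelihood."""
    r = np.abs(np.asarray(residuals, dtype=float))
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

# A residual of 5 costs 0.5 * 25 = 12.5 under a Gaussian log-likelihood,
# but only 4.5 under the Huber likelihood with delta = 1.
print(huber_negative_log_likelihood([0.5, 5.0]))  # [0.125 4.5]
```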
This list is automatically generated from the titles and abstracts of the papers on this site.