Exploring the Uncertainty Properties of Neural Networks' Implicit Priors
in the Infinite-Width Limit
- URL: http://arxiv.org/abs/2010.07355v1
- Date: Wed, 14 Oct 2020 18:41:54 GMT
- Title: Exploring the Uncertainty Properties of Neural Networks' Implicit Priors
in the Infinite-Width Limit
- Authors: Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, and Jasper
Snoek
- Abstract summary: We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
- Score: 47.324627920761685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning models have achieved great success in predictive
accuracy for many data modalities. However, their application to many
real-world tasks is restricted by poor uncertainty estimates, such as
overconfidence on out-of-distribution (OOD) data and ungraceful failing under
distributional shift. Previous benchmarks have found that ensembles of neural
networks (NNs) are typically the best calibrated models on OOD data. Inspired
by this, we leverage recent theoretical advances that characterize the
function-space prior of an ensemble of infinitely-wide NNs as a Gaussian
process, termed the neural network Gaussian process (NNGP). We use the NNGP
with a softmax link function to build a probabilistic model for multi-class
classification and marginalize over the latent Gaussian outputs to sample from
the posterior. This gives us a better understanding of the implicit prior NNs
place on function space and allows a direct comparison of the calibration of
the NNGP and its finite-width analogue. We also examine the calibration of
previous approaches to classification with the NNGP, which treat classification
problems as regression to the one-hot labels. In this case the Bayesian
posterior is exact, and we compare several heuristics to generate a categorical
distribution over classes. We find these methods are well calibrated under
distributional shift. Finally, we consider an infinite-width final layer in
conjunction with a pre-trained embedding. This replicates the important
practical use case of transfer learning and allows scaling to significantly
larger datasets. As well as achieving competitive predictive accuracy, this
approach is better calibrated than its finite-width analogue.
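
The regression-to-one-hot-labels setting described in the abstract can be illustrated with a small sketch. This is not the authors' code: the order-1 arc-cosine kernel is used only as a stand-in for an NNGP kernel (it matches a one-hidden-layer ReLU network up to scaling conventions), and the helper names `arccos_kernel`, `gp_onehot_posterior`, and `argmax_frequency_probs`, the noise level, and the toy data are all assumptions made for illustration. The argmax-frequency rule is one plausible heuristic for turning the exact Gaussian posterior into a categorical distribution; it is not necessarily one of the heuristics the paper compares.

```python
import numpy as np


def arccos_kernel(X, Y):
    """Order-1 arc-cosine kernel; a stand-in for an NNGP kernel (assumption).

    Assumes every input row has non-zero norm.
    """
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    cos = np.clip(X @ Y.T / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos) / np.pi


def gp_onehot_posterior(X_train, y_train, X_test, num_classes, noise=1e-2):
    """Exact GP regression posterior with one-hot labels as targets."""
    Y = np.eye(num_classes)[y_train]                     # (n_train, C)
    K = arccos_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = arccos_kernel(X_test, X_train)                 # (n_test, n_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    mean = K_s @ alpha                                   # (n_test, C)
    v = np.linalg.solve(L, K_s.T)                        # (n_train, n_test)
    var = np.diag(arccos_kernel(X_test, X_test)) - np.sum(v ** 2, axis=0)
    # The same kernel is shared by all classes, so the variance is per point.
    return mean, np.maximum(var, 1e-12)


def argmax_frequency_probs(mean, var, num_samples=2000, seed=0):
    """Heuristic (assumption): class probability = how often a class wins
    the argmax across samples drawn from the Gaussian posterior."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_samples,) + mean.shape)
    samples = mean[None] + np.sqrt(var)[None, :, None] * eps
    winners = samples.argmax(axis=-1)                    # (num_samples, n_test)
    num_classes = mean.shape[1]
    return np.stack(
        [(winners == c).mean(axis=0) for c in range(num_classes)], axis=1
    )


# Toy two-class problem (made-up data, for illustration only).
rng = np.random.default_rng(0)
X_train = np.concatenate([
    rng.normal([2.0, 0.0], 0.3, (20, 2)),
    rng.normal([0.0, 2.0], 0.3, (20, 2)),
])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])

mean, var = gp_onehot_posterior(X_train, y_train, X_test, num_classes=2)
probs = argmax_frequency_probs(mean, var)
print(probs)
```

Each row of `probs` is a categorical distribution over the two classes for one test point. Calibration could then be assessed by comparing these probabilities with held-out labels.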
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Bayesian Neural Networks with Domain Knowledge Priors [52.80929437592308]
We propose a framework for integrating general forms of domain knowledge into a BNN prior.
We show that BNNs using our proposed domain knowledge priors outperform those with standard priors.
arXiv Detail & Related papers (2024-02-20T22:34:53Z)
- Sparsifying Bayesian neural networks with latent binary variables and
normalizing flows [10.865434331546126]
We consider two extensions to the latent binary Bayesian neural networks (LBBNN) method.
Firstly, by using the local reparametrization trick (LRT) to sample the hidden units directly, we get a more computationally efficient algorithm.
More importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean field Gaussian.
arXiv Detail & Related papers (2023-05-05T09:40:28Z)
- Improved uncertainty quantification for neural networks with Bayesian
last layer [0.0]
Uncertainty quantification is an important task in machine learning.
We present a reformulation of the log-marginal likelihood of an NN with a Bayesian last layer (BLL), which allows for efficient training using backpropagation.
arXiv Detail & Related papers (2023-02-21T20:23:56Z)
- Constraining cosmological parameters from N-body simulations with
Variational Bayesian Neural Networks [0.0]
Multiplicative normalizing flows (MNFs) are a family of approximate posteriors for the parameters of BNNs.
We compare MNFs against standard BNNs and the flipout estimator.
MNFs provide a more realistic predictive distribution that is closer to the true posterior, mitigating the bias introduced by the variational approximation.
arXiv Detail & Related papers (2023-01-09T16:07:48Z)
- Bayesian Neural Network Versus Ex-Post Calibration For Prediction
Uncertainty [0.2343856409260935]
Probabilistic predictions from neural networks account for predictive uncertainty during classification.
In practice, most neural networks trained on datasets are non-probabilistic and by default do not capture this inherent uncertainty.
A plausible alternative to the calibration approach is to use Bayesian neural networks, which directly model a predictive distribution.
arXiv Detail & Related papers (2022-09-29T07:22:19Z)
- A Simple Approach to Improve Single-Model Deep Uncertainty via
Distance-Awareness [33.09831377640498]
We study approaches to improve the uncertainty properties of a single network, based on a single, deterministic representation.
We propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs.
On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection.
arXiv Detail & Related papers (2022-05-01T05:46:13Z)
- A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly designed reward function that adds some degree of bias in order to reduce variance and avoid unstable, possibly unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.