Related papers: Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit

URL: http://arxiv.org/abs/2010.07355v1
Date: Wed, 14 Oct 2020 18:41:54 GMT
Title: Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit
Authors: Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, and Jasper Snoek
Abstract summary: We use recent theoretical advances that characterize the function-space prior to an ensemble of infinitely-wide NNs as a Gaussian process. This gives us a better understanding of the implicit prior NNs place on function space. We also examine the calibration of previous approaches to classification with the NNGP.
Score: 47.324627920761685
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern deep learning models have achieved great success in predictive accuracy for many data modalities. However, their application to many real-world tasks is restricted by poor uncertainty estimates, such as overconfidence on out-of-distribution (OOD) data and ungraceful failing under distributional shift. Previous benchmarks have found that ensembles of neural networks (NNs) are typically the best calibrated models on OOD data. Inspired by this, we leverage recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process, termed the neural network Gaussian process (NNGP). We use the NNGP with a softmax link function to build a probabilistic model for multi-class classification and marginalize over the latent Gaussian outputs to sample from the posterior. This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue. We also examine the calibration of previous approaches to classification with the NNGP, which treat classification problems as regression to the one-hot labels. In this case the Bayesian posterior is exact, and we compare several heuristics to generate a categorical distribution over classes. We find these methods are well calibrated under distributional shift. Finally, we consider an infinite-width final layer in conjunction with a pre-trained embedding. This replicates the important practical use case of transfer learning and allows scaling to significantly larger datasets. As well as achieving competitive predictive accuracy, this approach is better calibrated than its finite width analogue.

Related papers

Uncertainty Quantification for Large-Scale Deep Networks via Post-StoNet Modeling [10.158931392545618]
We introduce a novel post-processing approach to quantify uncertainty of predictions from deep neural networks (DNNs)<n>This approach feeds the output from a pre-trained large-scale model into a neural network parameter (StoNet)<n>We show that the proposed approach can construct honest confidence intervals with shorter interval lengths compared to conformal methods.
arXiv Detail & Related papers (2025-08-02T06:19:23Z)
Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP) For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
Bayesian Neural Networks with Domain Knowledge Priors [52.80929437592308]
We propose a framework for integrating general forms of domain knowledge into a BNN prior. We show that BNNs using our proposed domain knowledge priors outperform those with standard priors.
arXiv Detail & Related papers (2024-02-20T22:34:53Z)
Sparsifying Bayesian neural networks with latent binary variables and normalizing flows [10.865434331546126]
We will consider two extensions to the latent binary Bayesian neural networks (LBBNN) method. Firstly, by using the local reparametrization trick (LRT) to sample the hidden units directly, we get a more computationally efficient algorithm. More importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean field Gaussian.
arXiv Detail & Related papers (2023-05-05T09:40:28Z)
Improved uncertainty quantification for neural networks with Bayesian last layer [0.0]
Uncertainty quantification is an important task in machine learning. We present a reformulation of the log-marginal likelihood of a NN with BLL which allows for efficient training using backpropagation.
arXiv Detail & Related papers (2023-02-21T20:23:56Z)
Constraining cosmological parameters from N-body simulations with Variational Bayesian Neural Networks [0.0]
Multiplicative normalizing flows (MNFs) are a family of approximate posteriors for the parameters of BNNs. We have compared MNFs with respect to the standard BNNs, and the flipout estimator. MNFs provide more realistic predictive distribution closer to the true posterior mitigating the bias introduced by the variational approximation.
arXiv Detail & Related papers (2023-01-09T16:07:48Z)
Bayesian Neural Network Versus Ex-Post Calibration For Prediction Uncertainty [0.2343856409260935]
Probabilistic predictions from neural networks account for predictive uncertainty during classification. In practice most datasets are trained on non-probabilistic neural networks which by default do not capture this inherent uncertainty. A plausible alternative to the calibration approach is to use Bayesian neural networks, which directly models a predictive distribution.
arXiv Detail & Related papers (2022-09-29T07:22:19Z)
A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness [33.09831377640498]
We study approaches to improve uncertainty property of a single network, based on a single, deterministic representation. We propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection.
arXiv Detail & Related papers (2022-05-01T05:46:13Z)
A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data. In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem. We introduce a newly-designed reward function that introduces some degree of bias designed to reduce variance and avoid unstable, possibly-unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z)
Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN) Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one. We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.