Bayesian Deep Learning via Subnetwork Inference
- URL: http://arxiv.org/abs/2010.14689v4
- Date: Mon, 14 Mar 2022 13:46:12 GMT
- Title: Bayesian Deep Learning via Subnetwork Inference
- Authors: Erik Daxberger, Eric Nalisnick, James Urquhart Allingham, Javier
Antorán, José Miguel Hernández-Lobato
- Abstract summary: We show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors.
This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets.
- Score: 2.2835610890984164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Bayesian paradigm has the potential to solve core issues of deep neural
networks such as poor calibration and data inefficiency. Alas, scaling Bayesian
inference to large weight spaces often requires restrictive approximations. In
this work, we show that it suffices to perform inference over a small subset of
model weights in order to obtain accurate predictive posteriors. The other
weights are kept as point estimates. This subnetwork inference framework
enables us to use expressive, otherwise intractable, posterior approximations
over such subsets. In particular, we implement subnetwork linearized Laplace as
a simple, scalable Bayesian deep learning method: We first obtain a MAP
estimate of all weights and then infer a full-covariance Gaussian posterior
over a subnetwork using the linearized Laplace approximation. We propose a
subnetwork selection strategy that aims to maximally preserve the model's
predictive uncertainty. Empirically, our approach compares favorably to
ensembles and less expressive posterior approximations over full networks. Our
proposed subnetwork (linearized) Laplace method is implemented within the
laplace PyTorch library at https://github.com/AlexImmer/Laplace.
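For concreteness, below is a minimal sketch of how the subnetwork Laplace variant can be invoked through the laplace library linked above. The class and argument names (Laplace, LargestMagnitudeSubnetMask, subset_of_weights='subnetwork', hessian_structure='full') are paraphrased from the library's documented interface and may differ across versions; the largest-magnitude mask used here is only one of the library's selection options, not necessarily the paper's uncertainty-preserving strategy, and the toy model and data are placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace
from laplace.utils import LargestMagnitudeSubnetMask

# Toy stand-ins for a real MAP-trained network and dataset.
torch.manual_seed(0)
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):                                   # quick MAP fit
    for xb, yb in train_loader:
        opt.zero_grad()
        nn.functional.cross_entropy(model(xb), yb).backward()
        opt.step()

# 1. Select a small subnetwork (here: the 128 largest-magnitude weights;
#    the paper's own strategy instead targets predictive-uncertainty preservation).
subnetwork_indices = LargestMagnitudeSubnetMask(model, n_params_subnet=128).select()

# 2. Fit a full-covariance Laplace approximation over the subnetwork only;
#    all remaining weights stay at their MAP values.
la = Laplace(model, 'classification',
             subset_of_weights='subnetwork',
             hessian_structure='full',
             subnetwork_indices=subnetwork_indices)
la.fit(train_loader)

# 3. Approximate predictive class probabilities for new inputs.
probs = la(torch.randn(8, 20))
print(probs.shape)   # expected: torch.Size([8, 3])
```

Because only the selected weights receive a full covariance, the quadratic cost of the full-covariance Laplace applies to the subnetwork rather than to the whole weight space.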
Related papers
- Posterior and variational inference for deep neural networks with heavy-tailed weights [0.0]
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random.
We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates.
We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.
arXiv Detail & Related papers (2024-06-05T15:24:20Z)
- Hessian-Free Laplace in Bayesian Deep Learning [44.16006844888796]
The Hessian-free Laplace (HFL) approximation uses the curvature of both the log posterior and the network prediction to estimate the variance of the prediction.
We show that, under standard assumptions of LA in Bayesian deep learning, HFL targets the same variance as LA, and can be efficiently amortized in a pre-trained network.
arXiv Detail & Related papers (2024-03-15T20:47:39Z)
- Improved uncertainty quantification for neural networks with Bayesian last layer [0.0]
Uncertainty quantification is an important task in machine learning.
We present a reformulation of the log-marginal likelihood of a neural network (NN) with a Bayesian last layer (BLL), which allows for efficient training using backpropagation.
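As a generic illustration of the Bayesian-last-layer idea (not the paper's specific log-marginal-likelihood reformulation), the sketch below fixes the network's penultimate features and computes the closed-form Gaussian posterior over the last-layer weights, i.e. Bayesian linear regression on learned features; the synthetic features and the fixed noise/prior variances are assumptions for illustration.

```python
import torch

def bll_posterior(features, targets, noise_var=0.1, prior_var=1.0):
    """Closed-form Gaussian posterior over last-layer weights
    (Bayesian linear regression on fixed NN features); illustrative only."""
    d = features.shape[1]
    precision = features.T @ features / noise_var + torch.eye(d) / prior_var
    cov = torch.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

# Toy example: "features" from a frozen feature extractor and scalar targets.
torch.manual_seed(0)
phi = torch.randn(100, 16)            # penultimate-layer features
y = torch.randn(100, 1)               # regression targets
mean, cov = bll_posterior(phi, y)

# Predictive mean and variance at a new input's features.
phi_star = torch.randn(1, 16)
pred_mean = phi_star @ mean
pred_var = phi_star @ cov @ phi_star.T + 0.1   # plus observation noise
```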
arXiv Detail & Related papers (2023-02-21T20:23:56Z)
- Content Popularity Prediction Based on Quantized Federated Bayesian Learning in Fog Radio Access Networks [76.16527095195893]
We investigate the content popularity prediction problem in cache-enabled fog radio access networks (F-RANs).
In order to predict the content popularity with high accuracy and low complexity, we propose a Gaussian process based regressor to model the content request pattern.
We utilize Bayesian learning to train the model parameters, which is robust to overfitting.
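As a generic illustration of the Gaussian-process regression component only (the quantized federated learning aspect is not reproduced here), a minimal scikit-learn sketch on assumed synthetic request data might look as follows; the kernel hyperparameters are fit by marginal-likelihood maximization, which is the Bayesian training step that helps guard against overfitting.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in for a content-request pattern over time (assumed data).
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50).reshape(-1, 1)
requests = np.sin(t).ravel() + 0.1 * rng.standard_normal(50)

# GP regressor with an RBF + noise kernel; hyperparameters are optimized
# via the log marginal likelihood during fit().
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(t, requests)

# Predictive mean and uncertainty for future time steps.
t_new = np.linspace(0, 12, 100).reshape(-1, 1)
mean, std = gp.predict(t_new, return_std=True)
```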
arXiv Detail & Related papers (2022-06-23T03:05:12Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning in large-scale machine learning to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- The Bayesian Method of Tensor Networks [1.7894377200944511]
We study the Bayesian framework of the Tensor Network from two perspectives.
We study the Bayesian properties of the Tensor Network by visualizing the parameters of the model and the decision boundaries on a two-dimensional synthetic data set.
arXiv Detail & Related papers (2021-01-01T14:59:15Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
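In symbols, the local linearization behind the GLM predictive replaces the network f(x, θ) by its first-order expansion around the MAP estimate (standard form of the linearized Laplace approximation, stated here for orientation):

```latex
f_{\mathrm{lin}}(x, \theta)
  = f(x, \theta_{\mathrm{MAP}})
  + J_{\theta_{\mathrm{MAP}}}(x)\,(\theta - \theta_{\mathrm{MAP}}),
\qquad
J_{\theta_{\mathrm{MAP}}}(x)
  = \left.\frac{\partial f(x, \theta)}{\partial \theta}\right|_{\theta = \theta_{\mathrm{MAP}}}.
```

With a Gaussian posterior N(θ_MAP, Σ) over the weights, the linearized outputs are Gaussian with mean f(x, θ_MAP) and covariance J(x) Σ J(x)^T, which is also the predictive used by the subnetwork Laplace method above (with Σ restricted to the chosen subnetwork).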
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
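For orientation, the generalized Gauss-Newton (GGN) matrix referred to here is the standard curvature approximation that drops second-order network terms; J_n is the network Jacobian at the n-th input and Λ_n is the Hessian of the loss with respect to the network output:

```latex
\mathrm{GGN}(\theta) = \sum_{n=1}^{N} J_n^{\top} \Lambda_n\, J_n,
\qquad
J_n = \frac{\partial f(x_n, \theta)}{\partial \theta},
\qquad
\Lambda_n = \nabla^2_{f}\, \ell\big(y_n, f(x_n, \theta)\big).
```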
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- Fast Predictive Uncertainty for Classification with Bayesian Deep Networks [25.821401066200504]
In Bayesian Deep Learning, distributions over the output of classification neural networks are approximated by first constructing a Gaussian distribution over the weights, then sampling from it to receive a distribution over the softmax outputs.
We construct a Dirichlet approximation of this softmax output distribution, which yields an analytic map between Gaussian distributions in logit space and Dirichlet distributions in the output space.
We demonstrate that the resulting Dirichlet distribution has multiple advantages, in particular more efficient computation of the uncertainty estimate and scaling to large datasets and networks like ImageNet and DenseNet.
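For contrast, the sampling-based construction described in the first sentence of this summary can be sketched as a Monte-Carlo average in logit space (a generic baseline, not the paper's analytic Dirichlet map; the Gaussian over the logits is assumed to be given, e.g. from a linearized weight posterior):

```python
import torch

# Assumed: a Gaussian over the logits of a single input (mean and covariance),
# e.g. obtained by pushing a Gaussian weight posterior through the network.
num_classes = 5
logit_mean = torch.randn(num_classes)
logit_cov = 0.5 * torch.eye(num_classes)

# Monte-Carlo predictive: sample logits, apply softmax, average the samples.
dist = torch.distributions.MultivariateNormal(logit_mean, logit_cov)
logit_samples = dist.sample((10_000,))            # (S, K) samples
probs = torch.softmax(logit_samples, dim=-1)      # per-sample class probabilities
predictive = probs.mean(dim=0)                    # MC estimate the Dirichlet map replaces
```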
arXiv Detail & Related papers (2020-03-02T22:29:03Z)