Riemannian Laplace approximations for Bayesian neural networks
- URL: http://arxiv.org/abs/2306.07158v1
- Date: Mon, 12 Jun 2023 14:44:22 GMT
- Title: Riemannian Laplace approximations for Bayesian neural networks
- Authors: Federico Bergamin, Pablo Moreno-Muñoz, Søren Hauberg, Georgios Arvanitidis
- Abstract summary: We propose a simple parametric approximate posterior that adapts to the shape of the true posterior.
We show that our approach consistently improves over the conventional Laplace approximation across tasks.
- Score: 3.6990978741464904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian neural networks often approximate the weight-posterior with a
Gaussian distribution. However, practical posteriors are often, even locally,
highly non-Gaussian, and empirical performance deteriorates. We propose a
simple parametric approximate posterior that adapts to the shape of the true
posterior through a Riemannian metric that is determined by the log-posterior
gradient. We develop a Riemannian Laplace approximation where samples naturally
fall into weight-regions with low negative log-posterior. We show that these
samples can be drawn by solving a system of ordinary differential equations,
which can be done efficiently by leveraging the structure of the Riemannian
metric and automatic differentiation. Empirically, we demonstrate that our
approach consistently improves over the conventional Laplace approximation
across tasks. We further show that, unlike the conventional Laplace
approximation, our method is not overly sensitive to the choice of prior, which
alleviates a practical pitfall of current approaches.
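
To make the sampling recipe in the abstract concrete, below is a minimal, hypothetical sketch of how such Riemannian Laplace samples could be drawn. It assumes a Monge-style metric M(w) = I + ∇L(w)∇L(w)ᵀ built from the log-posterior gradient ∇L (one concrete metric consistent with the abstract's description), a toy two-dimensional log-posterior standing in for a network's weight posterior, and a fixed-step Runge-Kutta integrator for the geodesic ODE; none of these choices are taken from the paper's released code.

```python
# Hypothetical sketch (not the authors' code): drawing one sample from a
# Riemannian Laplace approximation by integrating a geodesic ODE.
# Assumptions: a Monge-style metric M(w) = I + dL(w) dL(w)^T built from the
# log-posterior gradient dL, a toy 2-D log-posterior, and a fixed-step RK4 solver.
import jax
import jax.numpy as jnp

def log_posterior(w):
    # Toy banana-shaped log-posterior standing in for a BNN weight posterior.
    return -0.5 * (w[0] ** 2 / 4.0 + (w[1] - w[0] ** 2 / 4.0) ** 2)

grad_L = jax.grad(log_posterior)

def geodesic_rhs(state):
    # Geodesic equation of M(w) = I + dL(w) dL(w)^T:
    #   w'' = -(v^T H_L(w) v) / (1 + ||dL(w)||^2) * dL(w),
    # where the Hessian-vector product H_L(w) v comes from automatic differentiation.
    w, v = state
    g = grad_L(w)
    hvp = jax.jvp(grad_L, (w,), (v,))[1]        # H_L(w) v without forming the Hessian
    return v, -(v @ hvp) / (1.0 + g @ g) * g

def exp_map(w0, v0, n_steps=100):
    # Follow the geodesic from w0 with initial velocity v0 over t in [0, 1].
    dt, state = 1.0 / n_steps, (w0, v0)
    for _ in range(n_steps):
        k1 = geodesic_rhs(state)
        k2 = geodesic_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = geodesic_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = geodesic_rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state[0]

# Draw a tangent vector from the ordinary Laplace approximation at the mode,
# then push it along the geodesic (exponential map) to obtain the sample.
key = jax.random.PRNGKey(0)
w_map = jnp.zeros(2)                            # mode of the toy posterior
cov = jnp.linalg.inv(-jax.hessian(log_posterior)(w_map) + 1e-6 * jnp.eye(2))
v0 = jax.random.multivariate_normal(key, jnp.zeros(2), cov)
print(exp_map(w_map, v0))
```

The point mirrors the abstract: the only ingredients are gradients and Hessian-vector products of the log-posterior, both available through automatic differentiation, so the geodesic ODE can be integrated without ever forming the metric explicitly.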
Related papers
- Online Posterior Sampling with a Diffusion Prior [20.24212000441531]
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation.
In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior.
arXiv Detail & Related papers (2024-10-04T20:47:16Z)
- Generalized Laplace Approximation [23.185126261153236]
We introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors.
We propose the generalized Laplace approximation, which involves a simple adjustment to the Hessian matrix of the regularized loss function.
We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
arXiv Detail & Related papers (2024-05-22T11:11:42Z)
- Riemannian Laplace Approximation with the Fisher Metric [5.982697037000189]
Laplace's method approximates a target density with a Gaussian distribution at its mode.
For complex targets and finite-data posteriors it is often too crude an approximation.
We develop two alternative variants that are exact at the limit of infinite data.
arXiv Detail & Related papers (2023-11-05T20:51:03Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics [3.8811062755861956]
We propose two non-diagonal metrics that can be used in stochastic-gradient samplers to improve convergence and exploration.
We show that for fully connected neural networks (NNs) with sparsity-inducing priors and convolutional NNs with correlated priors, using these metrics can provide improvements.
arXiv Detail & Related papers (2023-03-09T08:20:28Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors (a minimal sketch of the pathwise update appears after this list).
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
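
For the pathwise-conditioning entry above, here is a small, hypothetical sketch of the pathwise (Matheron-style) update it refers to: a Gaussian process posterior sample is obtained by correcting a joint prior draw with the training residuals. The RBF kernel, noise level, and synthetic data are illustrative assumptions, not taken from that paper's code.

```python
# Hypothetical sketch (not the paper's code): pathwise / Matheron-style sampling
# of a Gaussian process posterior, as referenced by the pathwise-conditioning
# entry above. Kernel, noise level, and data are illustrative assumptions.
import jax
import jax.numpy as jnp

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel matrix between point sets a (n, d) and b (m, d).
    sq = jnp.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * sq / lengthscale ** 2)

def pathwise_posterior_sample(key, x_train, y_train, x_test, noise=0.1):
    # 1) One joint prior draw over training and test inputs.
    x_all = jnp.concatenate([x_train, x_test], axis=0)
    k_all = rbf_kernel(x_all, x_all) + 1e-6 * jnp.eye(x_all.shape[0])
    f_all = jnp.linalg.cholesky(k_all) @ jax.random.normal(key, (x_all.shape[0],))
    f_train, f_test = f_all[: x_train.shape[0]], f_all[x_train.shape[0]:]
    # 2) Pathwise update: correct the prior draw with the residual between the
    #    observations and the (noise-perturbed) prior draw at the training inputs.
    eps = noise * jax.random.normal(jax.random.fold_in(key, 1), (x_train.shape[0],))
    k_train = rbf_kernel(x_train, x_train) + noise ** 2 * jnp.eye(x_train.shape[0])
    weights = jnp.linalg.solve(k_train, y_train - (f_train + eps))
    return f_test + rbf_kernel(x_test, x_train) @ weights

# Toy usage on synthetic 1-D regression data.
key = jax.random.PRNGKey(0)
x_tr = jnp.linspace(-3.0, 3.0, 20)[:, None]
y_tr = jnp.sin(x_tr[:, 0]) + 0.1 * jax.random.normal(key, (20,))
x_te = jnp.linspace(-4.0, 4.0, 50)[:, None]
print(pathwise_posterior_sample(jax.random.fold_in(key, 7), x_tr, y_tr, x_te).shape)
```

In the paper's efficient variants the prior draw is itself approximated (for example with random features) so the cost no longer grows cubically with the number of test locations; the exact joint Cholesky above is only for brevity.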