Bayes-Newton Methods for Approximate Bayesian Inference with PSD
Guarantees
- URL: http://arxiv.org/abs/2111.01721v2
- Date: Wed, 3 Nov 2021 09:34:02 GMT
- Title: Bayes-Newton Methods for Approximate Bayesian Inference with PSD
Guarantees
- Authors: William J. Wilkinson, Simo Särkkä and Arno Solin
- Abstract summary: This viewpoint explicitly casts inference algorithms under the framework of numerical optimisation.
We show that common approximations to Newton's method from the optimisation literature are still valid under this 'Bayes-Newton' framework.
Our unifying viewpoint provides new insights into the connections between various inference schemes.
- Score: 18.419390913544504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We formulate natural gradient variational inference (VI), expectation
propagation (EP), and posterior linearisation (PL) as extensions of Newton's
method for optimising the parameters of a Bayesian posterior distribution. This
viewpoint explicitly casts inference algorithms under the framework of
numerical optimisation. We show that common approximations to Newton's method
from the optimisation literature, namely Gauss-Newton and quasi-Newton methods
(e.g., the BFGS algorithm), are still valid under this 'Bayes-Newton'
framework. This leads to a suite of novel algorithms which are guaranteed to
result in positive semi-definite covariance matrices, unlike standard VI and
EP. Our unifying viewpoint provides new insights into the connections between
various inference schemes. All the presented methods apply to any model with a
Gaussian prior and non-conjugate likelihood, which we demonstrate with (sparse)
Gaussian processes and state space models.
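The Bayes-Newton viewpoint treats these inference updates as (approximate) Newton steps on the parameters of a Gaussian posterior; the PSD issue arises because the exact Hessian of a non-log-concave likelihood can have negative curvature. The following is a minimal sketch of that mechanism only, not the authors' algorithm: it uses a toy GP prior with a Student-t likelihood, and substitutes the Student-t Fisher information as an illustrative PSD surrogate in place of the paper's Gauss-Newton construction.

```python
import numpy as np

# Toy illustration of why PSD guarantees matter: with a non-log-concave
# likelihood (Student-t), the exact negative log-likelihood Hessian can have
# negative entries, so a plain Newton/Laplace-style update of the posterior
# precision may become indefinite. Replacing the exact Hessian with a PSD
# surrogate (here: the Student-t Fisher information, an illustrative stand-in
# for the paper's Gauss-Newton construction) keeps the posterior precision --
# and hence the covariance -- positive definite by design.

rng = np.random.default_rng(0)
N, nu, sigma = 20, 3.0, 0.5                       # data size, Student-t dof and scale
x = np.linspace(0.0, 10.0, N)
K = 25.0 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.5) ** 2) + 1e-6 * np.eye(N)
Kinv = np.linalg.inv(K)                           # prior precision (GP with RBF kernel)
y = np.sin(x) + 1.5 * rng.standard_t(nu, size=N)  # heavy-tailed observations

def student_t_grad_hess(f):
    """Gradient and exact diagonal Hessian of the Student-t negative log-likelihood."""
    r = y - f
    a = nu * sigma**2
    grad = -(nu + 1.0) * r / (a + r**2)
    hess = (nu + 1.0) * (a - r**2) / (a + r**2) ** 2   # negative where r**2 > nu * sigma**2
    return grad, hess

f = np.zeros(N)                                   # expansion point (prior mean)
_, hess_exact = student_t_grad_hess(f)

# Exact-Hessian (Newton / Laplace-style) update of the posterior precision:
Lam_newton = Kinv + np.diag(hess_exact)           # may be indefinite

# PSD surrogate: Fisher information of the Student-t location parameter,
# (nu + 1) / ((nu + 3) * sigma^2), which is constant and strictly positive:
W_psd = np.full(N, (nu + 1.0) / ((nu + 3.0) * sigma**2))
Lam_psd = Kinv + np.diag(W_psd)                   # positive definite by construction

for name, Lam in [("exact-Hessian Newton step", Lam_newton),
                  ("PSD (Gauss-Newton-style) surrogate", Lam_psd)]:
    print(f"{name:35s} min eigenvalue of precision: "
          f"{np.linalg.eigvalsh(Lam).min():+.4f}")
```

With heavy-tailed residuals the exact-Hessian precision can pick up a negative eigenvalue at this expansion point, whereas the surrogate precision remains positive definite by construction.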
Related papers
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 (SR1) update into the incremental framework (see the quasi-Newton sketch after this list).
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Modified Gauss-Newton Algorithms under Noise [2.0454959820861727]
Modified Gauss-Newton or prox-linear algorithms can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings.
We explore the contrasting performance of these two classes of algorithms in theory on a stylized statistical example, and experimentally on learning problems including structured prediction.
arXiv Detail & Related papers (2023-05-18T01:10:42Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- Manifold Gaussian Variational Bayes on the Precision Matrix [70.44024861252554]
We propose an optimization algorithm for Variational Inference (VI) in complex models.
We develop an efficient algorithm for Gaussian Variational Inference whose updates satisfy the positive definite constraint on the variational covariance matrix.
Due to its black-box nature, the proposed method (MGVBP) stands as a ready-to-use solution for VI in complex models.
arXiv Detail & Related papers (2022-10-26T10:12:31Z)
- A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce a variational framework which allows us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z)
- Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions [0.0]
We show how the stochastic Newton method iteratively updates its estimate using subsampled versions of the gradient and Hessian.
Applying Bayesian filtering, we consider the entire history of observations.
We establish matrix-based conditions under which the effect of older observations diminishes.
We illustrate various aspects of our approach with an example, alongside other innovations for the stochastic Newton method.
arXiv Detail & Related papers (2021-04-27T02:39:21Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- Sparse Orthogonal Variational Inference for Gaussian Processes [34.476453597078894]
We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points.
We show that this formulation recovers existing approximations and at the same time allows us to obtain tighter lower bounds on the marginal likelihood and new variational inference algorithms.
arXiv Detail & Related papers (2019-10-23T15:01:28Z)
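Both the headline paper and the incremental quasi-Newton entry above rely on quasi-Newton curvature updates, as noted in that entry. As a reference point for why PSD guarantees are non-trivial, here is a small numpy sketch, independent of any of the papers listed, contrasting the BFGS and SR1 updates on a toy non-convex quadratic: BFGS preserves positive definiteness whenever the curvature condition y^T s > 0 holds, whereas SR1 can become indefinite even then.

```python
import numpy as np

# Quasi-Newton Hessian approximations: the BFGS (rank-2) update preserves
# positive definiteness whenever the curvature condition y^T s > 0 holds,
# while the symmetric rank-1 (SR1) update does not -- one reason PSD
# guarantees matter when such approximations are reused inside approximate
# Bayesian inference.

def bfgs_update(B, s, y):
    """Standard BFGS update of a Hessian approximation B."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def sr1_update(B, s, y):
    """Symmetric rank-1 (SR1) update of a Hessian approximation B."""
    v = y - B @ s
    return B + np.outer(v, v) / (v @ s)

# Toy non-convex quadratic f(x) = x1^2 - 0.5 * x2^2, with gradient A @ x.
A = np.diag([2.0, -1.0])

def grad(x):
    """Gradient of the toy non-convex quadratic."""
    return A @ x

x0, x1 = np.array([1.0, 0.0]), np.array([2.0, 1.0])
s, y = x1 - x0, grad(x1) - grad(x0)           # s = step, y = gradient difference
assert y @ s > 0                              # curvature condition holds here

B0 = np.eye(2)                                # initial positive definite approximation
for name, B in [("BFGS", bfgs_update(B0, s, y)), ("SR1", sr1_update(B0, s, y))]:
    print(f"{name}: eigenvalues of updated B = {np.linalg.eigvalsh(B)}")
```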