Disentangling the Gauss-Newton Method and Approximate Inference for
Neural Networks
- URL: http://arxiv.org/abs/2007.11994v1
- Date: Tue, 21 Jul 2020 17:42:58 GMT
- Title: Disentangling the Gauss-Newton Method and Approximate Inference for
Neural Networks
- Authors: Alexander Immer
- Abstract summary: We disentangle the generalized Gauss-Newton method and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
- Score: 96.87076679064499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this thesis, we disentangle the generalized Gauss-Newton method and
approximate inference for Bayesian deep learning. The generalized Gauss-Newton method is an
optimization method that is used in several popular Bayesian deep learning
algorithms. Algorithms that combine the Gauss-Newton method with the Laplace
and Gaussian variational approximation have recently led to state-of-the-art
results in Bayesian deep learning. While the Laplace and Gaussian variational
approximations have been studied extensively, their interplay with the
Gauss-Newton method remains unclear. Recent criticism of priors and posterior
approximations in Bayesian deep learning further underscores the need for a
deeper understanding of practical algorithms. The individual analysis of the
Gauss-Newton method and Laplace and Gaussian variational approximations for
neural networks provides both theoretical insight and new practical algorithms.
We find that the Gauss-Newton method simplifies the underlying probabilistic
model significantly. In particular, the combination of the Gauss-Newton method
with approximate inference can be cast as inference in a linear or Gaussian
process model. The Laplace and Gaussian variational approximations can
subsequently provide a posterior approximation to these simplified models. This
new disentangled understanding of recent Bayesian deep learning algorithms also
leads to new methods: First, the connection to Gaussian processes enables new
function-space inference algorithms. Second, we present a marginal likelihood
approximation of the underlying probabilistic model to tune neural network
hyperparameters. Finally, the identified underlying models lead to different
methods to compute predictive distributions. In fact, we find that these
prediction methods for Bayesian neural networks often work better than the
default choice and solve a common issue with the Laplace approximation.
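To make the relationship concrete, here is a minimal sketch, not taken verbatim from the thesis, of how the generalized Gauss-Newton (GGN) approximation linearizes the network and thereby yields the simplified model on which the Laplace or Gaussian variational approximation acts. The notation (network f(x, θ), Jacobian J, per-example output Hessian Λ, isotropic prior precision δ) is assumed for illustration.

    % Sketch (assumed notation): linearize the network at an estimate theta*,
    % build the GGN from per-example Jacobians and output Hessians, and place a
    % Laplace-style Gaussian posterior on the resulting linearized model.
    \begin{align}
      f_{\mathrm{lin}}(x, \theta) &= f(x, \theta^*) + J_{\theta^*}(x)\,(\theta - \theta^*),
        & J_{\theta^*}(x) &= \partial_\theta f(x, \theta)\big|_{\theta = \theta^*} \\
      H_{\mathrm{GGN}} &= \textstyle\sum_{n=1}^{N} J_{\theta^*}(x_n)^\top \Lambda(x_n)\, J_{\theta^*}(x_n),
        & \Lambda(x_n) &= -\nabla_f^2 \log p(y_n \mid f)\big|_{f = f(x_n, \theta^*)} \\
      q(\theta) &= \mathcal{N}\big(\theta \mid \theta^*,\, (H_{\mathrm{GGN}} + \delta I)^{-1}\big),
        & p(y \mid x) &\approx \textstyle\int p\big(y \mid f_{\mathrm{lin}}(x, \theta)\big)\, q(\theta)\, \mathrm{d}\theta
    \end{align}

In function space, the same linearization corresponds to a Gaussian process whose kernel is built from the Jacobians and the prior covariance, which is what underlies the function-space inference and marginal-likelihood approximations mentioned above.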
Related papers
- Likelihood approximations via Gaussian approximate inference [3.4991031406102238]
We propose efficient schemes to approximate the effects of non-Gaussian likelihoods by Gaussian densities.
Our results attain good approximation quality for binary and multiclass classification in large-scale point-estimate and distributional inferential settings.
As a by-product, we show that the proposed approximate log-likelihoods are a superior alternative to least-squares on raw labels for neural network classification.
arXiv Detail & Related papers (2024-10-28T05:39:26Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Modified Gauss-Newton Algorithms under Noise [2.0454959820861727]
Modified Gauss-Newton or prox-linear algorithms can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings.
We explore the contrasting performance of these two classes of algorithms theoretically, on a stylized statistical example, and experimentally, on learning problems including structured prediction.
arXiv Detail & Related papers (2023-05-18T01:10:42Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Gaussian Processes and Statistical Decision-making in Non-Euclidean
Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes.
We introduce a wide class of efficient approximations built from this viewpoint.
We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z) - Bayes-Newton Methods for Approximate Bayesian Inference with PSD
Guarantees [18.419390913544504]
This viewpoint explicitly casts inference algorithms under the framework of numerical optimisation.
We show that common approximations to Newton's method from the optimisation literature are still valid under this 'Bayes-Newton' framework.
Our unifying viewpoint provides new insights into the connections between various inference schemes.
arXiv Detail & Related papers (2021-11-02T16:39:29Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors; a worked form of this pathwise update is sketched after this list.
arXiv Detail & Related papers (2020-11-08T17:09:37Z) - Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation; a minimal code sketch of this predictive appears after this list.
arXiv Detail & Related papers (2020-08-19T12:35:55Z) - Mean-Field Approximation to Gaussian-Softmax Integral with Application
to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z) - Sparse Orthogonal Variational Inference for Gaussian Processes [34.476453597078894]
We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points.
We show that this formulation recovers existing approximations and at the same time allows us to obtain tighter lower bounds on the marginal likelihood and to derive new variational inference algorithms.
arXiv Detail & Related papers (2019-10-23T15:01:28Z)
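For the pathwise-conditioning entry above, the update it refers to is commonly written via Matheron's rule. The following is a minimal sketch in standard GP-regression notation; the prior draw f ~ GP(0, k), noise draw ε ~ N(0, σ²I), training inputs X, and targets y are all assumed for illustration.

    % Matheron-style pathwise update: turn a prior sample into a posterior sample
    % by a deterministic correction, instead of sampling the posterior distribution directly.
    \begin{equation}
      (f \mid y)(\cdot) \;=\; f(\cdot) \;+\; k(\cdot, X)\,\big(k(X, X) + \sigma^2 I\big)^{-1}\big(y - f(X) - \varepsilon\big)
    \end{equation}

In the paper, the efficiency gains come from approximating the prior draw and this correction term separately, rather than from the exact update above.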
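For the local-linearization entry ("GLM predictive"), the following Python sketch illustrates the idea under simplifying assumptions: the MAP output, the output Jacobian, and a Gaussian weight posterior (e.g. from a Laplace approximation) are given as plain arrays, and the predictive is a Monte Carlo average of the softmax of the linearized outputs. All names and shapes are illustrative, not the paper's API.

    import numpy as np

    def glm_predictive(f_map, jac, theta_map, post_cov, n_samples=200, seed=0):
        """Monte Carlo 'GLM predictive' sketch for a K-class classifier.

        f_map     : network output at the MAP weights, shape (K,)
        jac       : Jacobian of the output w.r.t. the weights at the MAP, shape (K, P)
        theta_map : posterior mean (MAP weights), shape (P,)
        post_cov  : Gaussian posterior covariance, shape (P, P)
        """
        rng = np.random.default_rng(seed)
        # Sample weights from the Gaussian (Laplace / variational) posterior.
        thetas = rng.multivariate_normal(theta_map, post_cov, size=n_samples)
        # Push samples through the *linearized* model, not the original network:
        # f_lin(theta) = f(theta_map) + J (theta - theta_map).
        f_lin = f_map + (thetas - theta_map) @ jac.T           # shape (S, K)
        # Stable softmax per sample, then average over samples.
        f_lin -= f_lin.max(axis=1, keepdims=True)
        probs = np.exp(f_lin)
        probs /= probs.sum(axis=1, keepdims=True)
        return probs.mean(axis=0)                              # shape (K,)

    # Toy usage with made-up shapes: K = 3 classes, P = 5 weights.
    rng = np.random.default_rng(1)
    K, P = 3, 5
    f_map, jac, theta_map = rng.normal(size=K), rng.normal(size=(K, P)), rng.normal(size=P)
    A = rng.normal(size=(P, P))
    post_cov = A @ A.T / P + 0.01 * np.eye(P)                  # SPD covariance
    print(glm_predictive(f_map, jac, theta_map, post_cov))

Predicting with the linearized model is the consistent choice once the posterior itself was obtained under the GGN linearization; sampling weights but evaluating the original nonlinear network is what causes the underfitting the entry mentions.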