Accelerated Linearized Laplace Approximation for Bayesian Deep Learning
- URL: http://arxiv.org/abs/2210.12642v1
- Date: Sun, 23 Oct 2022 07:49:03 GMT
- Title: Accelerated Linearized Laplace Approximation for Bayesian Deep Learning
- Authors: Zhijie Deng, Feng Zhou, Jun Zhu
- Abstract summary: We develop a Nystrom approximation to neural tangent kernels (NTKs) to accelerate LLA.
Our method benefits from the support for forward-mode automatic differentiation in popular deep learning libraries.
Our method can even scale up to architectures like vision transformers.
- Score: 34.81292720605279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Laplace approximation (LA) and its linearized variant (LLA) enable effortless
adaptation of pretrained deep neural networks to Bayesian neural networks. The
generalized Gauss-Newton (GGN) approximation is typically introduced to improve
their tractability. However, LA and LLA still face non-trivial efficiency issues
and must, in practice, rely on Kronecker-factored, diagonal, or even last-layer
approximate GGN matrices. These approximations are
likely to harm the fidelity of learning outcomes. To tackle this issue,
inspired by the connections between LLA and neural tangent kernels (NTKs), we
develop a Nystrom approximation to NTKs to accelerate LLA. Our method benefits
from the support for forward-mode automatic differentiation in popular deep
learning libraries, and enjoys reassuring theoretical guarantees.
Extensive studies demonstrate the merits of the proposed method in terms of both
scalability and performance. Our method even scales up to architectures like
vision transformers. We also offer valuable ablation studies to diagnose our
method. Code is available at \url{https://github.com/thudzj/ELLA}.
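As a rough illustration of the idea in the abstract (a Nystrom approximation to the empirical NTK whose features can each be evaluated with a single forward-mode Jacobian-vector product), here is a minimal sketch using PyTorch's torch.func. It is not the authors' ELLA implementation (see the linked repository for that): it assumes a scalar-output network, computes the landmark Jacobians with reverse mode for brevity, and helper names such as `flat_jacobian` and `make_ntk_features` are made up for this example.

```python
import torch
from torch import nn
from torch.func import functional_call, jacrev, jvp


def flat_jacobian(model, params, x):
    """Per-example Jacobian d f(x; theta) / d theta, flattened into one vector."""
    def f(p):
        return functional_call(model, p, (x.unsqueeze(0),)).squeeze()
    jac = jacrev(f)(params)                               # dict keyed like `params`
    return torch.cat([jac[name].reshape(-1) for name in params])


def make_ntk_features(model, params, landmarks, k):
    """Rank-k Nystrom feature map for the empirical NTK, built from M landmarks."""
    # Empirical NTK on the landmarks: K_MM[i, j] = J(z_i) J(z_j)^T.
    J_M = torch.stack([flat_jacobian(model, params, z) for z in landmarks])  # (M, P)
    K_MM = J_M @ J_M.T
    evals, evecs = torch.linalg.eigh(K_MM)                # eigenvalues in ascending order
    evals, evecs = evals[-k:], evecs[:, -k:]              # keep the top-k eigenpairs
    # Parameter-space directions v_i = J_M^T u_i / sqrt(lambda_i), so that the
    # Nystrom feature phi(x)[i] = J(x) v_i needs only one JVP per feature.
    V = (J_M.T @ evecs) / evals.clamp_min(1e-12).sqrt()   # (P, k)

    sizes = [p.numel() for p in params.values()]
    def unflatten(v):
        chunks = torch.split(v, sizes)
        return {name: c.reshape(p.shape) for (name, p), c in zip(params.items(), chunks)}
    tangents = [unflatten(V[:, i]) for i in range(k)]

    def features(x):
        def f(p):
            return functional_call(model, p, (x.unsqueeze(0),)).squeeze()
        # Each feature is a single forward-mode JVP: phi(x)[i] = J(x) v_i.
        return torch.stack([jvp(f, (params,), (t,))[1] for t in tangents])
    return features


# Toy usage (illustrative only): a tiny scalar-output regression net.
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
params = {name: p.detach() for name, p in model.named_parameters()}
landmarks = torch.randn(8, 3)                             # M = 8 landmark inputs
phi = make_ntk_features(model, params, landmarks, k=4)
print(phi(torch.randn(3)).shape)                          # torch.Size([4])
```

Roughly speaking, the inner product of these features approximates the NTK, kappa(x, x') ≈ phi(x)·phi(x'), and such a low-rank surrogate is the kind of object on which an LLA-style Gaussian predictive can be built; consult the paper and repository for the actual method and its multi-output treatment.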
Related papers
- Optimizing Curvature Learning for Robust Hyperbolic Deep Learning in Computer Vision [3.3964154468907486]
We introduce an improved schema for popular learning algorithms and a novel normalization approach to constrain embeddings within the variable representative radius of the manifold.
Our approach demonstrates consistent performance improvements across both direct classification and hierarchical metric learning tasks while allowing for larger hyperbolic models.
arXiv Detail & Related papers (2024-05-22T20:30:14Z)
- Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z)
- Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training [30.452060061499523]
We introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation.
Experiments demonstrate the effectiveness of the approximation technique in neural network training.
arXiv Detail & Related papers (2024-03-18T23:23:50Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Analytically Tractable Bayesian Deep Q-Learning [0.0]
We adapt the temporal difference Q-learning framework to make it compatible with the tractable approximate Gaussian inference (TAGI).
We demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks.
arXiv Detail & Related papers (2021-06-21T13:11:52Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as the "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation (a brief sketch of the linearization follows this entry).
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
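For context, the "GLM predictive" above amounts to predicting with the locally linearized network rather than the original one; a minimal sketch in generic notation (not necessarily the paper's own) is:

```latex
% First-order Taylor expansion of the network around the MAP estimate \theta_*:
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_*) + J_{\theta_*}(x)\,(\theta - \theta_*),
\qquad
J_{\theta_*}(x) = \left.\frac{\partial f(x;\theta)}{\partial \theta}\right|_{\theta=\theta_*}.
% Under a Laplace (Gaussian) posterior \theta \sim \mathcal{N}(\theta_*, \Sigma),
% the linearized model induces a Gaussian over function values:
f_{\mathrm{lin}}(x;\theta) \sim \mathcal{N}\!\left(f(x;\theta_*),\; J_{\theta_*}(x)\,\Sigma\,J_{\theta_*}(x)^{\top}\right).
```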
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)