BALI: Learning Neural Networks via Bayesian Layerwise Inference
- URL: http://arxiv.org/abs/2411.12102v1
- Date: Mon, 18 Nov 2024 22:18:34 GMT
- Title: BALI: Learning Neural Networks via Bayesian Layerwise Inference
- Authors: Richard Kurle, Alexej Klushyn, Ralf Herbrich
- Abstract summary: We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models.
The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer.
We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated gradients of the objective function.
- Score: 6.7819070167076045
- Abstract: We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models. The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer. We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated gradients of the objective function. The resulting layerwise posterior is a matrix-normal distribution with a Kronecker-factorized covariance matrix, which can be efficiently inverted. Our method extends to the stochastic mini-batch setting using an exponential moving average over natural-parameter terms, thus gradually forgetting older data. The method converges in a few iterations and performs as well as or better than leading Bayesian neural network methods on various regression, classification, and out-of-distribution detection benchmarks.
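To make the layerwise update concrete, below is a minimal NumPy sketch of a single exact posterior update as described in the abstract. All names, shapes, and hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def layerwise_update(X, Y_fwd, dL_dY, lr=0.1, prior_prec=1.0, noise_prec=1.0):
    """Exact posterior for one linear layer W (in_dim x out_dim), treated as
    multivariate Bayesian linear regression X @ W ~ Y*.

    X      : (n, in_dim)  layer inputs from the forward pass
    Y_fwd  : (n, out_dim) layer outputs from the forward pass
    dL_dY  : (n, out_dim) backpropagated gradient of the objective w.r.t. Y
    """
    # Pseudo-targets: forward-pass outputs updated against the gradient.
    Y_star = Y_fwd - lr * dL_dY

    # Natural-parameter terms of the conjugate update; in the mini-batch
    # setting these would be exponential moving averages, so that older
    # data is gradually forgotten.
    XtX = X.T @ X
    XtY = X.T @ Y_star

    # Matrix-normal posterior: the covariance factorizes as a Kronecker
    # product (input-side factor x output-side factor), so only an
    # (in_dim x in_dim) matrix has to be inverted.
    in_dim = X.shape[1]
    row_cov = np.linalg.inv(noise_prec * XtX + prior_prec * np.eye(in_dim))
    W_mean = noise_prec * row_cov @ XtY  # posterior mean of the weights
    return W_mean, row_cov
```

Stacking one such update per layer, with each layer's inputs taken from the forward pass, gives the overall procedure the abstract describes.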
Related papers
- Variational Bayesian Last Layers [14.521406172240845]
We introduce a deterministic variational formulation for training Bayesian last layer neural networks.
This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation; a minimal sketch of such a sampling-free last-layer predictive appears after this list.
We experimentally investigate VBLLs and show that they improve predictive accuracy, calibration, and out-of-distribution detection over baselines.
arXiv Detail & Related papers (2024-04-17T17:50:24Z)
- Learning to solve Bayesian inverse problems: An amortized variational inference approach using Gaussian and Flow guides [0.0]
We develop a methodology that enables real-time inference by learning the Bayesian inverse map, i.e., the map from data to posteriors.
Our approach provides the posterior distribution for a given observation at the cost of a single forward pass of the neural network.
arXiv Detail & Related papers (2023-05-31T16:25:07Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained via gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage large-scale machine learning to approximate a large set of posteriors via in-context learning; a toy version of this training setup is sketched after this list.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Bayesian Neural Network via Stochastic Gradient Descent [0.0]
We show how gradient estimation techniques can be applied to Bayesian neural networks.
Our approach considerably outperforms previous state-of-the-art methods for regression with Bayesian neural networks.
arXiv Detail & Related papers (2020-06-04T18:33:59Z)
- Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes [38.79834570417554]
We develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network.
We extend this approach to deep Gaussian processes, unifying inference in the two model classes.
arXiv Detail & Related papers (2020-05-17T01:10:37Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problem renders parameter learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initializations.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
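On the Variational Bayesian Last Layers entry above: the sampling-free, single-pass behavior follows from the fact that, given a Gaussian (e.g., variational) posterior over the last-layer weights, the regression predictive is Gaussian in closed form. The minimal sketch below is a generic Bayesian-last-layer predictive under that assumption, with illustrative names; it is not the paper's exact variational formulation.

```python
import numpy as np

def last_layer_predictive(phi, w_mean, w_cov, noise_var=0.1):
    """Closed-form Gaussian predictive for a Bayesian linear last layer.

    phi    : (d,)   penultimate-layer features for one input
    w_mean : (d,)   mean of the Gaussian posterior over last-layer weights
    w_cov  : (d, d) covariance of that posterior
    """
    mean = phi @ w_mean                  # predictive mean
    var = phi @ w_cov @ phi + noise_var  # epistemic + observation variance
    return mean, var
```

A single deterministic forward pass through this function replaces Monte Carlo sampling over the weights.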
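On the Transformers Can Do Bayesian Inference entry above: the core training idea is to repeatedly sample whole datasets from a prior and train a network to predict held-out targets in-context. The toy sketch below uses a simple permutation-invariant encoder and a squared-error objective in place of the paper's transformer and distributional loss; the prior, architecture, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def sample_task(n_ctx=10, n_qry=5):
    """Draw one dataset from a toy prior: y = w * x + noise, with w ~ N(0, 1)."""
    w = torch.randn(1)
    x = torch.randn(n_ctx + n_qry, 1)
    y = w * x + 0.1 * torch.randn_like(x)
    return x[:n_ctx], y[:n_ctx], x[n_ctx:], y[n_ctx:]

class ToyPFN(nn.Module):
    """Predicts query targets from a permutation-invariant context summary."""
    def __init__(self, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, x_ctx, y_ctx, x_qry):
        # Summarize the context set, then condition each query point on it.
        summary = self.enc(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=0)
        inp = torch.cat([summary.expand(len(x_qry), -1), x_qry], dim=-1)
        return self.head(inp)

model = ToyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):  # meta-train across datasets sampled from the prior
    x_ctx, y_ctx, x_qry, y_qry = sample_task()
    loss = ((model(x_ctx, y_ctx, x_qry) - y_qry) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# After training, a single forward pass on a new context set approximates
# the posterior predictive mean for tasks drawn from the same prior.
```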