Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
- URL: http://arxiv.org/abs/2406.04317v2
- Date: Fri, 19 Jul 2024 15:19:55 GMT
- Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
- Authors: Tristan Cinquin, Robert Bamler
- Abstract summary: Posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult.
This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights.
We show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets.
- Score: 9.023847175654604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian neural networks (BNNs) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinity for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.
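As a rough illustration of the ingredients named in the abstract (our reading, not the authors' code): generalized VI keeps the expected log-likelihood term of the ELBO but replaces the KL term with a different divergence, and one way to keep that divergence finite is to add a small multiple of the identity to both covariance operators before evaluating a Gaussian KL. The minimal NumPy sketch below does exactly that on a finite set of measurement points; the function name, the finite-dimensional simplification, and the toy RBF prior are assumptions for illustration.
```python
import numpy as np

def regularized_gaussian_kl(m_q, K_q, m_p, K_p, gamma=1e-3):
    """KL divergence between N(m_q, K_q + gamma*I) and N(m_p, K_p + gamma*I).

    Adding gamma*I to both covariance (Gram) matrices keeps the divergence
    finite even when the two covariances have mismatched support.
    Illustrative finite-dimensional sketch; not the authors' implementation.
    """
    n = len(m_q)
    Kq = K_q + gamma * np.eye(n)
    Kp = K_p + gamma * np.eye(n)
    Kp_inv = np.linalg.inv(Kp)
    diff = m_p - m_q
    _, logdet_q = np.linalg.slogdet(Kq)
    _, logdet_p = np.linalg.slogdet(Kp)
    return 0.5 * (np.trace(Kp_inv @ Kq) + diff @ Kp_inv @ diff
                  - n + logdet_p - logdet_q)

# Toy usage: variational Gaussian vs. an RBF GP prior on 20 measurement points
X = np.linspace(-2.0, 2.0, 20)[:, None]
K_prior = np.exp(-0.5 * (X - X.T) ** 2)          # RBF Gram matrix (made-up prior)
m_prior = np.zeros(20)
m_post, K_post = 0.1 * np.ones(20), 0.5 * K_prior
print(regularized_gaussian_kl(m_post, K_post, m_prior, K_prior))
```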
Related papers
- Empowering Bayesian Neural Networks with Functional Priors through Anchored Ensembling for Mechanics Surrogate Modeling Applications [0.0]
We present a novel BNN training scheme based on anchored ensembling that can integrate a priori information available in the function space.
The anchoring scheme makes use of low-rank correlations between NN parameters, learnt from pre-training to realizations of the functional prior.
We also perform a study to demonstrate how correlations between NN weights, which are often neglected in existing BNN implementations, are critical to appropriately transfer knowledge between the function-space and parameter-space priors.
arXiv Detail & Related papers (2024-09-08T22:27:50Z)
- Bayesian Neural Networks with Domain Knowledge Priors [52.80929437592308]
We propose a framework for integrating general forms of domain knowledge into a BNN prior.
We show that BNNs using our proposed domain knowledge priors outperform those with standard priors.
arXiv Detail & Related papers (2024-02-20T22:34:53Z)
- Tractable Function-Space Variational Inference in Bayesian Neural Networks [72.97620734290139]
A popular approach for estimating the predictive uncertainty of neural networks is to define a prior distribution over the network parameters.
We propose a scalable function-space variational inference method that allows incorporating prior information.
We show that the proposed method leads to state-of-the-art uncertainty estimation and predictive performance on a range of prediction tasks.
arXiv Detail & Related papers (2023-12-28T18:33:26Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error, we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Sparsifying Bayesian neural networks with latent binary variables and normalizing flows [10.865434331546126]
We will consider two extensions to the latent binary Bayesian neural networks (LBBNN) method.
Firstly, by using the local reparametrization trick (LRT) to sample the hidden units directly, we get a more computationally efficient algorithm.
More importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean field Gaussian.
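For context on the local reparametrization trick mentioned above, here is a minimal NumPy sketch of a generic mean-field Gaussian linear layer (an assumption for illustration, not the LBBNN code): the layer's pre-activations are sampled directly from their induced Gaussian rather than sampling a full weight matrix first, which lowers gradient variance and compute.
```python
import numpy as np

rng = np.random.default_rng(0)

def lrt_linear(x, w_mean, w_logvar):
    """Local reparametrization trick for a mean-field Gaussian linear layer.

    Instead of sampling W ~ N(w_mean, diag(exp(w_logvar))) and computing x @ W,
    sample the pre-activations directly from their induced Gaussian:
        b ~ N(x @ w_mean, (x**2) @ exp(w_logvar)).
    Generic illustrative sketch; names and shapes are assumptions.
    """
    act_mean = x @ w_mean                       # (batch, out)
    act_var = (x ** 2) @ np.exp(w_logvar)       # (batch, out)
    eps = rng.standard_normal(act_mean.shape)
    return act_mean + np.sqrt(act_var) * eps

# Toy usage: batch of 4 inputs, layer mapping 3 -> 5 units
x = rng.standard_normal((4, 3))
w_mean = 0.1 * rng.standard_normal((3, 5))
w_logvar = np.full((3, 5), -4.0)
print(lrt_linear(x, w_mean, w_logvar).shape)    # (4, 5)
```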
arXiv Detail & Related papers (2023-05-05T09:40:28Z)
- BNNpriors: A library for Bayesian neural network inference with different prior distributions [32.944046414823916]
BNNpriors enables state-of-the-art Markov Chain Monte Carlo inference on Bayesian neural networks.
It follows a modular approach that eases the design and implementation of new custom priors.
It has facilitated foundational discoveries on the nature of the cold posterior effect in Bayesian neural networks.
arXiv Detail & Related papers (2021-05-14T17:11:04Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
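A toy sketch of the "predict with the linearized model" idea (an analytic two-parameter example of our own, not the paper's code): given a Laplace posterior N(theta_map, Sigma) over parameters, the GLM predictive evaluates the model linearized at theta_map and pushes Sigma through the Jacobian to get the predictive variance.
```python
import numpy as np

def glm_predictive(x, theta_map, Sigma, sigma_noise=0.1):
    """GLM-style predictive for a toy model f(x; a, b) = a * tanh(b * x).

    Linearize f around the MAP estimate theta_map = (a, b):
        f(x; theta) ~ f(x; theta_map) + J(x) @ (theta - theta_map),
    so the predictive is Gaussian with mean f(x; theta_map) and variance
    J(x) @ Sigma @ J(x).T + sigma_noise**2. Illustrative sketch only.
    """
    a, b = theta_map
    mean = a * np.tanh(b * x)
    # Jacobian of f with respect to (a, b), evaluated at theta_map
    J = np.stack([np.tanh(b * x), a * x / np.cosh(b * x) ** 2], axis=-1)
    var = np.einsum("ni,ij,nj->n", J, Sigma, J) + sigma_noise ** 2
    return mean, var

x = np.linspace(-3.0, 3.0, 7)
theta_map = np.array([1.2, 0.8])
Sigma = np.array([[0.05, 0.01], [0.01, 0.02]])   # made-up Laplace covariance
mean, var = glm_predictive(x, theta_map, Sigma)
print(mean.round(2), var.round(3))
```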
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
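To make "tracking the Hessian norm" concrete, here is a small generic sketch that estimates the spectral norm of a loss Hessian via finite-difference Hessian-vector products and power iteration; the toy quadratic loss and all names are illustrative assumptions, not the estimator proposed in the last paper above.
```python
import numpy as np

def hessian_spectral_norm(grad_fn, w, iters=50, eps=1e-4, seed=0):
    """Estimate the spectral norm of a loss Hessian at w by power iteration,
    using finite-difference Hessian-vector products:
        H v ~ (grad(w + eps * v) - grad(w)) / eps.
    Generic diagnostic sketch; not the estimator from the paper above.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    g0 = grad_fn(w)
    norm = 0.0
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - g0) / eps
        norm = np.linalg.norm(hv)
        v = hv / (norm + 1e-12)
    return norm

# Toy quadratic loss L(w) = 0.5 * w @ A @ w, whose Hessian is A (top eigenvalue 4.0)
A = np.diag([4.0, 1.0, 0.25])
grad = lambda w: A @ w
print(hessian_spectral_norm(grad, np.ones(3)))   # ~ 4.0
```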
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.