Favour: FAst Variance Operator for Uncertainty Rating
- URL: http://arxiv.org/abs/2311.13036v1
- Date: Tue, 21 Nov 2023 22:53:20 GMT
- Title: Favour: FAst Variance Operator for Uncertainty Rating
- Authors: Thomas D. Ahle, Sahar Karimi, Peter Tak Peter Tang
- Abstract summary: Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions.
By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference.
Previous work proposed propagating the first and second moments of the posterior directly through the network.
On its own, this method is even slower than sampling, so the propagated variance needs to be approximated.
Our contribution is a more principled variance propagation framework.
- Score: 0.034530027457862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian Neural Networks (BNN) have emerged as a crucial approach for
interpreting ML predictions. By sampling from the posterior distribution, data
scientists may estimate the uncertainty of an inference. Unfortunately, many
inference samples are often needed, and the resulting overhead greatly hinders
BNNs' wide adoption. To mitigate this, previous work proposed propagating the
first and second moments of the posterior directly through the network. However,
on its own this method is even slower than sampling, so the propagated variance
needs to be approximated, for example by assuming independence between neural
nodes. The resulting trade-off between quality and inference time did not match
even plain Monte Carlo sampling.
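As a concrete illustration of this sampling overhead, the minimal sketch below estimates predictive uncertainty by drawing many weight samples from an approximate posterior and running one forward pass per sample; the toy network, mean-field posterior, and sample count are assumptions for illustration only, not the paper's setup.

```python
import numpy as np

# Minimal sketch of the Monte Carlo baseline: one forward pass per posterior
# weight sample.  The toy network, mean-field posterior, and sample count are
# illustrative assumptions, not the paper's setup.
rng = np.random.default_rng(0)

def forward(x, w):
    # Toy one-layer "network": a linear map followed by tanh.
    return np.tanh(x @ w)

x = rng.normal(size=(1, 16))                 # a single input
w_mean = 0.1 * rng.normal(size=(16, 4))      # posterior mean of the weights
w_std = np.full((16, 4), 0.05)               # posterior std (mean-field)

n_samples = 100                              # many passes -> high overhead
outputs = np.stack([
    forward(x, rng.normal(w_mean, w_std))    # sample weights, run the network
    for _ in range(n_samples)
])

pred_mean = outputs.mean(axis=0)             # predictive mean
pred_var = outputs.var(axis=0)               # predictive variance (uncertainty)
```

Each extra sample adds a full forward pass, which is the cost that variance propagation tries to avoid.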
Our contribution is a more principled variance propagation framework based on
"spiked covariance matrices", which smoothly interpolates between quality and
inference time. This is made possible by a new fast algorithm for updating a
diagonal-plus-low-rank matrix approximation under various operations. We tested
our algorithm against sampling-based MC Dropout and Variational Inference on a
number of downstream uncertainty-themed tasks, such as calibration and
out-of-distribution testing. We find that Favour is as fast as performing 2-3
inference samples, while matching the performance of 10-100 samples.
In summary, this work enables the use of BNNs in the realm of performance-critical
tasks where they have previously been out of reach.
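To make the "spiked covariance" idea more concrete, the sketch below tracks an activation covariance in diagonal-plus-low-rank form, Sigma ≈ diag(d) + U Uᵀ, and propagates it exactly through a fixed linear layer. All shapes and values are made up for illustration; the paper's actual contribution, a fast algorithm that keeps the approximation in this form (with bounded rank) through nonlinearities, weight uncertainty, and other network operations, is not reproduced here.

```python
import numpy as np

# A minimal sketch, assuming a diagonal-plus-low-rank ("spiked") covariance
# Sigma ~= diag(d) + U @ U.T for the activations feeding a layer.  For a
# fixed linear layer y = W @ x, the output covariance is W @ Sigma @ W.T,
# which can be tracked without ever forming Sigma explicitly.  Shapes and
# values are illustrative; the paper's fast update rules are not shown.
rng = np.random.default_rng(0)

n_in, n_out, rank = 32, 16, 4
d = rng.uniform(0.1, 1.0, size=n_in)      # diagonal part of the input covariance
U = 0.3 * rng.normal(size=(n_in, rank))   # low-rank "spike" directions
W = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)

# Exact propagation through the linear layer:
#   Sigma_out = W (diag(d) + U U^T) W^T = (W diag(d) W^T) + (W U)(W U)^T
WU = W @ U                                 # the low-rank factor stays low-rank
Sigma_out = (W * d) @ W.T + WU @ WU.T      # formed densely only for this demo

# Sanity check against forming the input covariance explicitly.
Sigma_in = np.diag(d) + U @ U.T
assert np.allclose(Sigma_out, W @ Sigma_in @ W.T)
```

The appeal of the representation is that the low-rank factor stays low-rank under linear maps; re-compressing the result back into diagonal-plus-low-rank form after nonlinearities and weight uncertainty is where the paper's fast update algorithm comes in, and that step is not sketched here.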
Related papers
- Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z) - Variational Inference on the Final-Layer Output of Neural Networks [3.146069168382982]
This paper proposes to combine the advantages of both approaches by performing Variational Inference in the Final layer Output space (VIFO)
We use neural networks to learn the mean and the variance of the probabilistic output.
Experiments show that VIFO provides a good tradeoff in terms of run time and uncertainty quantification, especially for out of distribution data.
arXiv Detail & Related papers (2023-02-05T16:19:01Z) - Distribution estimation and change-point detection for time series via DNN-based GANs [0.0]
Generative adversarial networks (GANs) have recently been applied to estimating the distribution of independent and identically distributed data.
In this paper, we use the blocking technique to demonstrate the effectiveness of GANs for estimating the distribution of stationary time series.
arXiv Detail & Related papers (2022-11-26T14:33:34Z) - ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference [54.17205151960878]
We introduce a sampling-free approach that is generic and easy to deploy.
We produce reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost.
arXiv Detail & Related papers (2022-11-21T13:23:09Z) - GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs)
PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z) - Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs)
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT)
arXiv Detail & Related papers (2020-06-10T12:48:37Z)