Fast Predictive Uncertainty for Classification with Bayesian Deep
Networks
- URL: http://arxiv.org/abs/2003.01227v4
- Date: Tue, 31 May 2022 06:50:20 GMT
- Title: Fast Predictive Uncertainty for Classification with Bayesian Deep
Networks
- Authors: Marius Hobbhahn, Agustinus Kristiadi, Philipp Hennig
- Abstract summary: In Bayesian Deep Learning, distributions over the output of classification neural networks are approximated by first constructing a Gaussian distribution over the weights, then sampling from it to obtain a distribution over the softmax outputs.
We construct a Dirichlet approximation of this softmax output distribution, which yields an analytic map between Gaussian distributions in logit space and Dirichlet distributions in the output space.
We demonstrate that the resulting Dirichlet distribution has multiple advantages, in particular, more efficient computation of the uncertainty estimate and scaling to large datasets and networks like ImageNet and DenseNet.
- Score: 25.821401066200504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Bayesian Deep Learning, distributions over the output of classification
neural networks are often approximated by first constructing a Gaussian
distribution over the weights, then sampling from it to obtain a distribution
over the softmax outputs. This is costly. We reconsider old work (Laplace
Bridge) to construct a Dirichlet approximation of this softmax output
distribution, which yields an analytic map between Gaussian distributions in
logit space and Dirichlet distributions (the conjugate prior to the Categorical
distribution) in the output space. Importantly, the vanilla Laplace Bridge
comes with certain limitations. We analyze those and suggest a simple solution
that compares favorably to other commonly used estimates of the
softmax-Gaussian integral. We demonstrate that the resulting Dirichlet
distribution has multiple advantages, in particular, more efficient computation
of the uncertainty estimate and scaling to large datasets and networks like
ImageNet and DenseNet. We further demonstrate the usefulness of this Dirichlet
approximation by using it to construct a lightweight uncertainty-aware output
ranking for ImageNet.
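To make the analytic map concrete, below is a minimal sketch of the forward Laplace Bridge for a diagonal Gaussian over the logits, using the commonly stated closed-form expression for the Dirichlet concentration parameters. The function name, the variable names, and the small usage example are illustrative; the paper's proposed correction for the limitations of the vanilla Bridge is not reproduced here.

```python
import numpy as np

def laplace_bridge(mu, sigma_sq):
    """Forward Laplace Bridge: map a diagonal Gaussian N(mu, diag(sigma_sq))
    over the K logits to a Dirichlet(alpha) over the softmax outputs.

    mu, sigma_sq : arrays of shape (K,) -- logit means and marginal variances.
    Returns the K Dirichlet concentration parameters alpha.
    """
    K = mu.shape[0]
    # Commonly stated closed form:
    # alpha_k = (1 - 2/K + exp(mu_k) * sum_l exp(-mu_l) / K^2) / sigma_kk
    sum_exp_neg = np.sum(np.exp(-mu))
    alpha = (1.0 - 2.0 / K + np.exp(mu) * sum_exp_neg / K**2) / sigma_sq
    return alpha

# Illustrative 3-class example: a logit Gaussian, e.g. from a Laplace approximation.
mu = np.array([2.0, 0.5, -1.0])
sigma_sq = np.array([0.3, 0.5, 0.4])
alpha = laplace_bridge(mu, sigma_sq)
mean_probs = alpha / alpha.sum()   # Dirichlet mean approximates the expected softmax output
print("alpha:", alpha)
print("mean probabilities:", mean_probs)
```

Because the map is a per-class closed-form expression, the uncertainty estimate costs a single forward pass plus O(K) arithmetic instead of many softmax samples, which is what makes the lightweight uncertainty-aware output ranking on ImageNet mentioned above feasible.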
Related papers
- Hessian-Free Laplace in Bayesian Deep Learning [44.16006844888796]
The Hessian-free Laplace (HFL) approximation uses the curvature of both the log posterior and the network prediction to estimate the predictive variance.
We show that, under standard assumptions of LA in Bayesian deep learning, HFL targets the same variance as LA, and can be efficiently amortized in a pre-trained network.
arXiv Detail & Related papers (2024-03-15T20:47:39Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data.
Our class of functions used for score estimation is made of Lipschitz continuous functions avoiding any Lipschitzness assumption on the score function.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
- Generalized Schrödinger Bridge Matching [54.171931505066]
The Generalized Schrödinger Bridge (GSB) problem setup is prevalent in many scientific areas both within and beyond machine learning.
We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances.
We show that such a generalization can be cast as solving conditional optimal control, for which variational approximations can be used.
arXiv Detail & Related papers (2023-10-03T17:42:11Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- On the capacity of deep generative networks for approximating distributions [8.798333793391544]
We prove that neural networks can transform a one-dimensional source distribution to a distribution arbitrarily close to a high-dimensional target distribution in Wasserstein distances.
It is shown that the approximation error grows at most linearly with the ambient dimension.
$f$-divergences are less adequate than Wasserstein distances as metrics of distributions for generating samples.
arXiv Detail & Related papers (2021-01-29T01:45:02Z)
- Bayesian Deep Learning via Subnetwork Inference [2.2835610890984164]
We show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors.
This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets.
arXiv Detail & Related papers (2020-10-28T01:10:11Z)
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods; a generic sketch of such a mean-field estimate is given after this list.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO), for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
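As referenced in the mean-field entry above, here is a minimal sketch of a probit-style mean-field estimate of the Gaussian-softmax integral, one of the commonly used estimates that the Laplace Bridge paper compares against. The scaling constant pi/8 and the function name are assumptions for illustration and may differ from the exact formula derived in that paper.

```python
import numpy as np

def mean_field_softmax(mu, sigma_sq, lam=np.pi / 8):
    """Probit-style mean-field estimate of E[softmax(z)] for z ~ N(mu, diag(sigma_sq)).

    Each logit mean is shrunk by its own variance before the softmax:
        E[softmax(z)]_k ~= softmax(mu / sqrt(1 + lam * sigma_sq))_k
    The constant lam = pi/8 comes from the classic probit approximation to the
    logistic sigmoid; it is an assumption here, not necessarily the paper's choice.
    """
    scaled = mu / np.sqrt(1.0 + lam * sigma_sq)
    scaled = scaled - scaled.max()   # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

# Illustrative example with the same 3-class logit Gaussian as in the earlier sketch.
mu = np.array([2.0, 0.5, -1.0])
sigma_sq = np.array([0.3, 0.5, 0.4])
print(mean_field_softmax(mu, sigma_sq))
```

Like the Laplace Bridge, this gives a sampling-free estimate of the predictive class probabilities, but it returns only a point estimate rather than a full Dirichlet distribution over them.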
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.