Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation
- URL: http://arxiv.org/abs/2006.07584v2
- Date: Sun, 9 May 2021 05:31:11 GMT
- Title: Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation
- Authors: Zhiyun Lu, Eugene Ie, Fei Sha
- Abstract summary: We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
- Score: 23.38076756988258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many methods have been proposed to quantify the predictive uncertainty
associated with the outputs of deep neural networks. Among them, ensemble
methods often lead to state-of-the-art results, though they require
modifications to the training procedures and are computationally costly for
both training and inference. In this paper, we propose a new single-model based
approach. The main idea is inspired by the observation that we can "simulate"
an ensemble of models by drawing from a Gaussian distribution, with a form
similar to those from the asymptotic normality theory, infinitesimal Jackknife,
Laplace approximation to Bayesian neural networks, and trajectories of
stochastic gradient descent. However, instead of using each model in the
"ensemble" to predict and then aggregating their predictions, we integrate the
Gaussian distribution and the softmax outputs of the neural networks. We use a
mean-field approximation formula to compute this analytically intractable
integral. The proposed approach has several appealing properties: it functions
as an ensemble without requiring multiple models, and it enables closed-form
approximate inference using only the first and second moments of the Gaussian.
Empirically, the proposed approach performs competitively when compared to
state-of-the-art methods, including deep ensembles, temperature scaling,
dropout and Bayesian NNs, on standard uncertainty estimation tasks. It also
outperforms many methods on out-of-distribution detection.
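To illustrate the kind of closed-form approximation the abstract describes, the following minimal NumPy sketch compares a Monte-Carlo estimate of the Gaussian-softmax integral E_{z~N(mu, Sigma)}[softmax(z)] with a common mean-field-style closed form that rescales each logit by its marginal variance. The lam = pi/8 constant and all function names are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mc_gaussian_softmax(mu, Sigma, n_samples=100_000, seed=0):
    """Monte-Carlo estimate of E_{z ~ N(mu, Sigma)}[softmax(z)]."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu, Sigma, size=n_samples)
    return softmax(z, axis=1).mean(axis=0)

def mean_field_gaussian_softmax(mu, Sigma, lam=np.pi / 8):
    """Closed-form mean-field-style approximation using only the first and
    second moments: each logit is shrunk by its marginal variance before a
    single softmax is applied (lam = pi/8 follows the classic probit-sigmoid
    matching; treat the exact constant as an assumption)."""
    var = np.diag(Sigma)
    return softmax(mu / np.sqrt(1.0 + lam * var))

if __name__ == "__main__":
    mu = np.array([2.0, 0.5, -1.0])
    # A diagonal covariance keeps the example simple; the mean-field form
    # above uses only the marginal variances anyway.
    Sigma = np.diag([1.5, 0.3, 2.0])
    print("Monte Carlo :", mc_gaussian_softmax(mu, Sigma))
    print("Mean field  :", mean_field_gaussian_softmax(mu, Sigma))
```

The closed form touches only the mean and the diagonal of the covariance and requires a single softmax evaluation, which is the computational appeal over averaging many sampled softmax outputs; how closely it tracks the Monte-Carlo estimate depends on the magnitude of the variances.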
Related papers
- Likelihood approximations via Gaussian approximate inference [3.4991031406102238]
We propose efficient schemes to approximate the effects of non-Gaussian likelihoods by Gaussian densities.
Our results attain good approximation quality for binary and multiclass classification in large-scale point-estimate and distributional inferential settings.
As a by-product, we show that the proposed approximate log-likelihoods are a superior alternative to least-squares on raw labels for neural network classification.
arXiv Detail & Related papers (2024-10-28T05:39:26Z) - Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection [11.729744197698718]
We present an algorithmic framework to approximate a neural network of finite width and depth.
We iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes.
Our results can represent an important step towards understanding neural network predictions.
arXiv Detail & Related papers (2024-07-26T12:45:53Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used, but their training objectives are nonconvex.
In this paper, we examine convex (Lasso-type) reformulations of neural network training.
We show that the stationary points of the nonconvex objective can be characterized as the global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations [0.0]
Variational mixtures with full-covariance structures suffer from quadratic growth in the number of variational parameters with the number of model parameters.
We propose a method for constructing an initial Gaussian model approximation that can be used to warm-start variational inference.
arXiv Detail & Related papers (2023-07-12T19:30:04Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a stochastic Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate stochastic gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors; a minimal sketch of the underlying pathwise (Matheron) update appears after this list.
arXiv Detail & Related papers (2020-11-08T17:09:37Z) - Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated because the gradient of the expected loss must be estimated through discrete stochastic units.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation [26.11677569331688]
We introduce a novel framework for the distributed algorithmic minimization of the sum of the local cost functions of the agents in a network.
We show that the proposed method can be applied to the distributed training of neural networks.
arXiv Detail & Related papers (2020-04-30T15:36:46Z)
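For the "Pathwise Conditioning of Gaussian Processes" entry above, the following minimal NumPy sketch of the classical Matheron-style pathwise update may help fix ideas: a joint prior draw over training and test inputs is corrected by the residuals at the training inputs, producing a posterior sample without factorizing the posterior covariance. The kernel choice, noise level, and array shapes are illustrative assumptions, and the exact joint prior draw used here is only for brevity.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) for 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def pathwise_posterior_samples(x_train, y_train, x_test, noise=0.1,
                               n_samples=5, seed=0):
    """Draw GP posterior samples via Matheron's rule:
       f_post(.) = f_prior(.) + K(., X) (K(X, X) + noise*I)^{-1} (y - f_prior(X) - eps),
    i.e. a joint prior draw is corrected by the residuals at the training inputs."""
    rng = np.random.default_rng(seed)
    x_all = np.concatenate([x_train, x_test])
    K_all = rbf_kernel(x_all, x_all) + 1e-6 * np.eye(len(x_all))  # jitter
    n = len(x_train)

    # Joint prior draws over training and test locations, plus observation noise.
    f_prior = rng.multivariate_normal(np.zeros(len(x_all)), K_all, size=n_samples)
    eps = rng.normal(0.0, np.sqrt(noise), size=(n_samples, n))

    K_train = K_all[:n, :n] + noise * np.eye(n)
    K_test_train = K_all[n:, :n]
    residual = y_train[None, :] - f_prior[:, :n] - eps                 # (n_samples, n)
    correction = K_test_train @ np.linalg.solve(K_train, residual.T)   # (n_test, n_samples)
    return f_prior[:, n:] + correction.T

if __name__ == "__main__":
    x_train = np.linspace(-3, 3, 20)
    y_train = np.sin(x_train) + 0.1 * np.random.default_rng(1).normal(size=20)
    x_test = np.linspace(-4, 4, 50)
    samples = pathwise_posterior_samples(x_train, y_train, x_test)
    print(samples.shape)  # (5, 50)
```

Calling pathwise_posterior_samples returns an array of shape (n_samples, len(x_test)); replacing the exact joint prior draw with a cheap approximation (e.g., random Fourier features) is what gives the pathwise view its efficiency advantage over distribution-centric sampling.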
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all summaries) and is not responsible for any consequences of its use.