Related papers: The Case for Bayesian Deep Learning

The Case for Bayesian Deep Learning

URL: http://arxiv.org/abs/2001.10995v1
Date: Wed, 29 Jan 2020 18:08:52 GMT
Title: The Case for Bayesian Deep Learning
Authors: Andrew Gordon Wilson
Abstract summary: Key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Recent advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training.
Score: 41.54360061376725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.

Related papers

Variational Bayesian Bow tie Neural Networks with Shrinkage [0.276240219662896]
We build a relaxed version of the standard feed-forward rectified neural network. We employ Polya-Gamma data augmentation tricks to render a conditionally linear and Gaussian model. We derive a variational inference algorithm that avoids distributional assumptions and independence across layers.
arXiv Detail & Related papers (2024-11-17T17:36:30Z)
Posterior and variational inference for deep neural networks with heavy-tailed weights [0.0]
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates. We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.
arXiv Detail & Related papers (2024-06-05T15:24:20Z)
Towards Improved Variational Inference for Deep Bayesian Models [7.841254447222393]
In this thesis, we explore the use of variational inference (VI) as an approximation. VI is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood. We propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes.
arXiv Detail & Related papers (2024-01-23T00:40:20Z)
Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory. We show that linear networks make provably optimal predictions at infinite depth. We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
Deep Discriminative to Kernel Density Graph for In- and Out-of-distribution Calibrated Inference [7.840433908659846]
Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. In this paper, we address ID and OOD calibration problems jointly.
arXiv Detail & Related papers (2022-01-31T05:07:16Z)
Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations. This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for a uncertainty on a ReLU network is "to be a bit Bayesian calibrated" We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization. We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.