The Case for Bayesian Deep Learning
- URL: http://arxiv.org/abs/2001.10995v1
- Date: Wed, 29 Jan 2020 18:08:52 GMT
- Title: The Case for Bayesian Deep Learning
- Authors: Andrew Gordon Wilson
- Abstract summary: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes' rule.
Recent advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training.
- Score: 41.54360061376725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key distinguishing property of a Bayesian approach is marginalization
instead of optimization, not the prior, or Bayes rule. Bayesian inference is
especially compelling for deep neural networks. (1) Neural networks are
typically underspecified by the data, and can represent many different but high
performing models corresponding to different settings of parameters, which is
exactly when marginalization will make the biggest difference for both
calibration and accuracy. (2) Deep ensembles have been mistaken as competing
approaches to Bayesian methods, but can be seen as approximate Bayesian
marginalization. (3) The structure of neural networks gives rise to a
structured prior in function space, which reflects the inductive biases of
neural networks that help them generalize. (4) The observed correlation between
parameters in flat regions of the loss and a diversity of solutions that
provide good generalization is further conducive to Bayesian marginalization,
as flat regions occupy a large volume in a high dimensional space, and each
different solution will make a good contribution to a Bayesian model average.
(5) Recent practical advances for Bayesian deep learning provide improvements
in accuracy and calibration compared to standard training, while retaining
scalability.
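To make the contrast between optimization and marginalization concrete, here is a minimal sketch on a toy problem. It is not taken from the paper: the 1-D logistic model, the four data points, the Gaussian prior with scale 3.0, the weight grid, and all function names are assumptions chosen for illustration. The point estimate predicts with the single most probable weight, while the Bayesian model average weights the prediction of every candidate weight by its posterior probability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """p(y = 1 | x, w) for a 1-D logistic model with a single weight w."""
    return sigmoid(w * x)

def log_posterior(w, data, prior_scale):
    """Unnormalized log posterior: log-likelihood plus Gaussian log-prior."""
    log_lik = 0.0
    for x, y in data:
        p = predict(w, x)
        log_lik += math.log(p if y == 1 else 1.0 - p)
    log_prior = -0.5 * (w / prior_scale) ** 2
    return log_lik + log_prior

# Toy data (hypothetical): many weights explain these points well,
# so the data underspecify the model.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

# Discretize the weight so the posterior can be normalized by a simple sum.
grid = [i * 0.05 for i in range(-200, 201)]
log_post = [log_posterior(w, data, prior_scale=3.0) for w in grid]
max_lp = max(log_post)
unnorm = [math.exp(lp - max_lp) for lp in log_post]  # subtract max for stability
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# Optimization: keep only the single most probable weight on the grid.
w_map = grid[log_post.index(max_lp)]

def bma_predict(x):
    """Marginalization: average predictions over the posterior on the grid."""
    return sum(p_w * predict(w, x) for w, p_w in zip(grid, posterior))

x_test = 0.5
print("point estimate prediction:", predict(w_map, x_test))
print("model average prediction: ", bma_predict(x_test))
```

Replacing the posterior weights with equal weights over a handful of independently trained solutions gives the ensemble-style average that the abstract describes as approximate Bayesian marginalization.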
Related papers
- A Bayesian Approach Toward Robust Multidimensional Ellipsoid-Specific Fitting [0.0]
This work presents a novel and effective method for fitting multidimensional ellipsoids to scattered data contaminated by noise and outliers.
We incorporate a uniform prior distribution to constrain the search for primitive parameters within an ellipsoidal domain.
We apply it to a wide range of practical applications such as microscopy cell counting, 3D reconstruction, geometric shape approximation, and magnetometer calibration tasks.
arXiv Detail & Related papers (2024-07-27T14:31:51Z)
- Posterior and variational inference for deep neural networks with heavy-tailed weights [0.0]
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random.
We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates.
We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.
arXiv Detail & Related papers (2024-06-05T15:24:20Z)
- Towards Improved Variational Inference for Deep Bayesian Models [7.841254447222393]
In this thesis, we explore the use of variational inference (VI) as an approximation.
VI is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood.
We propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes.
arXiv Detail & Related papers (2024-01-23T00:40:20Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- Deep Discriminative to Kernel Density Graph for In- and Out-of-distribution Calibrated Inference [7.840433908659846]
Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios.
However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points.
In this paper, we address ID and OOD calibration problems jointly.
arXiv Detail & Related papers (2022-01-31T05:07:16Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.