Deep Ensembles Secretly Perform Empirical Bayes
- URL: http://arxiv.org/abs/2501.17917v1
- Date: Wed, 29 Jan 2025 19:00:01 GMT
- Title: Deep Ensembles Secretly Perform Empirical Bayes
- Authors: Gabriel Loaiza-Ganem, Valentin Villecroze, Yixin Wang
- Abstract summary: We show that deep ensembles perform exact Bayesian averaging with a posterior obtained with an implicitly learned data-dependent prior. This perspective offers two main benefits: (i) it theoretically justifies deep ensembles and thus provides an explanation for their strong empirical performance; and (ii) inspection of the learned prior reveals it is given by a mixture of point masses.
- Score: 24.94944362766761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantifying uncertainty in neural networks is a highly relevant problem that is essential to many applications. The two predominant paradigms to tackle this task are Bayesian neural networks (BNNs) and deep ensembles. Despite some similarities between these two approaches, they are typically surmised to lack a formal connection and are thus understood as fundamentally different. BNNs are often touted as more principled due to their reliance on the Bayesian paradigm, whereas ensembles are perceived as more ad hoc; yet, deep ensembles tend to empirically outperform BNNs, with no satisfying explanation as to why this is the case. In this work we bridge this gap by showing that deep ensembles perform exact Bayesian averaging with a posterior obtained with an implicitly learned data-dependent prior. In other words, deep ensembles are Bayesian, or more specifically, they implement an empirical Bayes procedure wherein the prior is learned from the data. This perspective offers two main benefits: (i) it theoretically justifies deep ensembles and thus provides an explanation for their strong empirical performance; and (ii) inspection of the learned prior reveals it is given by a mixture of point masses -- the use of such a strong prior helps elucidate observed phenomena about ensembles. Overall, our work delivers a newfound understanding of deep ensembles which is not only of interest in and of itself, but which is also likely to generate future insights that drive empirical improvements for these models.
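To make the empirical Bayes reading concrete, here is a minimal sketch of how a mixture-of-point-masses prior turns ensemble averaging into exact Bayesian averaging. The notation is ours, and the equal-weight prior over the M trained solutions is an illustrative assumption, not necessarily the paper's exact construction:

```latex
% Prior: equal-weight point masses at the M independently trained solutions
\pi(\theta) = \frac{1}{M} \sum_{m=1}^{M} \delta_{\theta_m}(\theta)
% Bayes' rule keeps the posterior supported on the same M atoms
p(\theta \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid \theta)\, \pi(\theta)}{\int p(\mathcal{D} \mid \theta')\, \pi(\theta')\, \mathrm{d}\theta'}
  = \sum_{m=1}^{M} w_m\, \delta_{\theta_m}(\theta),
\qquad
w_m = \frac{p(\mathcal{D} \mid \theta_m)}{\sum_{k=1}^{M} p(\mathcal{D} \mid \theta_k)}
% Posterior predictive: a weighted ensemble average
p(y \mid x, \mathcal{D}) = \sum_{m=1}^{M} w_m\, p(y \mid x, \theta_m)
```

When all members reach comparable training likelihoods, w_m is approximately 1/M and the posterior predictive collapses to the familiar uniform ensemble average; this is the sense in which the averaging is exact rather than approximate.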
Related papers
- Last Layer Empirical Bayes [24.94944362766761]
Bayesian neural networks (BNNs) and deep ensembles are among the most prominent approaches to uncertainty quantification. Inspired by recent work showing that the distribution used by ensembles can be understood as a posterior corresponding to a learned data-dependent prior, we propose last layer empirical Bayes (LLEB). LLEB performs on par with existing approaches, highlighting that empirical Bayes is a promising direction for future research in uncertainty quantification.
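For intuition, one textbook way to realize empirical Bayes on the last layer alone is to treat it as Bayesian linear regression on frozen backbone features and fit the prior by maximizing the marginal likelihood. The sketch below is our illustration of that idea, not necessarily the paper's LLEB; `Phi` stands in for hypothetical frozen features:

```python
# Empirical Bayes on a last layer viewed as Bayesian linear regression:
# fit prior precision alpha and noise precision beta by evidence maximization.
import numpy as np

def log_evidence(Phi, y, alpha, beta):
    """Log marginal likelihood of Bayesian linear regression (Bishop, Eq. 3.86)."""
    n, d = Phi.shape
    A = alpha * np.eye(d) + beta * Phi.T @ Phi      # posterior precision
    mean = beta * np.linalg.solve(A, Phi.T @ y)     # posterior mean of weights
    resid = y - Phi @ mean
    return (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta)
            - 0.5 * beta * resid @ resid - 0.5 * alpha * mean @ mean
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * n * np.log(2 * np.pi))

# Toy demo: grid-search the evidence over prior/noise precisions.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 16))                    # stand-in for frozen features
y = Phi @ rng.normal(size=16) + 0.1 * rng.normal(size=200)
grid = np.logspace(-2, 3, 30)
best = max((log_evidence(Phi, y, a, b), a, b) for a in grid for b in grid)
print(f"log-evidence={best[0]:.1f}, alpha={best[1]:.3g}, beta={best[2]:.3g}")
```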
arXiv Detail & Related papers (2025-05-21T18:00:00Z)
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
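As a concrete picture of what gets unrolled, here is a minimal sketch (our illustration) of AMP for compressed sensing y = Ax + noise, with hand-set per-layer soft thresholds standing in for the learned parameters the paper's guarantees concern:

```python
# Unrolled AMP: each loop iteration corresponds to one network "layer".
import numpy as np

def soft(v, lam):
    """Soft-thresholding denoiser, the classic eta for sparse priors."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def unrolled_amp(y, A, lambdas):
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    for lam in lambdas:                  # one entry of lambdas = one layer
        x = soft(x + A.T @ z, lam)       # denoise the pseudo-data
        onsager = z * (np.count_nonzero(x) / n)   # Onsager correction term
        z = y - A @ x + onsager          # corrected residual
    return x

# Toy demo: recover a 25-sparse signal from 250 Gaussian measurements.
rng = np.random.default_rng(0)
n, N, k = 250, 500, 25
A = rng.normal(size=(n, N)) / np.sqrt(n)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.normal(size=k)
y = A @ x_true + 0.01 * rng.normal(size=n)
x_hat = unrolled_amp(y, A, lambdas=[0.4, 0.2, 0.1, 0.05, 0.02])
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```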
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Towards Understanding Dual BN In Hybrid Adversarial Training [79.92394747290905]
We show that disentangling statistics plays a smaller role than disentangling affine parameters in model training.
We propose a two-task hypothesis which serves as the empirical foundation and a unified framework for Hybrid-AT improvement.
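For reference, a minimal sketch (our illustration, in PyTorch) of the Dual BN construct at issue: clean and adversarial inputs get separate normalization statistics, while the `share_affine` flag lets one test whether the affine parameters or the statistics drive the effect:

```python
import torch
import torch.nn as nn

class DualBN2d(nn.Module):
    """Two BN branches (separate running statistics), optionally shared affine."""
    def __init__(self, channels, share_affine=True):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(channels, affine=not share_affine)
        self.bn_adv = nn.BatchNorm2d(channels, affine=not share_affine)
        self.share_affine = share_affine
        if share_affine:  # a single gamma/beta applied after either branch
            self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
            self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x, adversarial: bool):
        out = self.bn_adv(x) if adversarial else self.bn_clean(x)
        if self.share_affine:
            out = self.gamma * out + self.beta
        return out

bn = DualBN2d(64, share_affine=True)
clean = bn(torch.randn(8, 64, 16, 16), adversarial=False)
adv = bn(torch.randn(8, 64, 16, 16), adversarial=True)
```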
arXiv Detail & Related papers (2024-03-28T05:08:25Z)
- Learning Discretized Bayesian Networks with GOMEA [0.0]
We extend an existing state-of-the-art structure learning approach to jointly learn variable discretizations.
We show how this enables incorporating expert knowledge in a uniquely insightful fashion, finding multiple DBNs that trade-off complexity, accuracy, and the difference with a pre-determined expert network.
arXiv Detail & Related papers (2024-02-19T14:29:35Z)
- Implicit Visual Bias Mitigation by Posterior Estimate Sharpening of a Bayesian Neural Network [7.488317734152586]
We propose a novel implicit mitigation method using a Bayesian neural network.
Our proposed posterior estimate sharpening procedure encourages the network to focus on core features that do not contribute to high uncertainties.
arXiv Detail & Related papers (2023-03-29T09:47:35Z)
- Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization [5.678271181959529]
We study the evolution of the angle between two inputs to a ReLU neural network as a function of the number of layers.
We validate our theoretical results with Monte Carlo experiments and show that our results accurately approximate finite network behaviour.
We also empirically investigate how the depth degeneracy phenomenon can negatively impact training of real networks.
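The effect is easy to reproduce in a few lines; this Monte Carlo sketch (our illustration, with arbitrary width and depth settings) passes two inputs through random fully connected ReLU layers at He initialization and watches the angle between them shrink:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 30

def angle_deg(u, v):
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

x = rng.normal(size=width)
y = rng.normal(size=width)  # a second input, initially near-orthogonal to x
for layer in range(1, depth + 1):
    W = rng.normal(size=(width, width)) * np.sqrt(2.0 / width)  # He init
    x, y = np.maximum(W @ x, 0.0), np.maximum(W @ y, 0.0)       # ReLU layer
    if layer % 5 == 0:
        print(f"layer {layer:2d}: angle = {angle_deg(x, y):6.2f} deg")
```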
arXiv Detail & Related papers (2023-02-20T01:30:27Z)
- Pathologies of Predictive Diversity in Deep Ensembles [29.893614175153235]
Classic results establish that encouraging predictive diversity improves performance in ensembles of low-capacity models.
Here we demonstrate that these intuitions do not apply to high-capacity neural network ensembles (deep ensembles).
arXiv Detail & Related papers (2023-02-01T19:01:18Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
We observe that directly minimizing the loss of the ensemble is rarely applied in practice.
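The distinction between the two objectives is easy to state in code; this sketch (our illustration) contrasts independent training, which averages each member's own loss, with joint training, which penalizes only the ensemble's averaged prediction and thereby leaves room for members to collude:

```python
import torch
import torch.nn.functional as F

def independent_loss(member_logits, targets):
    """Mean over members of each member's own cross-entropy."""
    return torch.stack(
        [F.cross_entropy(lg, targets) for lg in member_logits]).mean()

def joint_loss(member_logits, targets):
    """Cross-entropy of the ensemble-averaged predictive distribution."""
    probs = torch.stack([lg.softmax(-1) for lg in member_logits]).mean(0)
    return F.nll_loss(probs.clamp_min(1e-12).log(), targets)

# Toy demo with random logits standing in for four members' outputs.
members = [torch.randn(32, 10, requires_grad=True) for _ in range(4)]
targets = torch.randint(0, 10, (32,))
print(independent_loss(members, targets).item(),
      joint_loss(members, targets).item())
```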
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Deep Ensembles Work, But Are They Necessary? [19.615082441403946]
Ensembling neural networks is an effective way to increase accuracy.
Recent work suggests that deep ensembles may offer benefits beyond predictive power.
We show that a single (but larger) neural network can replicate these qualities.
arXiv Detail & Related papers (2022-02-14T19:01:01Z)
- Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning [24.3370326359959]
We propose to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep neural networks.
We theoretically validate that our approach mitigates overconfidence "far away" from the training data and empirically compare against state-of-the-art baselines on standard uncertainty quantification benchmarks.
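A minimal sketch (our illustration, assuming per-member diagonal posterior precisions are already available from a curvature estimate such as a GGN or Fisher) of how prediction with such a mixture could look:

```python
import torch

@torch.no_grad()
def mixture_of_laplace_predict(members, diag_precisions, x, samples=20):
    """members: independently trained models (MAP estimates);
    diag_precisions: per-member {param_name: diagonal precision} dicts."""
    probs = []
    for model, precision in zip(members, diag_precisions):
        original = {n: p.clone() for n, p in model.named_parameters()}
        for _ in range(samples):
            for n, p in model.named_parameters():   # draw w ~ N(w_MAP, P^-1)
                p.add_(torch.randn_like(p) * precision[n].rsqrt())
            probs.append(model(x).softmax(-1))
            for n, p in model.named_parameters():   # restore the MAP weights
                p.copy_(original[n])
    return torch.stack(probs).mean(0)               # equal-weight mixture
```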
arXiv Detail & Related papers (2021-11-05T15:52:48Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
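For concreteness, here is a small sketch (our illustration, using torch.func) of the first-order linearization such analyses study, f_lin(x; w) = f(x; w0) + J(x; w0)(w - w0), computed with a Jacobian-vector product:

```python
import torch
from torch.func import functional_call, jvp

def linearized_forward(model, params0, params, x):
    """Evaluate the linearization of `model` around params0 at params."""
    names = list(params0)
    f = lambda *vals: functional_call(model, dict(zip(names, vals)), (x,))
    primals = tuple(params0[n].detach() for n in names)
    tangents = tuple((params[n] - params0[n]).detach() for n in names)
    f0, jf = jvp(f, primals, tangents)   # f(x; w0) and J(x; w0) @ (w - w0)
    return f0 + jf

# Toy demo: perturb the weights slightly and evaluate the linearized model.
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 2))
params0 = dict(model.named_parameters())
params = {n: p + 0.01 * torch.randn_like(p) for n, p in params0.items()}
print(linearized_forward(model, params0, params, torch.randn(3, 4)))
```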
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)