Repulsive Deep Ensembles are Bayesian
- URL: http://arxiv.org/abs/2106.11642v3
- Date: Tue, 28 Mar 2023 12:52:39 GMT
- Title: Repulsive Deep Ensembles are Bayesian
- Authors: Francesco D'Angelo, Vincent Fortuin
- Abstract summary: We introduce a kernelized repulsive term in the update rule of the deep ensembles.
We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference.
- Score: 6.544954579068863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep ensembles have recently gained popularity in the deep learning community
for their conceptual simplicity and efficiency. However, maintaining functional
diversity between ensemble members that are independently trained with gradient
descent is challenging. This can lead to pathologies when adding more ensemble
members, such as a saturation of the ensemble performance, which converges to
the performance of a single model. Moreover, this affects not only the
quality of its predictions but, even more so, the uncertainty estimates of the
ensemble, and thus its performance on out-of-distribution data. We hypothesize
that this limitation can be overcome by discouraging different ensemble members
from collapsing to the same function. To this end, we introduce a kernelized
repulsive term in the update rule of the deep ensembles. We show that this
simple modification not only enforces and maintains diversity among the members
but, even more importantly, transforms the maximum a posteriori inference into
proper Bayesian inference. Namely, we show that the training dynamics of our
proposed repulsive ensembles follow a Wasserstein gradient flow of the KL
divergence with the true posterior. We study repulsive terms in weight and
function space and empirically compare their performance to standard ensembles
and Bayesian baselines on synthetic and real-world prediction tasks.
Related papers
- Bayesian vs. PAC-Bayesian Deep Neural Network Ensembles [7.883369697332076]
We argue that neither the sampling nor the weighting in a Bayes ensemble are particularly well-suited for increasing generalization performance.
We show that state-of-the-art Bayes ensembles from the literature, despite being computationally demanding, do not improve over simple uniformly weighted deep ensembles.
Date: 2024-06-08T13:19:18Z
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
Date: 2024-01-17T09:01:29Z
- Deep Anti-Regularized Ensembles provide reliable out-of-distribution uncertainty quantification [4.750521042508541]
Deep ensembles often return overconfident estimates outside the training domain.
We show that an ensemble of networks with large weights fitting the training data is likely to meet these two objectives.
We derive a theoretical framework for this approach and show that the proposed optimization can be seen as a "water-filling" problem.
Date: 2023-04-08T15:25:12Z
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
Directly minimizing the loss of the ensemble, however, appears to be rarely applied in practice.
Date: 2023-01-26T18:58:07Z
- Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set [50.67431815647126]
Post-hoc global/local feature attribution methods are increasingly employed to understand machine learning models.
We show that partial orders of local/global feature importance arise from this methodology.
We show that every relation among features present in these partial orders also holds in the rankings provided by existing approaches.
Date: 2021-10-26T02:53:14Z
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
The Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
Date: 2021-10-24T07:58:13Z
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
Date: 2021-04-12T18:50:11Z
- Hydra: Preserving Ensemble Diversity for Model Distillation [46.677567663908185]
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty.
Recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble.
We propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra.
Date: 2020-01-14T10:13:52Z
This list is automatically generated from the titles and abstracts of the papers on this site.