Hydra: Preserving Ensemble Diversity for Model Distillation
- URL: http://arxiv.org/abs/2001.04694v2
- Date: Fri, 19 Mar 2021 11:25:46 GMT
- Title: Hydra: Preserving Ensemble Diversity for Model Distillation
- Authors: Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua
V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin,
Rodolphe Jenatton
- Abstract summary: Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty.
Recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble.
We propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra.
- Score: 46.677567663908185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembles of models have been empirically shown to improve predictive
performance and to yield robust measures of uncertainty. However, they are
expensive in computation and memory. Therefore, recent research has focused on
distilling ensembles into a single compact model, reducing the computational
and memory burden of the ensemble while trying to preserve its predictive
behavior. Most existing distillation formulations summarize the ensemble by
capturing its average predictions. As a result, the diversity of the ensemble
predictions, stemming from each member, is lost. Thus, the distilled model
cannot provide a measure of uncertainty comparable to that of the original
ensemble. To retain more faithfully the diversity of the ensemble, we propose a
distillation method based on a single multi-headed neural network, which we
refer to as Hydra. The shared body network learns a joint feature
representation that enables each head to capture the predictive behavior of
each ensemble member. We demonstrate that with a slight increase in parameter
count, Hydra improves distillation performance on classification and regression
settings while capturing the uncertainty behavior of the original ensemble over
both in-domain and out-of-distribution tasks.
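Below is a minimal sketch of the multi-headed architecture the abstract describes: a shared body producing a joint feature representation and one lightweight head per ensemble member, trained so that head i mimics member i's predictions. The MLP body, the layer sizes, and the KL-divergence distillation loss are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Hydra(nn.Module):
    """Shared body with one head per ensemble member (illustrative MLP variant)."""

    def __init__(self, in_dim: int, num_classes: int, num_heads: int, hidden: int = 128):
        super().__init__()
        # Shared body: learns the joint feature representation used by all heads.
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One lightweight head per ensemble member.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, num_classes) for _ in range(num_heads)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.body(x)
        # Logits of shape (num_heads, batch, num_classes).
        return torch.stack([head(z) for head in self.heads])


def distillation_loss(head_logits: torch.Tensor, member_probs: torch.Tensor) -> torch.Tensor:
    """KL(member i || head i), averaged over heads and batch.

    head_logits, member_probs: (num_heads, batch, num_classes); member_probs are
    the stored softmax outputs of the corresponding ensemble members on the batch.
    """
    num_heads, batch, num_classes = head_logits.shape
    log_q = F.log_softmax(head_logits, dim=-1).reshape(num_heads * batch, num_classes)
    p = member_probs.reshape(num_heads * batch, num_classes)
    return F.kl_div(log_q, p, reduction="batchmean")
```

Under this reading of the abstract, averaging the heads' softmax outputs at test time gives an ensemble-like predictive distribution, while the spread across heads is what retains the members' diversity.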
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Functional Ensemble Distillation [18.34081591772928]
We investigate how to best distill an ensemble's predictions using an efficient model.
We find that learning the distilled model with a simple mixup augmentation scheme significantly boosts performance (a minimal training-step sketch appears after this list).
arXiv Detail & Related papers (2022-06-05T14:07:17Z)
- Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
- Diversity Matters When Learning From Ensembles [20.05842308307947]
Deep ensembles excel in large-scale image classification tasks both in terms of prediction accuracy and calibration.
Despite being simple to train, the computation and memory cost of deep ensembles limits their practicability.
We propose a simple approach for closing the performance gap between the distilled model and the full ensemble.
arXiv Detail & Related papers (2021-10-27T03:44:34Z)
- Repulsive Deep Ensembles are Bayesian [6.544954579068863]
We introduce a kernelized repulsive term in the update rule of deep ensembles.
We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference.
arXiv Detail & Related papers (2021-06-22T09:50:28Z)
- DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation [109.11580756757611]
Deep ensembles perform better than a single network thanks to the diversity among their members.
Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances.
We introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features.
arXiv Detail & Related papers (2021-01-14T10:53:26Z)
- A Closer Look at Codistillation for Distributed Training [21.08740153686464]
We investigate codistillation in a distributed training setup.
We find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods.
arXiv Detail & Related papers (2020-10-06T16:01:34Z)
- A general framework for ensemble distribution distillation [14.996944635904402]
Ensembles of neural networks have been shown to give better performance than single networks in terms of predictions and uncertainty estimation.
We present a framework for distilling both regression and classification ensembles in a way that preserves the decomposition of the predictive uncertainty.
arXiv Detail & Related papers (2020-02-26T14:34:43Z)
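As a concrete illustration of the mixup-based distillation mentioned in the Functional Ensemble Distillation entry above, the sketch below trains a student on mixup-augmented inputs against the ensemble's soft predictions. The Beta(0.2, 0.2) mixing coefficient, the KL objective, and the `student` / `ensemble_predict` callables are assumptions made for the sketch, not details taken from that paper.

```python
import torch
import torch.nn.functional as F


def mixup_distillation_step(student, ensemble_predict, x, optimizer, alpha: float = 0.2):
    """One distillation step on a mixup-augmented batch (illustrative sketch).

    student: nn.Module mapping inputs to logits.
    ensemble_predict: callable returning the ensemble's averaged softmax outputs.
    """
    # Mix each example with a randomly chosen partner from the same batch.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]

    # Query the (frozen) ensemble for soft targets on the mixed inputs.
    with torch.no_grad():
        targets = ensemble_predict(x_mix)          # (batch, num_classes)

    # Match the student's predictive distribution to the ensemble's.
    log_q = F.log_softmax(student(x_mix), dim=-1)
    loss = F.kl_div(log_q, targets, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```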