A general framework for ensemble distribution distillation
- URL: http://arxiv.org/abs/2002.11531v2
- Date: Fri, 8 Jan 2021 11:20:35 GMT
- Title: A general framework for ensemble distribution distillation
- Authors: Jakob Lindqvist, Amanda Olmin, Fredrik Lindsten, Lennart Svensson
- Abstract summary: Ensembles of neural networks have been shown to give better performance than single networks in terms of predictions and uncertainty estimation.
We present a framework for distilling both regression and classification ensembles in a way that preserves the decomposition.
- Score: 14.996944635904402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembles of neural networks have been shown to give better performance than
single networks, both in terms of predictions and uncertainty estimation.
Additionally, ensembles allow the uncertainty to be decomposed into aleatoric
(data) and epistemic (model) components, giving a more complete picture of the
predictive uncertainty. Ensemble distillation is the process of compressing an
ensemble into a single model, often resulting in a leaner model that still
outperforms the individual ensemble members. Unfortunately, standard
distillation erases the natural uncertainty decomposition of the ensemble. We
present a general framework for distilling both regression and classification
ensembles in a way that preserves the decomposition. We demonstrate the desired
behaviour of our framework and show that its predictive performance is on par
with standard distillation.
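The abstract does not spell out the decomposition here, but for classification ensembles the aleatoric/epistemic split is commonly the mutual-information decomposition of the predictive entropy. The NumPy sketch below illustrates that standard decomposition only; it is not the paper's distillation procedure, and the function names are ours.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (nats) of categorical distributions along `axis`."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(member_probs):
    """Split ensemble predictive uncertainty into aleatoric and epistemic parts.

    member_probs: array of shape (M, K), one categorical prediction per
    ensemble member. The entropy of the averaged prediction decomposes into
    the expected member entropy (aleatoric/data uncertainty) plus the mutual
    information between prediction and model index (epistemic/model uncertainty).
    """
    mean_probs = member_probs.mean(axis=0)      # ensemble prediction
    total = entropy(mean_probs)                 # total predictive uncertainty
    aleatoric = entropy(member_probs).mean()    # expected data uncertainty
    epistemic = total - aleatoric               # model disagreement (>= 0)
    return total, aleatoric, epistemic

# Members that agree -> low epistemic; members that disagree -> high epistemic.
agree = np.array([[0.7, 0.3], [0.72, 0.28], [0.68, 0.32]])
disagree = np.array([[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]])
print(decompose_uncertainty(agree))
print(decompose_uncertainty(disagree))
```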
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
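As a rough illustration of the regularizer described above (randomly dropping base-model predictions before they are aggregated), here is a minimal NumPy sketch; the averaging aggregator, shapes, and drop rate are illustrative assumptions, not the authors' neural ensembler.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_member_predictions(member_probs, drop_rate=0.3):
    """Randomly mask out whole base-model predictions before aggregation.

    member_probs: (M, K) categorical predictions from M base models.
    Members are dropped independently with probability `drop_rate`;
    at least one member is always kept. The surviving predictions are
    averaged, mimicking a regularizer that keeps the aggregator from
    relying on any single, low-diversity subset of base models.
    """
    M = member_probs.shape[0]
    keep = rng.random(M) > drop_rate
    if not keep.any():                 # never drop every member
        keep[rng.integers(M)] = True
    return member_probs[keep].mean(axis=0)

member_probs = np.array([[0.8, 0.2], [0.6, 0.4], [0.55, 0.45], [0.9, 0.1]])
print(drop_member_predictions(member_probs))
```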
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
- Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling [21.098866735156207]
We propose an ensemble of Normalizing Flows (NFs), which are state-of-the-art in modeling aleatoric uncertainty.
The ensembles are created via sets of fixed dropout masks, making them less expensive than creating separate NF models.
We demonstrate how to leverage the unique structure of NFs, their base distributions, to estimate aleatoric uncertainty without relying on samples.
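The fixed-dropout-mask construction can be sketched on a toy feed-forward network as below; this illustrates only how a set of masks drawn once turns a single trained model into several cheap pseudo-members, and it does not involve an actual normalizing flow or the paper's uncertainty estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Weights of a single trained network (random here, for illustration only).
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

# Fixed dropout masks drawn once: each mask defines one ensemble "member"
# that shares all weights with the base network.
n_members, keep_prob = 5, 0.8
masks = (rng.random((n_members, 8)) < keep_prob).astype(float)

def member_forward(x, mask):
    """Forward pass of one pseudo-member: the base net with a fixed hidden mask."""
    h = np.maximum(W1 @ x + b1, 0.0) * mask / keep_prob
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=4)
preds = np.stack([member_forward(x, m) for m in masks])
print("member predictions:\n", preds)
print("ensemble mean:", preds.mean(axis=0))
```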
arXiv Detail & Related papers (2023-02-02T18:38:35Z)
- RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness [94.69774317059122]
We show that the effectiveness of the well-celebrated Mixup can be further improved if, instead of using it as the sole learning objective, it is utilized as an additional regularizer on top of the standard cross-entropy loss.
This simple change not only provides much improved accuracy but also significantly improves the quality of the predictive uncertainty estimation of Mixup.
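A minimal PyTorch sketch of the idea (standard cross-entropy plus a Mixup term used as an additional regularizer) is given below; the Beta parameter, interpolation scheme, and equal loss weighting are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def regmixup_style_loss(model, x, y, alpha=10.0):
    """Cross-entropy on the clean batch plus a Mixup cross-entropy term.

    A generic sketch of "Mixup as an additional regularizer", not the
    authors' exact recipe. x: (B, D) inputs, y: (B,) integer labels.
    """
    # Standard cross-entropy on the unmixed batch.
    clean_loss = F.cross_entropy(model(x), y)

    # Mixup: interpolate the batch with a shuffled copy of itself.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    logits_mix = model(x_mix)
    mix_loss = lam * F.cross_entropy(logits_mix, y) \
        + (1.0 - lam) * F.cross_entropy(logits_mix, y[perm])

    return clean_loss + mix_loss  # regularizer added to the standard objective

# Toy usage with a linear model on random data.
model = torch.nn.Linear(16, 3)
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
print(regmixup_style_loss(model, x, y))
```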
arXiv Detail & Related papers (2022-06-29T09:44:33Z)
- Deep interpretable ensembles [0.0]
In deep ensembling, the individual models are usually black-box neural networks or, more recently, partially interpretable semi-structured deep transformation models.
We propose a novel transformation ensemble which aggregates probabilistic predictions with the guarantee of preserving interpretability and yielding, on average, uniformly better predictions than the individual ensemble members.
arXiv Detail & Related papers (2022-05-25T12:39:39Z)
- Diversity Matters When Learning From Ensembles [20.05842308307947]
Deep ensembles excel in large-scale image classification tasks both in terms of prediction accuracy and calibration.
Despite being simple to train, deep ensembles have computation and memory costs that limit their practicability.
We propose a simple approach for reducing the gap between a distilled model and the full ensemble, i.e., making the distilled performance close to that of the full ensemble.
arXiv Detail & Related papers (2021-10-27T03:44:34Z)
- Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set [50.67431815647126]
Post-hoc global/local feature attribution methods are being progressively employed to understand machine learning models.
We show that partial orders of local/global feature importance arise from this methodology.
We show that every relation among features present in these partial orders also holds in the rankings provided by existing approaches.
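One way to picture such a consensus partial order: keep a precedence relation between two features only when every attribution in the model set agrees on their order. The sketch below is a generic illustration under that assumption, not the paper's Rashomon-set construction.

```python
from itertools import permutations

import numpy as np

def consensus_partial_order(importances):
    """Build a consensus partial order over features from several attributions.

    importances: (n_models, n_features) array of feature-importance scores,
    e.g. one row per model in a set of similarly performing models. Feature i
    is placed above feature j only when every model agrees that i is more
    important than j; otherwise the pair is left incomparable.
    """
    n_features = importances.shape[1]
    relations = set()
    for i, j in permutations(range(n_features), 2):
        if np.all(importances[:, i] > importances[:, j]):
            relations.add((i, j))   # i ranked above j by every model
    return relations

scores = np.array([[0.5, 0.3, 0.1],
                   [0.6, 0.2, 0.3],
                   [0.4, 0.1, 0.2]])
print(consensus_partial_order(scores))   # {(0, 1), (0, 2)}; 1 and 2 incomparable
```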
arXiv Detail & Related papers (2021-10-26T02:53:14Z)
- Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast-Choose Three [7.169968368139168]
We study ensemble distillation as a framework for producing well-calibrated structured prediction models.
We validate this framework on two tasks: named-entity recognition and machine translation.
We find that, across both tasks, ensemble distillation produces models which retain much of, and occasionally improve upon, the performance and calibration benefits of ensembles.
arXiv Detail & Related papers (2020-10-13T22:30:06Z)
- Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations [84.81435917024983]
This work proposes an algorithm that consistently estimates any identifiable mixture model from grouped observations.
A practical implementation is provided for paired observations, and the approach is shown to outperform existing methods.
arXiv Detail & Related papers (2020-06-12T20:44:22Z)
- Hydra: Preserving Ensemble Diversity for Model Distillation [46.677567663908185]
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty.
Recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble.
We propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra.
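A shared-body, multi-headed student in the spirit of Hydra can be sketched in PyTorch as follows; the layer sizes, the per-head KL matching loss, and the stand-in teacher probabilities are assumptions for illustration, not the authors' architecture or training setup.

```python
import torch
import torch.nn as nn

class MultiHeadStudent(nn.Module):
    """Shared-body, multi-headed student: one lightweight head per ensemble member.

    Each head can be trained to mimic its corresponding ensemble member, so the
    distilled model retains per-member information and hence ensemble diversity.
    """

    def __init__(self, in_dim, n_classes, n_members, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_classes) for _ in range(n_members)]
        )

    def forward(self, x):
        z = self.body(x)
        # (n_members, batch, n_classes): one set of logits per head.
        return torch.stack([head(z) for head in self.heads])

student = MultiHeadStudent(in_dim=16, n_classes=3, n_members=4)
x = torch.randn(8, 16)
teacher_probs = torch.softmax(torch.randn(4, 8, 3), dim=-1)  # stand-in ensemble

# Each head is matched to "its" ensemble member via KL(teacher || student head).
logp = torch.log_softmax(student(x), dim=-1)
loss = torch.nn.functional.kl_div(logp, teacher_probs, reduction="batchmean")
print(loss)
```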
arXiv Detail & Related papers (2020-01-14T10:13:52Z)