Diversity Matters When Learning From Ensembles
- URL: http://arxiv.org/abs/2110.14149v1
- Date: Wed, 27 Oct 2021 03:44:34 GMT
- Title: Diversity Matters When Learning From Ensembles
- Authors: Giung Nam, Jongmin Yoon, Yoonho Lee, Juho Lee
- Abstract summary: Deep ensembles excel in large-scale image classification tasks both in terms of prediction accuracy and calibration.
Despite being simple to train, the computation and memory cost of deep ensembles limits their practicability.
We propose a simple approach for reducing this gap, i.e., making the distilled performance close to the full ensemble.
- Score: 20.05842308307947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep ensembles excel in large-scale image classification tasks both in terms
of prediction accuracy and calibration. Despite being simple to train, the
computation and memory cost of deep ensembles limits their practicability.
While some recent works propose to distill an ensemble model into a single
model to reduce such costs, there is still a performance gap between the
ensemble and distilled models. We propose a simple approach for reducing this
gap, i.e., making the distilled performance close to the full ensemble. Our key
assumption is that a distilled model should absorb as much function diversity
inside the ensemble as possible. We first empirically show that the typical
distillation procedure does not effectively transfer such diversity, especially
for complex models that achieve near-zero training error. To fix this, we
propose a perturbation strategy for distillation that reveals diversity by
seeking inputs for which ensemble member outputs disagree. We empirically show
that a model distilled with such perturbed samples indeed exhibits enhanced
diversity, leading to improved performance.
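The abstract describes the method only at a high level, so the following is a minimal PyTorch sketch of what a disagreement-seeking perturbation combined with ensemble distillation could look like. Every concrete choice below is an assumption for illustration rather than the paper's actual recipe: the KL-to-ensemble-mean disagreement score, the signed gradient-ascent steps inside an eps-ball, and the temperature-scaled distillation loss on the perturbed inputs.

```python
import torch
import torch.nn.functional as F


def disagreement_score(logits_list):
    # Mean KL divergence of each member's prediction from the ensemble mean:
    # an assumed proxy for "the ensemble members disagree on this input".
    probs = [F.softmax(logits, dim=-1) for logits in logits_list]
    mean_p = torch.stack(probs).mean(dim=0)
    kls = [F.kl_div(mean_p.log(), p, reduction="batchmean") for p in probs]
    return torch.stack(kls).mean()


def perturb_for_disagreement(x, ensemble, steps=3, eps=0.03, step_size=0.01):
    # PGD-style gradient ascent on the disagreement score, constrained to an
    # eps-ball around the clean input (step count and radii are placeholders).
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        score = disagreement_score([member(x_adv) for member in ensemble])
        grad, = torch.autograd.grad(score, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
    return x_adv.detach()


def distill_step(student, ensemble, x, optimizer, temperature=4.0):
    # Ordinary knowledge distillation toward the ensemble mean, but evaluated
    # on inputs that have been perturbed to expose member disagreement.
    x_pert = perturb_for_disagreement(x, ensemble)
    with torch.no_grad():
        teacher = torch.stack(
            [F.softmax(m(x_pert) / temperature, dim=-1) for m in ensemble]
        ).mean(dim=0)
    loss = F.kl_div(
        F.log_softmax(student(x_pert) / temperature, dim=-1),
        teacher,
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice one would presumably mix clean and perturbed batches and keep the ensemble members in eval mode; those details are not specified by the abstract.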
Related papers
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
To enable an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss that operates directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts [52.39959535724677]
We introduce an alternative solution to improve the generalization of image restoration models.
We propose AdaptIR, a Mixture-of-Experts (MoE) with a multi-branch design that captures local, global, and channel representation bases.
AdaptIR achieves stable performance on single-degradation tasks and excels on hybrid-degradation tasks, while fine-tuning only 0.6% of the parameters for 8 hours.
arXiv Detail & Related papers (2023-12-12T14:27:59Z)
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
arXiv Detail & Related papers (2023-02-19T17:37:24Z)
- Functional Ensemble Distillation [18.34081591772928]
We investigate how to best distill an ensemble's predictions using an efficient model.
We find that learning the distilled model with a simple mixup augmentation scheme significantly boosts performance.
arXiv Detail & Related papers (2022-06-05T14:07:17Z)
- Structured Pruning Learns Compact and Accurate Models [28.54826400747667]
We propose CoFi (Coarse- and Fine-grained Pruning), a task-specific structured pruning method.
CoFi delivers highly parallelizable subnetworks and matches distillation methods in both accuracy and latency.
Our experiments on GLUE and SQuAD datasets show that CoFi yields models with over 10x speedups with a small accuracy drop.
arXiv Detail & Related papers (2022-04-01T13:09:56Z)
- A Closer Look at Codistillation for Distributed Training [21.08740153686464]
We investigate codistillation in a distributed training setup.
We find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods.
arXiv Detail & Related papers (2020-10-06T16:01:34Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- A general framework for ensemble distribution distillation [14.996944635904402]
Ensembles of neural networks have been shown to give better performance than single networks in terms of predictions and uncertainty estimation.
We present a framework for distilling both regression and classification ensembles in a way that preserves the ensemble's decomposition of predictive uncertainty.
arXiv Detail & Related papers (2020-02-26T14:34:43Z)
- Hydra: Preserving Ensemble Diversity for Model Distillation [46.677567663908185]
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty.
Recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble.
We propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra (a minimal multi-headed student sketch, under assumed details, follows this list).
arXiv Detail & Related papers (2020-01-14T10:13:52Z)
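As referenced in the Hydra entry above, the following is a minimal sketch of what a shared-body, multi-headed student could look like, with each head distilled toward one ensemble member. The class name, the linear heads, and the per-head temperature-scaled KL loss are illustrative assumptions rather than the paper's actual implementation.

```python
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadStudent(nn.Module):
    # Shared trunk with one lightweight classification head per ensemble member.
    def __init__(self, body, feat_dim, num_classes, num_heads):
        super().__init__()
        self.body = body  # any backbone mapping inputs to feat_dim features
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_heads)]
        )

    def forward(self, x):
        z = self.body(x)                         # shared representation
        return [head(z) for head in self.heads]  # one logit vector per head


def per_head_distill_loss(head_logits, member_logits, temperature=2.0):
    # Each head mimics "its" ensemble member via temperature-scaled KL.
    losses = [
        F.kl_div(
            F.log_softmax(s / temperature, dim=-1),
            F.softmax(t / temperature, dim=-1),
            reduction="batchmean",
        )
        for s, t in zip(head_logits, member_logits)
    ]
    return sum(losses) / len(losses)
```

At test time the head outputs can be averaged to approximate the ensemble prediction, so most of the computation is shared while some member-level diversity is retained.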