Self-Knowledge Distillation via Dropout
- URL: http://arxiv.org/abs/2208.05642v1
- Date: Thu, 11 Aug 2022 05:08:55 GMT
- Title: Self-Knowledge Distillation via Dropout
- Authors: Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang
- Abstract summary: We propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout).
Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations.
- Score: 0.7883397954991659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To boost performance, deep neural networks require deeper or wider
network structures that incur massive computational and memory costs. To
alleviate this issue, the self-knowledge distillation method regularizes the
model by distilling the internal knowledge of the model itself. Conventional
self-knowledge distillation methods require additional trainable parameters or
are dependent on the data. In this paper, we propose a simple and effective
self-knowledge distillation method using dropout (SD-Dropout). SD-Dropout
distills the posterior distributions of multiple models through dropout
sampling. Our method does not require any additional trainable modules, does
not rely on data, and requires only simple operations. Furthermore, this simple
method can be easily combined with various self-knowledge distillation
approaches. We provide a theoretical and experimental analysis of the effect of
forward and reverse KL-divergences in our work. Extensive experiments on
various vision tasks, i.e., image classification, object detection, and
distribution shift, demonstrate that the proposed method can effectively
improve the generalization of a single network. Further experiments show that
the proposed method also improves calibration performance, adversarial
robustness, and out-of-distribution detection ability.
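The abstract describes distilling between posterior distributions obtained by dropout sampling and analyzing forward and reverse KL divergences. Below is a minimal PyTorch-style sketch of that idea; the dropout rate, temperature, and loss weighting are illustrative assumptions, not the authors' exact configuration.
```python
import torch
import torch.nn.functional as F

def sd_dropout_loss(features: torch.Tensor, classifier: torch.nn.Module,
                    p: float = 0.5, temperature: float = 4.0) -> torch.Tensor:
    """Sketch of self-distillation via dropout: two independent dropout
    samplings of the same features act as two sub-networks, and their
    softened posteriors are pulled together with forward and reverse KL."""
    # Two independent dropout masks over the shared backbone features.
    logits_a = classifier(F.dropout(features, p=p, training=True))
    logits_b = classifier(F.dropout(features, p=p, training=True))

    # Temperature-softened log-posteriors of each dropout-sampled view.
    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    log_p_b = F.log_softmax(logits_b / temperature, dim=1)

    # kl_div(input, target) computes KL(target || input), so these two calls
    # give the forward and reverse directions discussed in the abstract.
    kl_ab = F.kl_div(log_p_a, log_p_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")

    # Scale by T^2, as is conventional for temperature-softened distillation.
    return (temperature ** 2) * (kl_ab + kl_ba)
```
In practice this term would be added to the usual cross-entropy loss on the undropped predictions; because it reuses the existing backbone and classifier, it introduces no additional trainable parameters, consistent with the claim in the abstract.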
Related papers
- Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution [81.81748032199813]
We propose a Distillation-Free One-Step Diffusion model.
Specifically, we propose a noise-aware discriminator (NAD) to participate in adversarial training.
We improve the perceptual loss with edge-aware DISTS (EA-DISTS) to enhance the model's ability to generate fine details.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
- Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances synthetic samples and uses a priority sampling function to select suitable ones.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
arXiv Detail & Related papers (2024-06-12T05:09:41Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Explicit and Implicit Knowledge Distillation via Unlabeled Data [5.702176304876537]
We propose an efficient unlabeled sample selection method to replace high computational generators.
We also propose a class-dropping mechanism to suppress the label noise caused by the data domain shifts.
Experimental results show that our method can quickly converge and obtain higher accuracy than other state-of-the-art methods.
arXiv Detail & Related papers (2023-02-17T09:10:41Z)
- Conditional Generative Data-Free Knowledge Distillation based on Attention Transfer [0.8594140167290099]
We propose a conditional generative data-free knowledge distillation (CGDD) framework to train an efficient portable network without any real data.
In this framework, besides the knowledge extracted from the teacher model, we introduce preset labels as additional auxiliary information.
We show that a portable network trained with the proposed data-free distillation method obtains 99.63%, 99.07%, and 99.84% relative accuracy on CIFAR10, CIFAR100, and Caltech101, respectively.
arXiv Detail & Related papers (2021-12-31T09:23:40Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (see the sketch after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Self Regulated Learning Mechanism for Data Efficient Knowledge Distillation [8.09591217280048]
A novel data-efficient approach to transfer the knowledge from a teacher model to a student model is presented.
The teacher model uses self-regulation to select appropriate samples for training and identifies their significance in the process.
During distillation, the significance information can be used along with the soft targets to supervise the student.
arXiv Detail & Related papers (2021-02-14T10:43:13Z)
- Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer [15.499267533387039]
The proposed method is applied to both lightweight image classification and encoder-decoder architectures to boost the performance of small and compact models without incurring extra computational overhead at inference.
The obtained results show that the proposed model achieves significant improvements over earlier self-distillation methods.
arXiv Detail & Related papers (2020-10-09T11:57:45Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A).
In this way, the student (S) is trained to mimic the feature maps of the teacher (T), and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
arXiv Detail & Related papers (2020-02-21T07:49:26Z)
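As a rough illustration of the k-NN non-parametric density estimation mentioned in the OSAKD entry above, the snippet below estimates a density for each sample from the distance to its k-th nearest neighbour in a feature space; the value of k, the feature dimensionality, and the use of scikit-learn/SciPy are assumptions made for the example, not details taken from that paper.
```python
import numpy as np
from scipy.special import gamma
from sklearn.neighbors import NearestNeighbors

def knn_density(features: np.ndarray, k: int = 5) -> np.ndarray:
    """k-NN density estimate: density ~ k / (n * volume of the ball whose
    radius is the distance to the k-th nearest neighbour)."""
    n, d = features.shape
    # k + 1 neighbours because each point is its own neighbour at distance 0.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    distances, _ = nn.kneighbors(features)
    r_k = distances[:, -1]  # distance to the k-th genuine neighbour

    # Volume of a d-dimensional ball of radius r_k.
    unit_ball = np.pi ** (d / 2) / gamma(d / 2 + 1)
    volume = unit_ball * r_k ** d
    return k / (n * np.maximum(volume, 1e-12))

# Example: per-sample densities for 128 random 16-dimensional output features.
densities = knn_density(np.random.randn(128, 16), k=5)
```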
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.