Dual Discriminator Adversarial Distillation for Data-free Model
Compression
- URL: http://arxiv.org/abs/2104.05382v1
- Date: Mon, 12 Apr 2021 12:01:45 GMT
- Title: Dual Discriminator Adversarial Distillation for Data-free Model
Compression
- Authors: Haoran Zhao, Xin Sun, Junyu Dong, Hui Yu and Huiyu Zhou
- Abstract summary: We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples through dual discriminator adversarial distillation, which mimics the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
- Score: 36.49964835173507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation has been widely used to produce portable and efficient
neural networks which can be readily deployed on edge devices for computer vision
tasks. However, almost all top-performing knowledge distillation methods need
to access the original training data, which usually has a huge size and is
often unavailable. To tackle this problem, we propose a novel data-free
approach in this paper, named Dual Discriminator Adversarial Distillation
(DDAD) to distill a neural network without any training data or meta-data. To
be specific, we use a generator to create samples through dual discriminator
adversarial distillation, which mimics the original training data. The
generator not only exploits the pre-trained teacher's intrinsic statistics stored in
its batch normalization layers, but is also trained to maximize the prediction
discrepancy between the teacher and the student. Then the generated samples are used to train the
compact student network under the supervision of the teacher. The proposed
method obtains an efficient student network which closely approximates its
teacher network, despite using no original training data. Extensive experiments
are conducted to demonstrate the effectiveness of the proposed approach on
CIFAR-10, CIFAR-100 and Caltech101 datasets for classification tasks. Moreover,
we extend our method to semantic segmentation tasks on several public datasets
such as CamVid and NYUv2. All experiments show that our method outperforms all
baselines for data-free knowledge distillation.
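As a concrete reading of the abstract above, a minimal PyTorch-style sketch of such a data-free adversarial distillation loop is given below: the generator is pushed toward the teacher's stored batch-normalization statistics while maximizing the teacher-student disagreement, and the student then imitates the teacher on the generated batch. This is an illustration of the general scheme under stated assumptions, not the authors' released code; the L1 discrepancy measure, the equal weighting of the two generator terms, and the helper names (`collect_bn_inputs`, `bn_statistics_loss`, `train_step`) are assumptions.

```python
import torch
import torch.nn.functional as F


def collect_bn_inputs(teacher):
    """Hook every BatchNorm2d layer of the teacher and record its input."""
    feats, handles = [], []
    for m in teacher.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            handles.append(m.register_forward_hook(
                lambda mod, inp, out: feats.append(inp[0])))
    return feats, handles


def bn_statistics_loss(teacher, feats):
    """Match the batch statistics of generated images to the teacher's stored
    BatchNorm running statistics (the 'intrinsic statistics' in the abstract)."""
    loss = 0.0
    bns = [m for m in teacher.modules() if isinstance(m, torch.nn.BatchNorm2d)]
    for bn, x in zip(bns, feats):
        mu = x.mean(dim=[0, 2, 3])
        var = x.var(dim=[0, 2, 3], unbiased=False)
        loss = loss + F.mse_loss(mu, bn.running_mean) + F.mse_loss(var, bn.running_var)
    return loss


def train_step(generator, teacher, student, g_opt, s_opt,
               batch_size=128, z_dim=100, device="cuda"):
    teacher.eval()  # the teacher is frozen throughout

    # Generator step: stay close to the teacher's BN statistics while
    # maximizing the teacher-student discrepancy on the generated batch.
    z = torch.randn(batch_size, z_dim, device=device)
    feats, handles = collect_bn_inputs(teacher)
    fake = generator(z)
    discrepancy = F.l1_loss(student(fake), teacher(fake))  # one possible measure
    g_loss = bn_statistics_loss(teacher, feats) - discrepancy
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    for h in handles:
        h.remove()

    # Student step: imitate the teacher on freshly generated samples.
    z = torch.randn(batch_size, z_dim, device=device)
    with torch.no_grad():
        fake = generator(z)
        t_logits = teacher(fake)
    s_loss = F.l1_loss(student(fake), t_logits)  # or a KL on softened logits
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()
    return g_loss.item(), s_loss.item()
```

In practice the relative weighting of the batch-norm term and the discrepancy term, as well as the number of generator versus student updates per iteration, are hyper-parameters that this sketch leaves at their simplest setting.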
Related papers
- Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation [24.868697898254368]
Deep models may pose a privacy leakage risk in practical deployment.
We propose a discriminative-generative distillation approach to learn privacy-preserving deep models.
Our approach can control query cost over private data and accuracy degradation in a unified manner.
arXiv Detail & Related papers (2024-09-04T03:06:13Z) - Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances the synthetic samples and applies a priority sampling function to select suitable samples.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
arXiv Detail & Related papers (2024-06-12T05:09:41Z) - Distribution Shift Matters for Knowledge Distillation with Webly
Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - Learning to Generate Synthetic Training Data using Gradient Matching and
Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z) - Conditional Generative Data-Free Knowledge Distillation based on
Attention Transfer [0.8594140167290099]
We propose a conditional generative data-free knowledge distillation (CGDD) framework to train an efficient portable network without any real data.
In this framework, in addition to the knowledge extracted from the teacher model, we introduce preset labels as auxiliary information.
We show that the portable network trained with the proposed data-free distillation method obtains 99.63%, 99.07% and 99.84% relative accuracy on CIFAR10, CIFAR100 and Caltech101, respectively.
arXiv Detail & Related papers (2021-12-31T09:23:40Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a generic sketch of this estimator appears after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Large-Scale Generative Data-Free Distillation [17.510996270055184]
We propose a new method to train a generative image model by leveraging the intrinsic statistics stored in the trained network's normalization layers.
The proposed method pushes the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02%, respectively.
We are able to scale it to the ImageNet dataset, which, to the best of our knowledge, has never been done using generative models in a data-free setting.
arXiv Detail & Related papers (2020-12-10T10:54:38Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on this enlarged dataset tractable, we further apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
The proposed method, without any training data, achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)
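The OSAKD entry above mentions k-NN non-parametric density estimation over output features. As a generic, hedged illustration of that estimator (not that paper's implementation; the function name `knn_density`, the choice of `k`, and the use of raw Euclidean distances are assumptions):

```python
from math import gamma

import numpy as np


def knn_density(features, k=10):
    """k-NN density estimate p(x) ~ k / (n * V_d * r_k(x)^d), where r_k(x) is
    the distance from x to its k-th nearest neighbour and V_d is the volume of
    the d-dimensional unit ball. `features` is an (n, d) array of samples."""
    n, d = features.shape
    # Pairwise Euclidean distances (n x n); fine for modest n.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbour, excluding the sample itself
    # (the self-distance of 0 occupies column 0 after sorting).
    r_k = np.sort(dists, axis=1)[:, k]
    unit_ball_volume = np.pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * unit_ball_volume * np.maximum(r_k, 1e-12) ** d)
```

Because the unit-ball volume term grows or shrinks very quickly with the dimension d, this direct form is only numerically stable for low-dimensional feature spaces; higher-dimensional features are usually handled in log space or after dimensionality reduction.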
This list is automatically generated from the titles and abstracts of the papers on this site.