Dual Discriminator Adversarial Distillation for Data-free Model
Compression
- URL: http://arxiv.org/abs/2104.05382v1
- Date: Mon, 12 Apr 2021 12:01:45 GMT
- Title: Dual Discriminator Adversarial Distillation for Data-free Model
Compression
- Authors: Haoran Zhao, Xin Sun, Junyu Dong, Hui Yu and Huiyu Zhou
- Abstract summary: We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples through dual discriminator adversarial distillation, which mimics the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
- Score: 36.49964835173507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation has been widely used to produce portable and efficient
neural networks which can be readily deployed on edge devices for computer vision
tasks. However, almost all top-performing knowledge distillation methods need
to access the original training data, which usually has a huge size and is
often unavailable. To tackle this problem, we propose a novel data-free
approach in this paper, named Dual Discriminator Adversarial Distillation
(DDAD) to distill a neural network without any training data or meta-data. To
be specific, we use a generator to create samples through dual discriminator
adversarial distillation, which mimics the original training data. The
generator not only exploits the pre-trained teacher's intrinsic statistics stored in
its batch normalization layers, but is also trained to maximize the prediction
discrepancy between the teacher and the student. Then the generated samples are used to train the
compact student network under the supervision of the teacher. The proposed
method obtains an efficient student network which closely approximates its
teacher network, despite using no original training data. Extensive experiments
are conducted to demonstrate the effectiveness of the proposed approach on
CIFAR-10, CIFAR-100 and Caltech101 datasets for classification tasks. Moreover,
we extend our method to semantic segmentation tasks on several public datasets
such as CamVid and NYUv2. All experiments show that our method outperforms all
baselines for data-free knowledge distillation.
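As a concrete reading of the abstract above, a minimal PyTorch-style sketch of such a data-free adversarial distillation loop is given below: the generator is pushed toward the teacher's stored batch-normalization statistics while maximizing the teacher-student disagreement, and the student then imitates the teacher on the generated batch. This is an illustration of the general scheme under stated assumptions, not the authors' released code; the L1 discrepancy measure, the equal weighting of the two generator terms, and the helper names (`collect_bn_inputs`, `bn_statistics_loss`, `train_step`) are assumptions.

```python
import torch
import torch.nn.functional as F


def collect_bn_inputs(teacher):
    """Hook every BatchNorm2d layer of the teacher and record its input."""
    feats, handles = [], []
    for m in teacher.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            handles.append(m.register_forward_hook(
                lambda mod, inp, out: feats.append(inp[0])))
    return feats, handles


def bn_statistics_loss(teacher, feats):
    """Match the batch statistics of generated images to the teacher's stored
    BatchNorm running statistics (the 'intrinsic statistics' in the abstract)."""
    loss = 0.0
    bns = [m for m in teacher.modules() if isinstance(m, torch.nn.BatchNorm2d)]
    for bn, x in zip(bns, feats):
        mu = x.mean(dim=[0, 2, 3])
        var = x.var(dim=[0, 2, 3], unbiased=False)
        loss = loss + F.mse_loss(mu, bn.running_mean) + F.mse_loss(var, bn.running_var)
    return loss


def train_step(generator, teacher, student, g_opt, s_opt,
               batch_size=128, z_dim=100, device="cuda"):
    teacher.eval()  # the teacher is frozen throughout

    # Generator step: stay close to the teacher's BN statistics while
    # maximizing the teacher-student discrepancy on the generated batch.
    z = torch.randn(batch_size, z_dim, device=device)
    feats, handles = collect_bn_inputs(teacher)
    fake = generator(z)
    discrepancy = F.l1_loss(student(fake), teacher(fake))  # one possible measure
    g_loss = bn_statistics_loss(teacher, feats) - discrepancy
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    for h in handles:
        h.remove()

    # Student step: imitate the teacher on freshly generated samples.
    z = torch.randn(batch_size, z_dim, device=device)
    with torch.no_grad():
        fake = generator(z)
        t_logits = teacher(fake)
    s_loss = F.l1_loss(student(fake), t_logits)  # or a KL on softened logits
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()
    return g_loss.item(), s_loss.item()
```

In practice the relative weighting of the batch-norm term and the discrepancy term, as well as the number of generator versus student updates per iteration, are hyper-parameters that this sketch leaves at their simplest setting.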
Related papers
- Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation [24.868697898254368]
Deep models may pose a privacy leakage risk in practical deployment.
We propose a discriminative-generative distillation approach to learn privacy-preserving deep models.
Our approach can control query cost over private data and accuracy degradation in a unified manner.
arXiv Detail & Related papers (2024-09-04T03:06:13Z) - Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances the synthetic samples and applies a priority sampling function to select suitable samples.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
arXiv Detail & Related papers (2024-06-12T05:09:41Z) - Distribution Shift Matters for Knowledge Distillation with Webly
Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - Learning to Generate Synthetic Training Data using Gradient Matching and
Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z) - Conditional Generative Data-Free Knowledge Distillation based on
Attention Transfer [0.8594140167290099]
We propose a conditional generative data-free knowledge distillation (CGDD) framework to train an efficient portable network without any real data.
In this framework, in addition to the knowledge extracted from the teacher model, we introduce preset labels as auxiliary information.
We show that the portable network trained with the proposed data-free distillation method obtains 99.63%, 99.07% and 99.84% relative accuracy on CIFAR10, CIFAR100 and Caltech101, respectively.
arXiv Detail & Related papers (2021-12-31T09:23:40Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a generic sketch of this estimator appears after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Large-Scale Generative Data-Free Distillation [17.510996270055184]
We propose a new method to train a generative image model by leveraging the intrinsic statistics stored in the trained network's normalization layers.
The proposed method pushes the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02%, respectively.
We are able to scale it to the ImageNet dataset, which, to the best of our knowledge, has never been done using generative models in a data-free setting.
arXiv Detail & Related papers (2020-12-10T10:54:38Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on this enlarged dataset tractable, we further apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
The proposed method, without any training data, achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)
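The OSAKD entry above mentions k-NN non-parametric density estimation over output features. As a generic, hedged illustration of that estimator (not that paper's implementation; the function name `knn_density`, the choice of `k`, and the use of raw Euclidean distances are assumptions):

```python
from math import gamma

import numpy as np


def knn_density(features, k=10):
    """k-NN density estimate p(x) ~ k / (n * V_d * r_k(x)^d), where r_k(x) is
    the distance from x to its k-th nearest neighbour and V_d is the volume of
    the d-dimensional unit ball. `features` is an (n, d) array of samples."""
    n, d = features.shape
    # Pairwise Euclidean distances (n x n); fine for modest n.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbour, excluding the sample itself
    # (the self-distance of 0 occupies column 0 after sorting).
    r_k = np.sort(dists, axis=1)[:, k]
    unit_ball_volume = np.pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * unit_ball_volume * np.maximum(r_k, 1e-12) ** d)
```

Because the unit-ball volume term grows or shrinks very quickly with the dimension d, this direct form is only numerically stable for low-dimensional feature spaces; higher-dimensional features are usually handled in log space or after dimensionality reduction.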
This list is automatically generated from the titles and abstracts of the papers on this site.