Large-Scale Generative Data-Free Distillation
- URL: http://arxiv.org/abs/2012.05578v1
- Date: Thu, 10 Dec 2020 10:54:38 GMT
- Title: Large-Scale Generative Data-Free Distillation
- Authors: Liangchen Luo, Mark Sandler, Zi Lin, Andrey Zhmoginov, Andrew Howard
- Abstract summary: We propose a new method to train a generative image model by leveraging the statistics stored in the normalization layers of a trained teacher network.
The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02%, respectively.
We are able to scale it to the ImageNet dataset, which, to the best of our knowledge, has never been done using generative models in a data-free setting.
- Score: 17.510996270055184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation is one of the most popular and effective techniques
for knowledge transfer, model compression and semi-supervised learning. Most
existing distillation approaches require access to the original or augmented
training samples. But this can be problematic in practice due to privacy,
proprietary and availability concerns. Recent work has put forward some methods
to tackle this problem, but they are either highly time-consuming or unable to
scale to large datasets. To this end, we propose a new method to train a
generative image model by leveraging the intrinsic normalization layers'
statistics of the trained teacher network. This enables us to build an ensemble
of generators without training data that can efficiently produce substitute
inputs for subsequent distillation. The proposed method pushes forward the
data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and
77.02%, respectively. Furthermore, we are able to scale it to the ImageNet dataset,
which, to the best of our knowledge, has never been done using generative models
in a data-free setting.
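The generator objective described above (make the statistics that generated images induce inside the teacher match the statistics stored in its normalization layers) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes BatchNorm layers fed by (batch, channels) activations and a squared-error distance on per-channel mean and variance. The helper names `bn_statistics_loss` and `total_bn_loss` are hypothetical, and the paper's actual distance, layer weighting, and generator architecture may differ.

```python
import numpy as np


def bn_statistics_loss(features, running_mean, running_var):
    """Distance between the batch statistics of `features` and a BN layer's stored statistics.

    features: array of shape (batch, channels), the activations entering a
        BatchNorm layer when the teacher processes a batch of generated images.
    running_mean, running_var: arrays of shape (channels,) stored in that layer.
    Assumption: squared error on mean and variance; the paper's distance may differ.
    """
    batch_mean = features.mean(axis=0)
    batch_var = features.var(axis=0)  # biased variance, as BN uses when normalizing
    mean_term = np.sum((batch_mean - running_mean) ** 2)
    var_term = np.sum((batch_var - running_var) ** 2)
    return float(mean_term + var_term)


def total_bn_loss(per_layer_features, per_layer_stats):
    """Sum the per-layer losses over all normalization layers of the teacher."""
    return sum(
        bn_statistics_loss(f, mean, var)
        for f, (mean, var) in zip(per_layer_features, per_layer_stats)
    )


rng = np.random.default_rng(0)
running_mean = np.array([0.5, -1.0, 2.0])
running_var = np.array([1.0, 0.25, 4.0])

# Features whose statistics match the stored ones give a small loss ...
matched = rng.normal(running_mean, np.sqrt(running_var), size=(4096, 3))
# ... while features from a mismatched distribution give a large one.
mismatched = rng.normal(0.0, 1.0, size=(4096, 3))

print(bn_statistics_loss(matched, running_mean, running_var))     # small
print(bn_statistics_loss(mismatched, running_mean, running_var))  # large
```

In training, this quantity would be computed inside an autodiff framework and minimized with respect to the generator's parameters while the teacher stays frozen; NumPy is used here only to make the computation itself explicit.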
Related papers
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
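The stage-wise structure described for PDD can be outlined in plain Python. This is only a sketch of the scheduling logic as summarized here: `synthesize` and `train` are hypothetical placeholders for an existing dataset distillation method and a model-training routine, and the real method's conditioning on earlier subsets is abstracted into a single `synthesize(previous_subsets)` call.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # placeholder type for one synthetic training example


def progressive_distillation(
    synthesize: Callable[[Sequence[List[T]]], List[T]],
    train: Callable[[List[T]], None],
    num_stages: int,
) -> List[List[T]]:
    """Each new subset is produced with access to the earlier ones, and the
    model is trained on the union of all subsets produced so far."""
    subsets: List[List[T]] = []
    for _ in range(num_stages):
        # Hypothetical hook: one distillation step conditioned on earlier subsets.
        new_subset = synthesize(subsets)
        subsets.append(new_subset)
        # Cumulative union S_1 + ... + S_k, flattened into one training list.
        cumulative = [example for subset in subsets for example in subset]
        train(cumulative)
    return subsets


# Toy usage: each "subset" holds two integer placeholders tagged with its stage.
seen_sizes = []
result = progressive_distillation(
    synthesize=lambda previous: [len(previous)] * 2,
    train=lambda data: seen_sizes.append(len(data)),
    num_stages=3,
)
print(result)      # [[0, 0], [1, 1], [2, 2]]
print(seen_sizes)  # [2, 4, 6]
```

With these toy placeholders the training set grows by one subset per stage, which is the "cumulative union" the summary refers to.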
(2023-10-10T20:04:44Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes the limitations of existing distillation methods with an efficient data-free distillation algorithm.
(2023-06-08T20:30:55Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
(2023-05-28T06:53:41Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
(2023-05-02T17:59:31Z)
- DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce all-purpose visual features, i.e., features that work across image distributions and tasks without fine-tuning, if trained on enough curated data from diverse sources.
Most of the technical contributions aim to accelerate and stabilize training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
(2023-04-14T15:12:19Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
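Gradient matching, one of the ingredients named above, scores a synthetic set by how closely the parameter gradient it produces agrees with the gradient produced by real data. The NumPy sketch below assumes a linear model with a mean-squared-error loss (so the gradient has a closed form) and uses one minus the cosine similarity as the distance; both choices and the helper names are illustrative assumptions, not the article's exact formulation.

```python
import numpy as np


def mse_gradient(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def gradient_matching_distance(w, X_real, y_real, X_syn, y_syn):
    """1 - cosine similarity between the gradients induced by real and synthetic data."""
    g_real = mse_gradient(w, X_real, y_real)
    g_syn = mse_gradient(w, X_syn, y_syn)
    cos = g_real @ g_syn / (np.linalg.norm(g_real) * np.linalg.norm(g_syn) + 1e-12)
    return float(1.0 - cos)


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X_real = rng.normal(size=(1000, 3))
y_real = X_real @ w_true
w = np.zeros(3)  # current parameters of the model being trained

# A small synthetic set labeled by the same rule yields an aligned gradient ...
X_syn = rng.normal(size=(20, 3))
y_good = X_syn @ w_true
# ... whereas flipped labels push the parameters the opposite way.
y_bad = -y_good

d_good = gradient_matching_distance(w, X_real, y_real, X_syn, y_good)
d_bad = gradient_matching_distance(w, X_real, y_real, X_syn, y_bad)
print(d_good)  # small: the gradients point in nearly the same direction
print(d_bad)   # near 2: the gradients point in opposite directions
```

Flipping the synthetic labels negates the synthetic gradient, so the two distances sum to 2; a gradient-matching method would optimize the synthetic inputs or labels to drive this distance toward 0.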
(2022-03-16T11:45:32Z)
- Conditional Generative Data-Free Knowledge Distillation based on Attention Transfer [0.8594140167290099]
We propose a conditional generative data-free knowledge distillation (CGDD) framework to train an efficient portable network without any real data.
In this framework, in addition to the knowledge extracted from the teacher model, we introduce preset labels as auxiliary information.
We show that the portable network trained with the proposed data-free distillation method obtains 99.63%, 99.07% and 99.84% relative accuracy on CIFAR10, CIFAR100 and Caltech101, respectively.
(2021-12-31T09:23:40Z)
- Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator that, through dual discriminator adversarial distillation, creates samples mimicking the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
(2021-04-12T12:01:45Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from large, powerful, off-the-shelf pre-trained models.
Our solution performs distillation only by driving the predictions of the student model to be consistent with those of the teacher model.
We empirically find that such a simple distillation setting is extremely effective; for example, the top-1 accuracy of MobileNetV3-large and ResNet50-D on the ImageNet-1k validation set can be significantly improved.
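Distillation that only asks the student's predictions to agree with the teacher's is commonly implemented as a KL divergence between their softened output distributions. The NumPy sketch below shows that standard loss; the `temperature` parameter and the KL(teacher || student) direction are conventional choices assumed here, not details confirmed by the summary above.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis of temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shifting by the row max avoids overflow
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Batch-mean KL(teacher || student) between softened output distributions.

    Assumption: KL direction and temperature follow the usual
    knowledge-distillation convention; the cited paper may differ.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1)
    return float(kl.mean())


teacher = np.array([[2.0, 0.5, -1.0], [0.0, 3.0, 1.0]])
print(distillation_loss(teacher, teacher))                 # 0.0: identical predictions
print(distillation_loss(teacher + 5.0, teacher))           # 0.0: a constant logit shift is ignored
print(distillation_loss(np.zeros_like(teacher), teacher))  # positive: a uniform student disagrees
```

Because softmax is invariant to adding the same constant to every logit, the loss constrains only the student's predicted distribution, which matches the "prediction consistency" framing rather than matching raw logits.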
(2021-03-10T09:32:44Z)
- Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation [19.778192371420793]
A data-free adversarial distillation framework deploys a generative network to transfer the teacher model's knowledge to the student model.
We add an activation regularizer and a virtual adversarial method to improve the data generation efficiency.
Our model's accuracy is 13.8% higher than the state-of-the-art data-free method on CIFAR-100.
(2021-02-23T11:37:40Z)
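The adversarial game behind data-free adversarial distillation, in which a generator searches for inputs where teacher and student disagree and the student then reduces that disagreement, can be shown with a toy NumPy example. Assumptions: teacher and student are linear scoring functions, the generator network is replaced by a batch of inputs optimized directly and projected onto the unit sphere, and the discrepancy is a mean squared difference (other discrepancy measures are also used). The sketch does not model the activation regularizer or virtual adversarial component mentioned in the summary, and the function names are hypothetical.

```python
import numpy as np


def discrepancy(a_teacher, a_student, X):
    """Mean squared difference between teacher and student outputs on inputs X."""
    diff = X @ (a_teacher - a_student)
    return float(np.mean(diff ** 2))


def adversarial_distillation(a_teacher, steps=200, batch=16, lr=0.1, seed=0):
    """Alternate generator ascent and student descent on the same discrepancy."""
    rng = np.random.default_rng(seed)
    dim = a_teacher.shape[0]
    a_student = np.zeros(dim)
    # Toy stand-in for a generator: the synthetic inputs are its parameters.
    X = rng.normal(size=(batch, dim))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        # Generator step: gradient ASCENT, moving inputs toward disagreement ...
        d = a_teacher - a_student
        residual = X @ d
        X = X + lr * 2.0 * residual[:, None] * d[None, :] / batch
        # ... then project each input back onto the unit sphere to keep it bounded.
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        # Student step: gradient DESCENT on the discrepancy at the new inputs.
        residual = X @ (a_teacher - a_student)
        a_student = a_student + lr * 2.0 * X.T @ residual / batch
    return a_student, X


a_teacher = np.array([1.0, -2.0, 0.5])
a_student, X = adversarial_distillation(a_teacher)
print(np.linalg.norm(a_teacher))              # initial gap (the student starts at zero)
print(np.linalg.norm(a_teacher - a_student))  # much smaller after the min-max game
```

The generator and the student optimize the same quantity in opposite directions, which is the min-max structure the adversarial frameworks above share; in practice both sides are neural networks trained with an autodiff library.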
This list is automatically generated from the titles and abstracts of the papers on this site.