Contrastive Model Inversion for Data-Free Knowledge Distillation
- URL: http://arxiv.org/abs/2105.08584v1
- Date: Tue, 18 May 2021 15:13:00 GMT
- Title: Contrastive Model Inversion for Data-Free Knowledge Distillation
- Authors: Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang,
Mingli Song
- Score: 60.08025054715192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model inversion, whose goal is to recover training data from a
pre-trained model, has recently been proven feasible. However, existing
inversion methods usually suffer from the mode collapse problem: the
synthesized instances are highly similar to each other and thus show limited
effectiveness for downstream tasks such as knowledge distillation. In this
paper, we propose Contrastive Model Inversion (CMI), where data diversity is
explicitly modeled as an optimizable objective, to alleviate the mode collapse
issue. Our main observation is that, under the constraint of the same amount
of data, higher data diversity usually indicates stronger instance
discrimination. To this end, we introduce in CMI a contrastive learning
objective that encourages the instances being synthesized to be
distinguishable from those already synthesized in previous batches.
Experiments with pre-trained models on CIFAR-10, CIFAR-100, and Tiny-ImageNet
demonstrate that CMI not only generates more visually plausible instances than
state-of-the-art methods, but also achieves significantly superior performance
when the generated data are used for knowledge distillation. Code is available
at \url{https://github.com/zju-vipa/DataFree}.
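The contrastive objective described in the abstract can be sketched as an InfoNCE-style loss in which two augmented views of each newly synthesized instance form the positive pair, while features of instances synthesized in previous batches (a memory bank) serve as negatives. The sketch below is illustrative only: the function names, the cosine-similarity features, and the memory-bank handling are assumptions for exposition, not the authors' implementation (which is in the linked repository).

```python
import math

def _cos(a, b):
    # Cosine similarity between two feature vectors (plain-Python lists).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def contrastive_diversity_loss(views1, views2, memory_bank, tau=0.07):
    """InfoNCE-style diversity objective (hypothetical sketch).

    views1[i] and views2[i] are feature vectors of two augmented views of
    the i-th instance in the current batch (the positive pair); every
    feature stored in memory_bank (instances synthesized in earlier
    batches) acts as a negative. Minimizing the loss pushes new instances
    away from old ones, which is the diversity mechanism the abstract
    describes.
    """
    total = 0.0
    for v1, v2 in zip(views1, views2):
        pos = math.exp(_cos(v1, v2) / tau)
        neg = sum(math.exp(_cos(v1, m) / tau) for m in memory_bank)
        total += -math.log(pos / (pos + neg))
    return total / len(views1)
```

As a sanity check, an instance orthogonal to everything in the memory bank incurs a near-zero loss, while one identical to a stored instance is penalized, so gradient descent on this loss favors novel syntheses.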
Related papers
- Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models.
We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
(arXiv: 2024-06-12)
- Self-supervised Dataset Distillation: A Good Compression Is All You Need [23.02066055996762]
We introduce SC-DD, a simple yet effective Self-supervised Compression framework for dataset distillation.
The proposed SC-DD outperforms all previous state-of-the-art supervised dataset distillation methods when employing larger models.
Experiments are conducted on CIFAR-100, Tiny-ImageNet and ImageNet-1K datasets to demonstrate the superiority of our proposed approach.
(arXiv: 2024-04-11)
- SCME: A Self-Contrastive Method for Data-free and Query-Limited Model Extraction Attack [18.998300969035885]
Model extraction attacks fool the target model by generating adversarial examples on a substitute model.
We propose a novel data-free model extraction method named SCME, which considers both the inter- and intra-class diversity in synthesizing fake data.
(arXiv: 2023-10-15)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF), which learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
(arXiv: 2023-06-09)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
(arXiv: 2023-05-16)
- Maintaining Stability and Plasticity for Predictive Churn Reduction [8.971668467496055]
We propose a solution called Accumulated Model Combination (AMC).
AMC is a general technique, and we propose several instances of it, each with its own advantages depending on the model and data properties.
(arXiv: 2023-05-06)
- ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data.
Our model, named ChiroDiff, is non-autoregressive: it learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates.
(arXiv: 2023-04-07)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
(arXiv: 2022-10-05)
- Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
(arXiv: 2021-06-06)
- Disentangled Recurrent Wasserstein Autoencoder [17.769077848342334]
The recurrent Wasserstein Autoencoder (R-WAE) is a new framework for generative modeling of sequential data.
R-WAE disentangles the representation of an input sequence into static and dynamic factors.
Our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation.
(arXiv: 2021-01-19)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.