Data-Free Knowledge Distillation Using Adversarially Perturbed OpenGL
Shader Images
- URL: http://arxiv.org/abs/2310.13782v1
- Date: Fri, 20 Oct 2023 19:28:50 GMT
- Title: Data-Free Knowledge Distillation Using Adversarially Perturbed OpenGL
Shader Images
- Authors: Logan Frank and Jim Davis
- Abstract summary: Knowledge distillation (KD) has been a popular and effective method for model compression.
"Data-free" KD has emerged as a growing research topic which focuses on the scenario of performing KD when no data is provided.
We propose a new approach to data-free KD that utilizes unnatural images, combined with large amounts of data augmentation and adversarial attacks.
- Score: 5.439020425819001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) has been a popular and effective method for model
compression. One important assumption of KD is that the original training
dataset is always available. However, this is not always the case due to
privacy concerns and other restrictions. In recent years, "data-free" KD has emerged as a
growing research topic which focuses on the scenario of performing KD when no
data is provided. Many methods rely on a generator network to synthesize
examples for distillation (which can be difficult to train) and can frequently
produce images that are visually similar to the original dataset, which raises
questions surrounding whether privacy is completely preserved. In this work, we
propose a new approach to data-free KD that utilizes unnatural OpenGL images,
combined with large amounts of data augmentation and adversarial attacks, to
train a student network. We demonstrate that our approach achieves
state-of-the-art results for a variety of datasets/networks and is more stable
than existing generator-based data-free KD methods. Source code will be
available in the future.
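The paper's exact training recipe is not reproduced here, but the high-level ingredients named in the abstract (a fixed pool of non-natural surrogate images, heavy data augmentation, and adversarial perturbations that expose teacher/student disagreement) can be illustrated with a minimal, hypothetical PyTorch-style sketch. Every name below (the surrogate images, `augment`, `epsilon`, the temperature `T`) is a placeholder assumption, not the authors' implementation.

```python
# Hypothetical sketch of one data-free KD training step: surrogate images are
# heavily augmented, adversarially perturbed to maximize teacher/student
# disagreement, and then used to distill the teacher's soft predictions into
# the student. Every name and hyperparameter here is a placeholder.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard temperature-scaled KL-divergence distillation loss."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def train_step(student, teacher, surrogate_images, optimizer, augment,
               epsilon=8 / 255):
    teacher.eval()
    student.train()

    # 1) Heavy data augmentation on the non-natural surrogate images.
    x = augment(surrogate_images)

    # 2) FGSM-style perturbation that increases teacher/student disagreement,
    #    so distillation focuses on inputs where the student currently errs.
    x_adv = x.clone().detach().requires_grad_(True)
    disagreement = kd_loss(student(x_adv), teacher(x_adv))
    grad = torch.autograd.grad(disagreement, x_adv)[0]
    x_adv = (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

    # 3) Distill the teacher's predictions on the perturbed images.
    optimizer.zero_grad()
    with torch.no_grad():
        teacher_logits = teacher(x_adv)
    loss = kd_loss(student(x_adv), teacher_logits)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop this step would be repeated over the surrogate image pool, with the augmentation policy, perturbation budget, and loss weighting chosen as in the paper rather than the placeholder values used above.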
Related papers
- What Makes a Good Dataset for Knowledge Distillation? [4.604003661048267]
Knowledge distillation (KD) has been a popular and effective method for model compression.
One important assumption of KD is that the teacher's original dataset will also be available when training the student.
In situations such as continual learning and distilling large models trained on company-withheld datasets, having access to the original data may not always be possible.
arXiv Detail & Related papers (2024-11-19T19:10:12Z) - Condensed Sample-Guided Model Inversion for Knowledge Distillation [42.91823325342862]
Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model.
KD relies on access to the training dataset, which may not always be fully available due to privacy concerns or logistical issues related to the size of the data.
In this paper, we consider condensed samples as a form of supplementary information, and introduce a method for using them to better approximate the target data distribution.
arXiv Detail & Related papers (2024-08-25T14:43:27Z) - Wakening Past Concepts without Past Data: Class-Incremental Learning
from Online Placebos [85.37515663416691]
We find that "using new class data for KD" not only hinders the model adaptation (for learning new classes) but also results in low efficiency for preserving old class knowledge.
We address this by "using the placebos of old classes for KD", where the placebos are chosen from a free image stream, such as Google Images, in an automatic and economical fashion.
arXiv Detail & Related papers (2023-10-24T18:32:46Z) - Revisiting Data-Free Knowledge Distillation with Poisoned Teachers [47.513721590643435]
Data-free knowledge distillation (KD) helps transfer knowledge from a pre-trained model to a smaller model (known as the student model) without access to the original training data used for training the teacher model.
However, the security of the synthetic or out-of-distribution (OOD) data required in data-free KD is largely unknown and under-explored.
We propose Anti-Backdoor Data-Free KD, the first plug-in defensive method for data-free KD methods to mitigate the chance of potential backdoors being transferred.
arXiv Detail & Related papers (2023-06-04T14:27:50Z) - VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from
Small Scale to Large Scale [55.97546756258374]
We show that employing stronger data augmentation techniques and using larger datasets can directly decrease the gap between vanilla KD and other meticulously designed KD variants.
Our investigation of the vanilla KD and its variants in more complex schemes, including stronger training strategies and different model capacities, demonstrates that vanilla KD is elegantly simple but astonishingly effective in large-scale scenarios.
arXiv Detail & Related papers (2023-05-25T06:50:08Z) - Black-box Few-shot Knowledge Distillation [55.27881513982002]
Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network.
We propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher.
We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks.
arXiv Detail & Related papers (2022-07-25T12:16:53Z) - Undistillable: Making A Nasty Teacher That CANNOT teach students [84.6111281091602]
This paper introduces and investigates a concept called Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, yet significantly degrades the performance of any student network that attempts to distill from it.
We propose a simple yet effective algorithm to build the nasty teacher, called self-undermining knowledge distillation.
arXiv Detail & Related papers (2021-05-16T08:41:30Z) - Large-Scale Generative Data-Free Distillation [17.510996270055184]
We propose a new method to train a generative image model by leveraging the statistics intrinsic to the teacher's normalization layers (an illustrative sketch of this idea appears after the list below).
The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02% respectively.
We are able to scale it to ImageNet dataset, which to the best of our knowledge, has never been done using generative models in a data-free setting.
arXiv Detail & Related papers (2020-12-10T10:54:38Z) - Knowledge Distillation Thrives on Data Augmentation [65.58705111863814]
Knowledge distillation (KD) is a general deep neural network training framework that uses a teacher model to guide a student model.
Many works have explored the rationale behind its success; however, its interplay with data augmentation (DA) has not been well recognized so far.
In this paper, we are motivated by an interesting observation in classification: KD loss can benefit from extended training iterations while the cross-entropy loss does not.
We show this disparity arises because of data augmentation: KD loss can tap into the extra information from different input views brought by DA.
arXiv Detail & Related papers (2020-12-05T00:32:04Z)
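The entry above on "Large-Scale Generative Data-Free Distillation" mentions leveraging the statistics stored in the teacher's normalization layers. A commonly used (and not necessarily that paper's exact) formulation penalizes the mismatch between the feature statistics induced by synthetic images and the running mean/variance stored in the teacher's BatchNorm layers; the sketch below, with hypothetical names, illustrates this loss in PyTorch.

```python
# Illustrative BatchNorm-statistics matching loss often used in data-free
# distillation (not necessarily the cited paper's exact formulation):
# synthetic images are encouraged to produce per-channel feature statistics
# that match the running mean/variance stored in the pre-trained teacher.
import torch
import torch.nn as nn

class BNStatsHook:
    """Records the statistics mismatch at one BatchNorm2d layer."""
    def __init__(self, bn: nn.BatchNorm2d):
        self.loss = torch.tensor(0.0)
        self.handle = bn.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        x = inputs[0]                                # N x C x H x W features
        mean = x.mean(dim=[0, 2, 3])                 # per-channel batch mean
        var = x.var(dim=[0, 2, 3], unbiased=False)   # per-channel batch variance
        self.loss = (torch.norm(mean - module.running_mean, 2)
                     + torch.norm(var - module.running_var, 2))

def bn_statistics_loss(teacher: nn.Module, synthetic_images: torch.Tensor):
    """Total mismatch between synthetic-image statistics and stored BN stats."""
    teacher.eval()  # freeze running statistics; use them only as targets
    hooks = [BNStatsHook(m) for m in teacher.modules()
             if isinstance(m, nn.BatchNorm2d)]
    teacher(synthetic_images)       # forward pass populates every hook's loss
    loss = sum(h.loss for h in hooks)
    for h in hooks:
        h.handle.remove()
    return loss
```

A generator (or the synthetic images themselves) can then be optimized against this loss, usually in combination with terms that keep the teacher confident on the generated images and encourage diversity.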