What Makes a Good Dataset for Knowledge Distillation?
- URL: http://arxiv.org/abs/2411.12817v1
- Date: Tue, 19 Nov 2024 19:10:12 GMT
- Title: What Makes a Good Dataset for Knowledge Distillation?
- Authors: Logan Frank, Jim Davis
- Abstract summary: Knowledge distillation (KD) has been a popular and effective method for model compression.
One important assumption of KD is that the teacher's original dataset will also be available when training the student.
In situations such as continual learning and distilling large models trained on company-withheld datasets, having access to the original data may not always be possible.
- Score: 4.604003661048267
- Abstract: Knowledge distillation (KD) has been a popular and effective method for model compression. One important assumption of KD is that the teacher's original dataset will also be available when training the student. However, in situations such as continual learning and distilling large models trained on company-withheld datasets, having access to the original data may not always be possible. This leads practitioners towards utilizing other sources of supplemental data, which could yield mixed results. One must then ask: "what makes a good dataset for transferring knowledge from teacher to student?" Many would assume that only real in-domain imagery is viable, but is that the only option? In this work, we explore multiple possible surrogate distillation datasets and demonstrate that many different datasets, even unnatural synthetic imagery, can serve as a suitable alternative in KD. From examining these alternative datasets, we identify and present various criteria describing what makes a good dataset for distillation. Source code will be available in the future.
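The setting described in the abstract, where the student is trained on surrogate data using only the teacher's outputs, can be illustrated with a minimal sketch. It assumes the standard temperature-softened KL-divergence distillation loss, which the abstract does not confirm as the paper's exact objective. The function names `softmax` and `kd_loss`, the temperature value, and the random logits standing in for model outputs on surrogate images are all hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(teacher || student) on softened outputs, scaled by T^2.

    No ground-truth labels are needed, so the inputs that produced these
    logits may come from any surrogate dataset (assumed setup).
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return float(kl.mean() * temperature ** 2)

# Random logits stand in for teacher and student outputs on a batch of
# 8 surrogate images with 10 classes (placeholder data, not real results).
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 10))
student_logits = rng.normal(size=(8, 10))

loss_mismatch = kd_loss(student_logits, teacher_logits)
loss_match = kd_loss(teacher_logits, teacher_logits)
```

Because the loss depends only on the two sets of logits, swapping the original training set for a surrogate changes which inputs are fed to both models, not the objective itself.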
Related papers
- What is Dataset Distillation Learning? [32.99890244958794]
We study the behavior, representativeness, and point-wise information content of distilled data.
We reveal distilled data cannot serve as a substitute for real data during training.
We provide a framework for interpreting distilled data and reveal that individual distilled data points contain meaningful semantic information.
arXiv Detail & Related papers (2024-06-06T17:28:56Z)
- Data-Free Knowledge Distillation Using Adversarially Perturbed OpenGL Shader Images [5.439020425819001]
Knowledge distillation (KD) has been a popular and effective method for model compression.
"Data-free" KD has emerged as a growing research topic which focuses on the scenario of performing KD when no data is provided.
We propose a new approach to data-free KD that utilizes unnatural images, combined with large amounts of data augmentation and adversarial attacks.
arXiv Detail & Related papers (2023-10-20T19:28:50Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that such a standard distillation paradigm incurs a serious bias issue: popular items are more heavily recommended after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Black-box Few-shot Knowledge Distillation [55.27881513982002]
Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network.
We propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher.
We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks.
arXiv Detail & Related papers (2022-07-25T12:16:53Z)
- Unified and Effective Ensemble Knowledge Distillation [92.67156911466397]
Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model.
Many existing methods learn and distill the student model on labeled data only.
We propose a unified and effective ensemble knowledge distillation method that distills a single student model from an ensemble of teacher models on both labeled and unlabeled data.
arXiv Detail & Related papers (2022-04-01T16:15:39Z)
- Large-Scale Generative Data-Free Distillation [17.510996270055184]
We propose a new method to train a generative image model by leveraging the intrinsic normalization layers' statistics.
The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02% respectively.
We are able to scale it to the ImageNet dataset, which, to the best of our knowledge, has never been done using generative models in a data-free setting.
arXiv Detail & Related papers (2020-12-10T10:54:38Z)
- Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model into another.
We design data augmentation agents with distinct roles to facilitate knowledge distillation.
We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student.
arXiv Detail & Related papers (2020-04-19T14:22:17Z)