Sampling to Distill: Knowledge Transfer from Open-World Data
- URL: http://arxiv.org/abs/2307.16601v2
- Date: Sun, 21 Jul 2024 14:32:53 GMT
- Title: Sampling to Distill: Knowledge Transfer from Open-World Data
- Authors: Yuzheng Wang, Zhaoyu Chen, Jie Zhang, Dingkang Yang, Zuhao Ge, Yang Liu, Siao Liu, Yunquan Sun, Wenqiang Zhang, Lizhe Qi
- Abstract summary: We propose a novel Open-world Data Sampling Distillation (ODSD) method for the Data-Free Knowledge Distillation (DFKD) task that avoids the redundant generation process.
First, we sample open-world data close to the original data's distribution with an adaptive sampling module.
Then, we build structured relationships among multiple data examples to exploit data knowledge through the student model itself and the teacher's structured representation.
- Score: 28.74835717488114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-Free Knowledge Distillation (DFKD) is a novel task that aims to train high-performance student models using only the pre-trained teacher network, without the original training data. Most existing DFKD methods rely heavily on additional generation modules to synthesize substitute data, which incurs high computational costs and ignores the massive amounts of easily accessible, low-cost, unlabeled open-world data. Meanwhile, existing methods ignore the domain shift between the substitute data and the original data, so the teacher's knowledge is not always trustworthy and structured knowledge from the data itself becomes a crucial supplement. To tackle these issues, we propose a novel Open-world Data Sampling Distillation (ODSD) method for the DFKD task that avoids the redundant generation process. First, we sample open-world data close to the original data's distribution with an adaptive sampling module and introduce a low-noise representation to alleviate the domain shift issue. Then, we build structured relationships among multiple data examples to exploit data knowledge through the student model itself and the teacher's structured representation. Extensive experiments on CIFAR-10, CIFAR-100, NYUv2, and ImageNet show that our ODSD method achieves state-of-the-art performance with fewer FLOPs and parameters. In particular, we improve accuracy by 1.50%-9.59% on the ImageNet dataset and avoid training a separate generator for each class.
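The abstract describes the method only at a high level; below is a minimal sketch of the two ideas it names, assuming a pre-trained PyTorch teacher and an unlabeled open-world pool. The entropy-based filter and the pairwise-similarity loss are illustrative stand-ins, not the paper's actual adaptive sampling module or structured representation.

```python
# Minimal sketch under stated assumptions, NOT the authors' implementation:
# (1) keep the open-world examples the pre-trained teacher is most confident on,
#     as a stand-in for the adaptive sampling module, and
# (2) match pairwise feature similarities between teacher and student as one
#     common way to encode structured relationships among examples.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

@torch.no_grad()
def sample_open_world_subset(teacher, pool, keep_ratio=0.2, batch_size=256,
                             device="cuda"):
    """Rank unlabeled examples by the entropy of the teacher's prediction and
    keep the lowest-entropy fraction (assumes the dataset yields (image, _) pairs)."""
    teacher.eval().to(device)
    entropies = []
    for x, _ in DataLoader(pool, batch_size=batch_size):
        p = F.softmax(teacher(x.to(device)), dim=1)
        entropies.append(-(p * p.clamp_min(1e-8).log()).sum(dim=1).cpu())
    entropies = torch.cat(entropies)
    keep = torch.argsort(entropies)[: int(keep_ratio * len(entropies))]
    return Subset(pool, keep.tolist())

def relational_loss(student_feats, teacher_feats):
    """Match pairwise cosine-similarity matrices within a batch so the student
    preserves the teacher's structure over multiple examples."""
    s = F.normalize(student_feats.flatten(1), dim=1)
    t = F.normalize(teacher_feats.flatten(1), dim=1)
    return F.mse_loss(s @ s.t(), t @ t.t())
```

In a training loop, batches drawn from the sampled subset would be fed to both networks, and a standard distillation loss on the teacher's logits would be combined with `relational_loss` on intermediate features.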
Related papers
- Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation [20.556083321381514]
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression.
This paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA).
Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method.
arXiv Detail & Related papers (2024-10-23T07:01:16Z) - De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts [32.1016787150064]
Data-Free Knowledge Distillation (DFKD) is a promising task to train high-performance small models to enhance actual deployment without relying on the original training data.
Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data.
This paper proposes a novel perspective with causal inference to disentangle the student models from the impact of such shifts.
arXiv Detail & Related papers (2024-03-28T16:13:22Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) to tackle this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents [43.282328554697564]
We propose a highly efficient FL dataset distillation framework on the server side.
Unlike previous strategies, our technique enables the server to leverage prior knowledge from pre-trained deep generative models.
Our framework converges faster than the baselines because, rather than training on several sets of heterogeneous data distributions, the server trains on a multi-modal distribution.
arXiv Detail & Related papers (2023-12-03T23:30:48Z) - Lightweight Self-Knowledge Distillation with Multi-source Information Fusion [3.107478665474057]
Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models; a minimal sketch of the standard distillation loss these methods build on appears after this list.
We propose a lightweight SKD framework that utilizes multi-source information to construct a more informative teacher.
We validate the performance of the proposed DRG, DSR, and their combination through comprehensive experiments on various datasets and models.
arXiv Detail & Related papers (2023-05-16T05:46:31Z) - Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) conducts knowledge distillation while eliminating the dependence on original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z) - Up to 100x Faster Data-free Knowledge Distillation [52.666615987503995]
We introduce FastDFKD, which allows us to accelerate DFKD by orders of magnitude.
Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features.
FastDFKD achieves data synthesis within only a few steps, significantly enhancing the efficiency of data-free training.
arXiv Detail & Related papers (2021-12-12T14:56:58Z) - Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples that mimic the original training data through dual discriminator adversarial distillation.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
arXiv Detail & Related papers (2021-04-12T12:01:45Z) - Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
We further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
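Most of the methods listed above differ mainly in where their inputs come from (a generator, diffusion augmentation, reinforced prompts, or sampled open-world data) while sharing the same temperature-scaled distillation objective. As a reference point, here is a minimal sketch of that standard loss (Hinton et al.); the function name and temperature value are illustrative.

```python
# Minimal sketch of the standard temperature-scaled distillation loss that
# the DFKD methods above apply to synthetic or sampled inputs instead of
# the original training data.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 4.0):
    """KL divergence between temperature-softened teacher and student
    distributions; the T**2 factor keeps gradient magnitudes comparable across T."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```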