NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
- URL: http://arxiv.org/abs/2310.00258v2
- Date: Fri, 22 Mar 2024 01:46:44 GMT
- Title: NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
- Authors: Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Quan Hung Tran, Dinh Phung
- Abstract summary: Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data.
Existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information.
We propose a novel Noisy Layer Generation method (NAYER) which relocates the random source from the input to a noisy layer and utilizes the meaningful constant label-text embedding (LTE) as the input.
- Score: 42.435293471992274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to effectively map this noise to the ground-truth sample distribution, resulting in prolonged training times and low-quality outputs. In this paper, we propose a novel Noisy Layer Generation method (NAYER) which relocates the random source from the input to a noisy layer and utilizes the meaningful constant label-text embedding (LTE) as the input. LTE is generated using a language model once, and it is then stored in memory for all subsequent training processes. The significance of LTE lies in its ability to contain substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Simultaneously, the noisy layer plays a key role in addressing the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that our NAYER not only outperforms the state-of-the-art methods but also achieves speeds 5 to 15 times faster than previous approaches. The code is available at https://github.com/tmtuan1307/nayer.
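As a rough illustration of the mechanism described in the abstract (a constant, cached label-text embedding fed to the generator, with randomness moved into a small noisy layer that is re-initialized at every iteration), the following PyTorch sketch shows the data flow only. All module names, dimensions, and the placeholder LTE tensor are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

```python
# Hypothetical sketch of NAYER-style data generation: constant label-text
# embedding (LTE) as input, randomness injected by a noisy layer that is
# re-initialized every iteration. Shapes and modules are illustrative only.
import torch
import torch.nn as nn

num_classes, lte_dim, noise_dim, img_ch = 10, 512, 256, 3

# In the paper, the LTE is produced once by a language model from the class
# label text and then cached; a fixed random tensor stands in for it here.
lte = torch.randn(num_classes, lte_dim)

class NoisyLayer(nn.Module):
    """Small randomly initialized layer; re-created each iteration so that the
    same constant LTE can still yield diverse latent codes."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.relu(self.fc(x))

# Generic generator head mapping latent codes to 32x32 images (placeholder).
generator = nn.Sequential(
    nn.Linear(noise_dim, 128 * 8 * 8),
    nn.Unflatten(1, (128, 8, 8)),
    nn.Upsample(scale_factor=4),
    nn.Conv2d(128, img_ch, kernel_size=3, padding=1),
    nn.Tanh(),
)

def generate_batch(labels: torch.Tensor) -> torch.Tensor:
    # A fresh noisy layer per call mimics the per-iteration re-initialization.
    noisy_layer = NoisyLayer(lte_dim, noise_dim)
    z = noisy_layer(lte[labels])      # constant LTE in, noisy latent code out
    return generator(z)

labels = torch.randint(0, num_classes, (16,))
fake_images = generate_batch(labels)  # shape: (16, 3, 32, 32)
print(fake_images.shape)
```

Because the LTE input is fixed, only the freshly initialized noisy layer and the generator head need to adapt in each generation round, which is consistent with the abstract's claim that the ease of learning from LTE is what keeps generation efficient.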
Related papers
- Data Stream Sampling with Fuzzy Task Boundaries and Noisy Labels [0.03464344220266879]
We introduce a novel sampling method called Noisy Test Debiasing (NTD) to mitigate noisy labels in evolving data streams.
NTD is straightforward to implement, making it feasible across various scenarios.
The results validate the efficacy of NTD for online continual learning in scenarios with noisy labels in data streams.
arXiv Detail & Related papers (2024-04-07T08:32:16Z) - DIDA: Denoised Imitation Learning based on Domain Adaptation [28.36684781402964]
We focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise.
We propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data.
Experiment results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.
arXiv Detail & Related papers (2024-04-04T11:29:05Z) - Combating Label Noise With A General Surrogate Model For Sample
Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples used in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z) - Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER) that updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z) - INN: A Method Identifying Clean-annotated Samples via Consistency Effect in Deep Neural Networks [1.1470070927586016]
We introduce a new method called INN to refine clean labeled data from training data with noisy labels.
The INN method requires more computation but is much more stable and powerful than the small-loss strategy.
arXiv Detail & Related papers (2021-06-29T09:06:21Z) - Learning from Noisy Labels via Dynamic Loss Thresholding [69.61904305229446]
We propose a novel method named Dynamic Loss Thresholding (DLT).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds; a rough sketch of this idea appears after this list.
Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods.
arXiv Detail & Related papers (2021-04-01T07:59:03Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
We further propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
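Based only on the one-line description of Dynamic Loss Thresholding in the list above, here is a hypothetical sketch of a selector that records per-sample losses and recomputes a threshold dynamically. The percentile rule and all names are assumptions made for illustration, not the DLT paper's actual procedure.

```python
# Hypothetical dynamic-loss-thresholding style sample selector, sketched from
# the one-line summary above; the real DLT algorithm may differ.
import numpy as np

class DynamicLossThreshold:
    """Keeps a running record of per-sample losses and flags samples whose
    current loss falls below a dynamically recomputed threshold as 'clean'."""
    def __init__(self, num_samples, percentile=50.0):
        self.loss_history = np.full(num_samples, np.inf)
        self.percentile = percentile   # assumed selection rule: percentile cut

    def update(self, indices, losses):
        # Record the latest loss observed for each sample in this batch.
        self.loss_history[indices] = losses

    def clean_mask(self, losses):
        # Threshold is recomputed from all recorded losses -> "dynamic".
        recorded = self.loss_history[np.isfinite(self.loss_history)]
        if recorded.size == 0:
            return np.ones_like(losses, dtype=bool)
        threshold = np.percentile(recorded, self.percentile)
        return losses <= threshold

# Toy usage with fabricated batch losses.
selector = DynamicLossThreshold(num_samples=1000)
idx = np.array([3, 17, 42])
batch_losses = np.array([0.2, 1.5, 0.4])
selector.update(idx, batch_losses)
print(selector.clean_mask(batch_losses))   # e.g. [ True False  True]
```

In a training loop, a batch would be filtered with `clean_mask` before the loss is backpropagated, so that samples whose loss exceeds the current threshold are treated as noisily labeled.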
This list is automatically generated from the titles and abstracts of the papers on this site.