Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge
Distillation
- URL: http://arxiv.org/abs/2011.09113v1
- Date: Wed, 18 Nov 2020 06:33:20 GMT
- Title: Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge
Distillation
- Authors: Gaurav Kumar Nayak, Konda Reddy Mopuri, Anirban Chakraborty
- Abstract summary: We investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic, and natural datasets.
We find that using arbitrary data for knowledge distillation is surprisingly effective when the transfer set is "target-class balanced".
- Score: 28.874162427052905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Distillation is an effective method to transfer the learning across
deep neural networks. Typically, the dataset originally used for training the
Teacher model is chosen as the "Transfer Set" to conduct the knowledge transfer
to the Student. However, this original training data may not always be freely
available due to privacy or sensitivity concerns. In such scenarios, existing
approaches either iteratively compose a synthetic set representative of the
original training dataset, one sample at a time, or learn a generative model to
compose such a transfer set. However, both these approaches involve complex
optimization (GAN training or several backpropagation steps to synthesize one
sample) and are often computationally expensive. In this paper, as a simple
alternative, we investigate the effectiveness of "arbitrary transfer sets" such
as random noise, publicly available synthetic, and natural datasets, all of
which are completely unrelated to the original training dataset in terms of
their visual or semantic contents. Through extensive experiments on multiple
benchmark datasets such as MNIST, FMNIST, CIFAR-10 and CIFAR-100, we discover
and validate the surprising effectiveness of using arbitrary data to conduct
knowledge distillation when this dataset is "target-class balanced". We believe
that this important observation can potentially lead to designing baselines for
the data-free knowledge distillation task.
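As a concrete illustration of this recipe, the following is a minimal PyTorch sketch (not the authors' exact implementation) of distillation over an arbitrary transfer set: arbitrary samples are bucketed by the teacher's predicted class so that each class is roughly equally represented ("target-class balanced"), and the student is then trained to match the teacher's temperature-softened outputs. The function names, the balancing heuristic, and the hyperparameters (temperature, epochs, per-class count) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def build_balanced_transfer_set(teacher, arbitrary_loader, num_classes, per_class, device="cpu"):
    # Bucket arbitrary samples (random noise, synthetic, or unrelated natural
    # images) by the teacher's predicted class, keeping at most `per_class`
    # samples per class so the transfer set is target-class balanced.
    # This balancing heuristic is an illustrative assumption, not the paper's exact procedure.
    buckets = {c: [] for c in range(num_classes)}
    teacher.eval()
    with torch.no_grad():
        for batch in arbitrary_loader:
            x = batch[0] if isinstance(batch, (list, tuple)) else batch  # any labels are ignored
            preds = teacher(x.to(device)).argmax(dim=1).cpu()
            for xi, ci in zip(x, preds.tolist()):
                if len(buckets[ci]) < per_class:
                    buckets[ci].append(xi)
            if all(len(v) >= per_class for v in buckets.values()):
                break
    return torch.stack([xi for v in buckets.values() for xi in v])

def distill(teacher, student, transfer_set, epochs=10, batch_size=128, T=4.0, lr=1e-3, device="cpu"):
    # Vanilla knowledge distillation on the arbitrary transfer set: the student
    # matches the teacher's temperature-softened outputs; no ground-truth labels are used.
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(transfer_set), batch_size=batch_size, shuffle=True)
    teacher.eval()
    student.train()
    for _ in range(epochs):
        for (x,) in loader:
            x = x.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                            F.softmax(t_logits / T, dim=1),
                            reduction="batchmean") * (T * T)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

For example, with an MNIST teacher one could build the transfer set from a loader of random-noise images via build_balanced_transfer_set(teacher, noise_loader, num_classes=10, per_class=2000) and then call distill(teacher, student, transfer_set); these names and numbers are placeholders, not values taken from the paper.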
Related papers
- Group Distributionally Robust Dataset Distillation with Risk
Minimization [18.07189444450016]
We introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct dataset distillation (DD).
We demonstrate its effective generalization and robustness across subgroups through numerical experiments.
arXiv Detail & Related papers (2024-02-07T09:03:04Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
The proposed progressive dataset distillation (PDD) synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to that of networks trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis [8.87104231451079]
Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression.
In traditional KD, the transferred knowledge is usually obtained by feeding training samples to the teacher network.
The original training dataset is not always available due to storage costs or privacy issues.
We propose a novel data-free KD approach by modeling the intermediate feature space of the teacher.
arXiv Detail & Related papers (2021-04-10T22:42:14Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the resulting dataset can significantly improve the ability of the learned FER model.
To keep the created dataset manageable, we propose to apply a dataset distillation strategy that compresses it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)