Distribution Shift Matters for Knowledge Distillation with Webly Collected Images
- URL: http://arxiv.org/abs/2307.11469v1
- Date: Fri, 21 Jul 2023 10:08:58 GMT
- Title: Distribution Shift Matters for Knowledge Distillation with Webly Collected Images
- Authors: Jialiang Tang, Shuo Chen, Gang Niu, Masashi Sugiyama, Chen Gong
- Abstract summary: We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^{3}$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
- Score: 91.66661969598755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation aims to learn a lightweight student network from a
pre-trained teacher network. In practice, existing knowledge distillation
methods are usually infeasible when the original training data is unavailable
due to privacy issues and data management considerations. Therefore,
data-free knowledge distillation approaches have been proposed to collect training
instances from the Internet. However, most of them have ignored the common
distribution shift between the original training data and the webly
collected data, which affects the reliability of the trained student network. To
solve this problem, we propose a novel method dubbed "Knowledge Distillation
between Different Distributions" (KD$^{3}$), which consists of three
components. Specifically, we first dynamically select useful training instances
from the webly collected data according to the combined predictions of the teacher
and student networks. Subsequently, we align both the weighted features
and classifier parameters of the two networks for knowledge memorization.
Meanwhile, we also build a new contrastive learning block called
MixDistribution to generate perturbed data with a new distribution for instance
alignment, so that the student network can further learn a
distribution-invariant representation. Extensive experiments on various
benchmark datasets demonstrate that our proposed KD$^{3}$ can outperform the
state-of-the-art data-free knowledge distillation approaches.
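
The selection, memorization, and alignment steps above can be pictured with a short PyTorch-style sketch. Everything below is read off the abstract only, not the authors' code: the confidence-based weighting rule, the Mixup-style perturbation standing in for MixDistribution, and the InfoNCE alignment term are illustrative assumptions, and the classifier-parameter alignment is omitted.

# Illustrative sketch of the KD^3 components described in the abstract (not the authors' code).
import torch
import torch.nn.functional as F

def select_and_weight(teacher_logits, student_logits, keep_ratio=0.5):
    """Dynamic instance selection: weight each webly collected sample by the
    confidence of the combined teacher/student prediction; drop the rest."""
    combined = 0.5 * (teacher_logits.softmax(dim=1) + student_logits.softmax(dim=1))
    confidence, _ = combined.max(dim=1)
    k = max(1, int(keep_ratio * confidence.numel()))
    threshold = confidence.topk(k).values.min()
    return torch.where(confidence >= threshold, confidence, torch.zeros_like(confidence))

def weighted_kd_loss(teacher_logits, student_logits, weights, T=4.0):
    """Knowledge memorization: weighted KL divergence between softened predictions."""
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1)
    return (weights * kl).sum() / weights.sum().clamp(min=1e-8) * (T * T)

def mixdistribution(x, alpha=1.0):
    """Stand-in for the MixDistribution block: perturb the input distribution by
    mixing instances within the batch (a Mixup-style assumption)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x + (1.0 - lam) * x[torch.randperm(x.size(0))]

def instance_alignment(embed, x, x_perturbed, temperature=0.1):
    """Contrastive instance alignment: an image and its perturbed version should map
    to matching features, encouraging a distribution-invariant representation."""
    z1 = F.normalize(embed(x), dim=1)
    z2 = F.normalize(embed(x_perturbed), dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(x.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

In training, the three terms would be combined into a single objective over the filtered web images; the exact weighting scheme and the form of MixDistribution in KD$^{3}$ may differ from this sketch.
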
Related papers
- Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-Free Knowledge Distillation (SSD-KD).
SSD-KD balances the synthetic samples and uses a priority sampling function to select proper samples for distillation.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
arXiv Detail & Related papers (2024-06-12T05:09:41Z)
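
A rough sketch of the priority-sampling idea in SSD-KD, under the assumption that synthetic samples live in a small pool and are drawn with probability proportional to a priority score (for example, teacher-student disagreement). The SyntheticPool class, its capacity, and the eviction rule are hypothetical; the paper's actual scoring and balancing functions are not reproduced here.

# Hypothetical priority-weighted pool of synthetic samples (illustrative only).
import torch

class SyntheticPool:
    def __init__(self, capacity=256):
        self.capacity = capacity            # keep the pool extremely small
        self.samples, self.priorities = [], []

    def add(self, x, priority):
        if len(self.samples) >= self.capacity:
            drop = torch.tensor(self.priorities).argmin().item()   # evict lowest priority
            self.samples.pop(drop)
            self.priorities.pop(drop)
        self.samples.append(x)
        self.priorities.append(float(priority))

    def sample(self, batch_size):
        p = torch.tensor(self.priorities)
        idx = torch.multinomial(p / p.sum(), batch_size, replacement=True)
        return torch.stack([self.samples[i] for i in idx])
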
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
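
One plausible reading of the Fourier-based adapter in 4Ds is sketched below: take the 2-D FFT of a feature map, apply a learnable gate per frequency bin, and invert the transform; the gated component is treated as domain-invariant and the residual as domain-specific. Both the mechanism and the sigmoid gating are assumptions made for illustration, not the paper's implementation.

# Hypothetical learnable frequency-domain adapter (a guess at the Fourier idea in 4Ds).
import torch
import torch.nn as nn

class FourierAdapter(nn.Module):
    def __init__(self, channels, height, width):
        super().__init__()
        # one learnable gate per frequency bin (rfft2 keeps width // 2 + 1 columns)
        self.mask = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, feat):                                   # feat: (B, C, H, W)
        freq = torch.fft.rfft2(feat, norm="ortho")
        invariant = torch.fft.irfft2(freq * torch.sigmoid(self.mask),
                                     s=feat.shape[-2:], norm="ortho")
        specific = feat - invariant                            # treated as domain-specific
        return invariant, specific

The domain-invariant part would then be the knowledge passed to the student through the fusion-activation mechanism mentioned above.
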
- Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning-inspired framework that treats Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously met samples) as the meta-train and meta-test tasks, respectively.
arXiv Detail & Related papers (2023-02-28T03:50:56Z)
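
A minimal sketch of the acquisition/retention split described above: adapt a throwaway copy of the student on newly generated samples (meta-train), score it on a memory of previously met samples (meta-test), and fold both signals into the real student. The single inner step, the plain replay memory, and the first-order shortcut are simplifying assumptions, not the paper's exact algorithm.

# Simplified acquisition/retention loop (first-order, illustrative only).
import copy
import torch
import torch.nn.functional as F

def kd_loss(student, teacher, x, T=4.0):
    with torch.no_grad():
        t = F.softmax(teacher(x) / T, dim=1)
    s = F.log_softmax(student(x) / T, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * T * T

def meta_step(student, teacher, new_batch, memory_batch, outer_opt, inner_lr=0.01):
    # Knowledge-Acquisition ("meta-train"): adapt a throwaway copy on new samples.
    fast = copy.deepcopy(student)
    inner = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    acquisition = kd_loss(fast, teacher, new_batch)
    inner.zero_grad()
    acquisition.backward()
    inner.step()
    # Knowledge-Retention ("meta-test"): the adapted copy is scored on samples met
    # earlier; its gradients are copied onto the real student (first-order shortcut).
    retention = kd_loss(fast, teacher, memory_batch)
    fast.zero_grad()
    retention.backward()
    outer_opt.zero_grad()
    kd_loss(student, teacher, new_batch).backward()
    for p, q in zip(student.parameters(), fast.parameters()):
        if p.grad is not None and q.grad is not None:
            p.grad = p.grad + q.grad
    outer_opt.step()
    return acquisition.item(), retention.item()
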
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
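
The gradient-matching idea mentioned above fits in a few lines: update a learnable synthetic set so that the gradient it induces in the network matches the gradient of a real batch. The cosine criterion and the function grad_match_step are assumptions for this sketch; the article's implicit-differentiation variant is not shown.

# Minimal gradient-matching update for a learnable synthetic set (illustrative).
import torch
import torch.nn.functional as F

def grad_match_step(net, x_syn, y_syn, x_real, y_real, syn_opt):
    """x_syn is a trainable tensor (requires_grad=True); syn_opt optimizes only x_syn."""
    params = [p for p in net.parameters() if p.requires_grad]
    g_real = torch.autograd.grad(F.cross_entropy(net(x_real), y_real), params)
    g_syn = torch.autograd.grad(F.cross_entropy(net(x_syn), y_syn), params,
                                create_graph=True)
    # one term per parameter tensor: 1 - cosine similarity of the flattened gradients
    loss = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_syn, g_real))
    syn_opt.zero_grad()
    loss.backward()                 # flows into x_syn via create_graph=True
    syn_opt.step()                  # remember to zero net's own grads before training it
    return loss.item()

In practice this step alternates with ordinary training of the network on the current synthetic set.
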
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to equip deep neural networks with the ability to learn sample relationships within each mini-batch.
BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
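
The batch-dimension idea above is simple enough to show directly: treat the N samples of a mini-batch as a length-N sequence and let a transformer encoder attend across them. The BatchAttention name, the single layer, and the head count below are arbitrary choices for illustration; the module is used only during training, with features bypassing it at test time.

# Minimal BatchFormer-style module: attention across the batch dimension.
import torch
import torch.nn as nn

class BatchAttention(nn.Module):
    def __init__(self, dim, nhead=4):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=False),
            num_layers=1)

    def forward(self, feats):                 # feats: (N, dim) for one mini-batch
        # the batch is treated as a sequence of length N: (sequence, batch=1, dim)
        return self.encoder(feats.unsqueeze(1)).squeeze(1)

feats = torch.randn(32, 128)                  # e.g. backbone features of a batch
mixed = BatchAttention(128)(feats)            # (32, 128) batch-aware features
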
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
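
A small sketch of the k-NN density estimate named above: the density of a sample in the output feature space is taken to be inversely proportional to the volume of the ball reaching its k-th nearest neighbour. The function knn_density is hypothetical, and how OSAKD turns these estimates into distillation targets is not shown.

# k-NN density estimate over output-space features (illustrative).
import torch

def knn_density(features, k=5):
    """features: (N, D) tensor; returns an unnormalized density per sample."""
    dists = torch.cdist(features, features)              # pairwise Euclidean distances
    # distance to the k-th nearest neighbour (index k skips the sample itself)
    r_k = dists.sort(dim=1).values[:, k]
    d = features.size(1)
    # classic k-NN estimate: density ~ k / (N * volume of a d-ball of radius r_k)
    return k / (features.size(0) * r_k.clamp(min=1e-12) ** d)

feats = torch.randn(64, 16)
density = knn_density(feats)                              # (64,)
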
- Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples that mimic the original training data through dual discriminator adversarial distillation.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
arXiv Detail & Related papers (2021-04-12T12:01:45Z)
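
The adversarial loop can be sketched generically: a generator proposes images, the teacher/student pair plays the discriminator role (the generator seeks samples on which the two disagree), and the student then learns to match the teacher on those samples. Reading "dual discriminator" as the teacher-student pair, and using an L1 discrepancy, are assumptions made only for this sketch, not the exact DDAD objective.

# Generic adversarial data-free distillation loop (not the exact DDAD objective).
import torch
import torch.nn.functional as F

def adversarial_dfkd_step(generator, teacher, student, g_opt, s_opt,
                          batch_size=64, z_dim=100, device="cpu"):
    # 1) Generator step: seek samples on which teacher and student disagree.
    z = torch.randn(batch_size, z_dim, device=device)
    fake = generator(z)
    disagreement = F.l1_loss(student(fake), teacher(fake).detach())
    g_opt.zero_grad()
    (-disagreement).backward()      # ascend on the disagreement
    g_opt.step()
    # 2) Student step: imitate the teacher on the freshly generated samples.
    fake = generator(z).detach()
    imitation = F.l1_loss(student(fake), teacher(fake).detach())
    s_opt.zero_grad()
    imitation.backward()
    s_opt.step()
    return disagreement.item(), imitation.item()
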
- Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis [8.87104231451079]
Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression.
In traditional KD, the transferred knowledge is usually obtained by feeding training samples to the teacher network.
The original training dataset is not always available due to storage costs or privacy issues.
We propose a novel data-free KD approach by modeling the intermediate feature space of the teacher.
arXiv Detail & Related papers (2021-04-10T22:42:14Z)
- Data-free Knowledge Distillation for Segmentation using Data-Enriching GAN [0.0]
We propose a new training framework for performing knowledge distillation in a data-free setting.
We get an improvement of 6.93% in Mean IoU over previous approaches.
arXiv Detail & Related papers (2020-11-02T08:16:42Z)
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
arXiv Detail & Related papers (2020-02-18T06:20:06Z)