Feature-map-level Online Adversarial Knowledge Distillation
- URL: http://arxiv.org/abs/2002.01775v3
- Date: Fri, 5 Jun 2020 18:15:40 GMT
- Title: Feature-map-level Online Adversarial Knowledge Distillation
- Authors: Inseop Chung, SeongUk Park, Jangho Kim, Nojun Kwak
- Abstract summary: We propose an online knowledge distillation method that transfers not only the knowledge of the class probabilities but also that of the feature map.
We train multiple networks simultaneously by employing discriminators to distinguish the feature map distributions of different networks.
We show that our method performs better than conventional direct alignment methods such as L1 and is more suitable for online distillation.
- Score: 36.42703358752526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature maps contain rich information about image intensity and spatial
correlation. However, previous online knowledge distillation methods only
utilize the class probabilities. Thus in this paper, we propose an online
knowledge distillation method that transfers not only the knowledge of the
class probabilities but also that of the feature map using the adversarial
training framework. We train multiple networks simultaneously by employing
discriminators to distinguish the feature map distributions of different
networks. Each network has its own discriminator, which classifies the feature
map of its own network as fake while classifying that of the other network as
real. By training a network to fool its corresponding discriminator, it can
learn the other network's feature map distribution. We show that our method
performs better than conventional direct alignment methods such as L1 and is
more suitable for online distillation. Also, we propose a novel cyclic learning
scheme for training more than two networks together. We have applied our method
to various network architectures on the classification task and observed a
significant performance improvement, especially when training a pair of a small
network and a large one.
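The abstract outlines the core training loop: each peer network carries a discriminator that labels its own feature map as fake and the peer's as real, while the networks are trained to fool their discriminators and to exchange class-probability knowledge. The sketch below is a minimal PyTorch rendering of that loop for two networks; the discriminator architecture, the loss weights (T, beta), and the assumption that each backbone returns a (feature_map, logits) pair are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch of feature-map-level online adversarial knowledge distillation
# between two peer networks, following the description in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatDiscriminator(nn.Module):
    """Small convolutional discriminator producing one logit per feature map."""

    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, fmap):
        return self.net(fmap).flatten(1)  # shape (batch, 1)


def online_adversarial_kd_step(net1, net2, d1, d2, x, y,
                               opt_nets, opt_discs, T=3.0, beta=1.0):
    """One online-distillation step for a pair of networks.

    Each discriminator d_k labels the feature map of its own network as fake
    and that of the peer as real; each network is then trained to fool its own
    discriminator, so it learns the peer's feature map distribution. Knowledge
    of the class probabilities is exchanged with a mutual KL term on the logits.
    """
    real = lambda m: torch.ones(m, 1, device=x.device)
    fake = lambda m: torch.zeros(m, 1, device=x.device)
    n = x.size(0)

    f1, logits1 = net1(x)
    f2, logits2 = net2(x)

    # --- discriminator update: own feature map -> fake, peer's -> real
    opt_discs.zero_grad()
    d_loss = (F.binary_cross_entropy_with_logits(d1(f1.detach()), fake(n))
              + F.binary_cross_entropy_with_logits(d1(f2.detach()), real(n))
              + F.binary_cross_entropy_with_logits(d2(f2.detach()), fake(n))
              + F.binary_cross_entropy_with_logits(d2(f1.detach()), real(n)))
    d_loss.backward()
    opt_discs.step()

    # --- network update: cross-entropy + mutual KL + fooling own discriminator
    opt_nets.zero_grad()
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    kl = (F.kl_div(F.log_softmax(logits1 / T, 1), F.softmax(logits2.detach() / T, 1),
                   reduction="batchmean")
          + F.kl_div(F.log_softmax(logits2 / T, 1), F.softmax(logits1.detach() / T, 1),
                     reduction="batchmean")) * (T * T)
    adv = (F.binary_cross_entropy_with_logits(d1(f1), real(n))
           + F.binary_cross_entropy_with_logits(d2(f2), real(n)))
    (ce + kl + beta * adv).backward()
    opt_nets.step()
```

Here opt_nets would hold the parameters of both networks and opt_discs those of both discriminators. For comparison, the direct-alignment baseline mentioned in the abstract would replace the adversarial term with a distance penalty such as F.l1_loss(f1, f2.detach()) + F.l1_loss(f2, f1.detach()). The cyclic learning scheme for more than two networks is only mentioned in the abstract; extending this loop to that setting would follow the pairing the paper specifies.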
Related papers
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD³).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Self-Supervised Learning for Binary Networks by Joint Classifier Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
arXiv Detail & Related papers (2021-10-17T15:38:39Z)
- Unsupervised Domain-adaptive Hash for Networks [81.49184987430333]
Domain-adaptive hash learning has enjoyed considerable success in the computer vision community.
We develop an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH.
arXiv Detail & Related papers (2021-08-20T12:09:38Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD)
Our proposed method, FRSKD, can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
arXiv Detail & Related papers (2021-03-15T10:59:43Z)
- Feature Sharing Cooperative Network for Semantic Segmentation [10.305130700118399]
We propose a semantic segmentation method using cooperative learning.
By sharing feature maps, each of the two networks can obtain information that cannot be obtained by a single network.
The proposed method achieved better segmentation accuracy than the conventional single network and ensemble of networks.
arXiv Detail & Related papers (2021-01-20T00:22:00Z)
- ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet)
Our method assembles two networks with different backbones so as to learn features that perform well under both classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)