Hierarchical Self-supervised Augmented Knowledge Distillation
- URL: http://arxiv.org/abs/2107.13715v1
- Date: Thu, 29 Jul 2021 02:57:21 GMT
- Title: Hierarchical Self-supervised Augmented Knowledge Distillation
- Authors: Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu
- Abstract summary: We propose an alternative self-supervised augmented task that guides the network to learn the joint distribution of the original recognition task and a self-supervised auxiliary task.
This joint distribution is shown to be richer knowledge that improves representation power without sacrificing normal classification capability.
Our method significantly surpasses the previous SOTA SSKD with an average improvement of 2.56% on CIFAR-100 and an improvement of 0.77% on ImageNet.
- Score: 1.9355744690301404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation hinges on how to define knowledge and transfer it
effectively from teacher to student. Although recent self-supervised contrastive
knowledge achieves the best performance, forcing the network to learn such knowledge
may harm representation learning for the original class recognition task. We therefore
adopt an alternative self-supervised augmented task to guide the network to learn the
joint distribution of the original recognition task and the self-supervised auxiliary
task. This joint distribution is shown to be richer knowledge that improves
representation power without sacrificing normal classification capability. Moreover,
previous methods transfer probabilistic knowledge only between the final layers, which
is incomplete. We propose to append several auxiliary classifiers to hierarchical
intermediate feature maps to generate diverse self-supervised knowledge and to perform
one-to-one transfer so that the student network is taught thoroughly. Our method
significantly surpasses the previous state-of-the-art SSKD, with an average improvement
of 2.56% on CIFAR-100 and an improvement of 0.77% on ImageNet across widely used
network pairs. Code is available at
https://github.com/winycg/HSAKD.
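To make the transfer concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a joint (class x transform) label space predicted by auxiliary heads attached to intermediate feature maps, with one-to-one teacher-to-student distillation per head. The choice of rotations as the self-supervised transform, the head design, and all names (e.g. AuxClassifier, hierarchical_kd_loss) are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, NUM_TRANSFORMS, TEMP = 100, 4, 4.0  # CIFAR-100 classes, 4 rotations (assumed), KD temperature

def rotate_batch(x, labels):
    """Build the self-supervised augmented batch: every image under
    0/90/180/270-degree rotations, plus the joint (class, transform) label."""
    views = torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(NUM_TRANSFORMS)], dim=0)
    t_ids = torch.arange(NUM_TRANSFORMS, device=x.device).repeat_interleave(x.size(0))
    joint_labels = labels.repeat(NUM_TRANSFORMS) * NUM_TRANSFORMS + t_ids
    return views, joint_labels

class AuxClassifier(nn.Module):
    """Auxiliary head appended to one intermediate feature map (design assumed)."""
    def __init__(self, in_channels):
        super().__init__()
        self.fc = nn.Linear(in_channels, NUM_CLASSES * NUM_TRANSFORMS)

    def forward(self, feat):                         # feat: (B, C, H, W)
        return self.fc(F.adaptive_avg_pool2d(feat, 1).flatten(1))

def hierarchical_kd_loss(student_aux_logits, teacher_aux_logits, joint_labels):
    """One-to-one transfer between corresponding teacher/student auxiliary heads."""
    loss = 0.0
    for s_logit, t_logit in zip(student_aux_logits, teacher_aux_logits):
        loss = loss + F.cross_entropy(s_logit, joint_labels)     # hard joint-label supervision
        loss = loss + TEMP ** 2 * F.kl_div(                      # soft joint-distribution distillation
            F.log_softmax(s_logit / TEMP, dim=1),
            F.softmax(t_logit.detach() / TEMP, dim=1),
            reduction="batchmean")
    return loss
```

Summing a cross-entropy term and a temperature-scaled KL term per head follows the standard KD recipe; the paper's exact loss weighting and training schedule may differ.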
Related papers
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by one-hot labels hampers effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
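As a rough illustration of how frozen language-derived targets could supervise a vision model, here is a sketch that assumes the class embeddings have already been computed from the class names with a pretrained language model; all names are hypothetical and this is not the paper's implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class LanguageGuidedHead(nn.Module):
    """Classify visual features against frozen PLM-derived class embeddings
    (class_embeddings is assumed to be a (num_classes, dim) tensor precomputed
    from the class names with a pretrained language model)."""
    def __init__(self, feat_dim, class_embeddings):
        super().__init__()
        # frozen semantic targets: a buffer receives no gradient updates
        self.register_buffer("targets", F.normalize(class_embeddings, dim=1))
        self.proj = nn.Linear(feat_dim, class_embeddings.size(1))

    def forward(self, visual_feat, temperature=0.07):
        z = F.normalize(self.proj(visual_feat), dim=1)
        return z @ self.targets.t() / temperature   # cosine-similarity logits

# usage sketch: logits = head(backbone(images)); loss = F.cross_entropy(logits, labels)
```

Registering the targets as a buffer keeps them out of the optimizer, matching the frozen-supervision idea in the summary.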
arXiv Detail & Related papers (2024-03-24T12:41:58Z) - Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning-inspired framework that treats Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously encountered samples) as meta-train and meta-test, respectively.
arXiv Detail & Related papers (2023-02-28T03:50:56Z) - Learning Knowledge Representation with Meta Knowledge Distillation for
Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments on various single-image super-resolution datasets demonstrate that the proposed method outperforms existing distillation methods that rely on predefined knowledge representations.
arXiv Detail & Related papers (2022-07-18T02:41:04Z) - Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation
for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
We publicly release the training and evaluation framework used along this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
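A minimal sketch of distilling activation maps in the frequency domain, assuming an orthonormal 2D DCT and a plain MSE in that domain; the paper's actual attention-based loss is more involved, and the function names here are illustrative.

```python
import math
import torch
import torch.nn.functional as F

def dct_matrix(n, device=None):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = torch.arange(n, device=device, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, device=device, dtype=torch.float32).unsqueeze(0)
    mat = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    mat[0] = mat[0] / math.sqrt(2.0)
    return mat

def dct_2d(feat):
    """2D DCT over the spatial dims of a (B, C, H, W) activation map,
    written as two matrix products so the result stays differentiable."""
    _, _, h, w = feat.shape
    ch, cw = dct_matrix(h, feat.device), dct_matrix(w, feat.device)
    return torch.einsum("hi,bcij,wj->bchw", ch, feat, cw)

def dct_kd_loss(student_feat, teacher_feat):
    """Match teacher and student activations in the frequency domain."""
    return F.mse_loss(dct_2d(student_feat), dct_2d(teacher_feat.detach()))
```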
arXiv Detail & Related papers (2022-05-04T11:05:18Z) - Knowledge Distillation Using Hierarchical Self-Supervision Augmented
Distribution [1.7718093866806544]
We propose an auxiliary self-supervision augmented task to guide networks to learn more meaningful features.
Unlike previously transferred knowledge, this distribution encodes joint knowledge from supervised and self-supervised feature learning.
We call our KD method Hierarchical Self-Supervision Augmented Knowledge Distillation (HSSAKD).
arXiv Detail & Related papers (2021-09-07T13:29:32Z) - Distilling Knowledge via Knowledge Review [69.15050871776552]
We study connection paths across levels between teacher and student networks and reveal their importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final design, a nested and compact framework, adds negligible overhead and outperforms other methods on a variety of tasks, as sketched below.
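One way to picture cross-stage connection paths is to let each student stage learn not only from the teacher stage at the same depth but also from shallower ones. The sketch below uses globally pooled features and per-pair linear adapters as a simplifying assumption; the paper fuses full feature maps with dedicated modules instead.

```python
import torch.nn.functional as F

def review_kd_loss(student_feats, teacher_feats, adapters):
    """Cross-stage connection paths: student stage i is matched against teacher
    stage i and all shallower teacher stages. adapters[i][j] is a linear layer
    mapping the pooled student stage-i feature to the channel width of teacher
    stage j (a hypothetical design, not the paper's fusion modules)."""
    loss = 0.0
    for i, s_feat in enumerate(student_feats):
        s_vec = s_feat.mean(dim=(2, 3))                           # global average pool, (B, C_s_i)
        for j in range(i + 1):                                    # same-depth and shallower teacher stages
            t_vec = teacher_feats[j].mean(dim=(2, 3)).detach()    # (B, C_t_j)
            loss = loss + F.mse_loss(adapters[i][j](s_vec), t_vec)
    return loss
```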
arXiv Detail & Related papers (2021-04-19T04:36:24Z) - Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge
Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
FRSKD can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD through its performance improvements on diverse tasks and benchmark datasets.
arXiv Detail & Related papers (2021-03-15T10:59:43Z) - Point Adversarial Self Mining: A Simple Method for Facial Expression
Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
Adaptive learning-material generation and the teacher/student updates can be repeated multiple times, improving the network's capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z) - Knowledge Transfer via Dense Cross-Layer Mutual-Distillation [24.24969126783315]
We propose Dense Cross-layer Mutual-distillation (DCM) in which the teacher and student networks are trained collaboratively from scratch.
To boost KT performance, we introduce dense bidirectional KD operations between the layers with appended classifiers.
We test our method on a variety of KT tasks, showing its superiority over related methods.
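A sketch of what dense bidirectional KD between layers with appended classifiers could look like, assuming every auxiliary classifier of one network is paired with every auxiliary classifier of the other; the exact connection pattern and loss weights are assumptions.

```python
import torch.nn.functional as F

def dense_mutual_kd(logits_a, logits_b, temp=3.0):
    """Bidirectional KL between the auxiliary classifiers of two collaboratively
    trained networks. logits_a / logits_b hold one logit tensor per appended
    classifier; pairing every stage of A with every stage of B is an assumed
    reading of 'dense'."""
    loss = 0.0
    for la in logits_a:
        for lb in logits_b:
            log_pa = F.log_softmax(la / temp, dim=1)
            log_pb = F.log_softmax(lb / temp, dim=1)
            # each direction detaches its target, so A learns from B and B from A
            loss = loss + temp ** 2 * (
                F.kl_div(log_pa, F.softmax(lb.detach() / temp, dim=1), reduction="batchmean")
                + F.kl_div(log_pb, F.softmax(la.detach() / temp, dim=1), reduction="batchmean")
            )
    return loss
```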
arXiv Detail & Related papers (2020-08-18T09:25:08Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
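The similarity-based transfer can be illustrated by matching pairwise similarity matrices computed over self-supervised feature embeddings; the following is a simplified sketch, not the full SSKD loss.

```python
import torch
import torch.nn.functional as F

def ss_similarity_transfer(student_feats, teacher_feats, temp=0.5):
    """Transfer the teacher's pairwise similarity structure over self-supervised
    feature embeddings to the student (simplified sketch)."""
    zs = F.normalize(student_feats, dim=1)          # (B, D) student embeddings
    zt = F.normalize(teacher_feats, dim=1)          # (B, D) teacher embeddings
    b = zs.size(0)
    off_diag = ~torch.eye(b, dtype=torch.bool, device=zs.device)
    sim_s = (zs @ zs.t())[off_diag].view(b, b - 1)  # drop trivial self-similarities
    sim_t = (zt @ zt.t())[off_diag].view(b, b - 1)
    return F.kl_div(F.log_softmax(sim_s / temp, dim=1),
                    F.softmax(sim_t.detach() / temp, dim=1),
                    reduction="batchmean")
```

Excluding the diagonal keeps the trivial self-similarity from dominating the softmax over each row.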
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.