A Novel Self-Knowledge Distillation Approach with Siamese Representation
Learning for Action Recognition
- URL: http://arxiv.org/abs/2209.01311v1
- Date: Sat, 3 Sep 2022 01:56:58 GMT
- Title: A Novel Self-Knowledge Distillation Approach with Siamese Representation
Learning for Action Recognition
- Authors: Duc-Quang Vu, Trang Phung, Jia-Ching Wang
- Abstract summary: Knowledge distillation is an effective transfer of knowledge from a heavy network (teacher) to a small network (student) to boost the student's performance.
This paper introduces a novel self-knowledge distillation approach via Siamese representation learning.
- Score: 6.554259611868312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is an effective transfer of knowledge from a heavy
network (teacher) to a small network (student) that boosts the student's performance.
Self-knowledge distillation, a special case of knowledge distillation, has
been proposed to remove the costly training of a large teacher network while
preserving the student's performance. This paper introduces a novel
self-knowledge distillation approach via Siamese representation learning, which
minimizes the difference between the two representation vectors of two
different views of a given sample. Our proposed method, SKD-SRL, utilizes
both soft label distillation and the similarity of representation vectors.
Therefore, SKD-SRL can generate more consistent predictions and representations
across various views of the same data point. We evaluate our method on
various standard datasets. The experimental results show that SKD-SRL
significantly improves accuracy compared to existing supervised learning
and knowledge distillation methods regardless of the network architecture.
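To make the objective concrete, here is a minimal PyTorch-style sketch of a loss combining the three ingredients the abstract names: supervised cross-entropy, soft-label distillation between two augmented views, and a Siamese similarity term on their representation vectors. The function name, the weights (`alpha`, `beta`), the temperature (`tau`), and the stop-gradient placement are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def skd_srl_loss(logits_v1, logits_v2, z1, z2, labels, tau=4.0, alpha=0.5, beta=1.0):
    """Hypothetical composition of an SKD-SRL-style objective: cross-entropy on
    each view, soft-label distillation between the two views' predictions, and
    a Siamese similarity term between the two representation vectors.
    Weights and temperature are placeholders, not values from the paper."""
    # Supervised cross-entropy on both augmented views.
    ce = F.cross_entropy(logits_v1, labels) + F.cross_entropy(logits_v2, labels)

    # Soft-label distillation: one view's softened prediction supervises the other.
    log_p1 = F.log_softmax(logits_v1 / tau, dim=1)
    p2 = F.softmax(logits_v2 / tau, dim=1)
    kd = F.kl_div(log_p1, p2.detach(), reduction="batchmean") * tau * tau

    # Siamese representation term: negative cosine similarity between the two
    # views' representation vectors (stop-gradient on the target side).
    sim = -F.cosine_similarity(z1, z2.detach(), dim=1).mean()

    return ce + alpha * kd + beta * sim
```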
Related papers
- LAKD-Activation Mapping Distillation Based on Local Learning [12.230042188890838]
This paper proposes a novel knowledge distillation framework, Local Attention Knowledge Distillation (LAKD).
LAKD more efficiently utilizes the distilled information from teacher networks, achieving higher interpretability and competitive performance.
We conducted experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets, and the results show that our LAKD method significantly outperforms existing methods.
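As a rough illustration of activation-map distillation in general (not LAKD's specific local-learning and decoupling scheme), the sketch below matches normalized spatial activation maps between paired student and teacher feature maps; the helper names and the MSE choice are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise squared mean turned into a normalized spatial attention map."""
    a = feat.pow(2).mean(dim=1)              # (B, H, W)
    return F.normalize(a.flatten(1), dim=1)  # (B, H*W)

def activation_map_distillation(student_feats, teacher_feats):
    """Mean-squared error between normalized activation maps of paired
    student/teacher feature maps (one pair per network stage)."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        if fs.shape[-2:] != ft.shape[-2:]:
            fs = F.interpolate(fs, size=ft.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + F.mse_loss(attention_map(fs), attention_map(ft.detach()))
    return loss / len(student_feats)
```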
arXiv Detail & Related papers (2024-08-21T09:43:27Z)
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher network and the student network.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
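The instance-selection step can be sketched as follows; averaging the two networks' softmax outputs and keeping the most confident fraction is an illustrative reading of "combined predictions", not the paper's exact criterion, and `keep_ratio` is a made-up parameter.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_web_instances(teacher_logits, student_logits, keep_ratio=0.5):
    """Rank webly collected instances by the confidence of the combined
    (averaged) teacher/student prediction and keep the top fraction."""
    p_combined = 0.5 * (F.softmax(teacher_logits, dim=1) + F.softmax(student_logits, dim=1))
    confidence, _ = p_combined.max(dim=1)
    k = max(1, int(keep_ratio * confidence.numel()))
    keep_idx = confidence.topk(k).indices
    return keep_idx  # indices of instances used for distillation this round
```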
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
Our proposed method, FRSKD, can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
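A hedged sketch of a self-distillation loss that combines the two signals the summary mentions, soft labels and feature maps, is given below; the auxiliary "refined" branch, the weights, and the temperature are placeholders rather than FRSKD's actual architecture.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, refined_logits,
                           student_feats, refined_feats,
                           labels, tau=3.0, lam_kd=1.0, lam_feat=1.0):
    """Combine (i) cross-entropy, (ii) soft-label distillation from a refined
    auxiliary branch, and (iii) feature-map distillation between paired
    feature maps. Branch names and weights are illustrative."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(refined_logits.detach() / tau, dim=1),
                  reduction="batchmean") * tau * tau
    feat = sum(F.mse_loss(fs, fr.detach())
               for fs, fr in zip(student_feats, refined_feats)) / len(student_feats)
    return ce + lam_kd * kd + lam_feat * feat
```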
arXiv Detail & Related papers (2021-03-15T10:59:43Z)
- Self Regulated Learning Mechanism for Data Efficient Knowledge Distillation [8.09591217280048]
A novel data-efficient approach to transfer the knowledge from a teacher model to a student model is presented.
The teacher model uses self-regulation to select appropriate samples for training and identifies their significance in the process.
During distillation, the significance information can be used along with the soft targets to supervise the student.
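One plausible way to use per-sample significance scores alongside soft targets is to reweight the distillation term, as in the sketch below; the weighting scheme and hyperparameters are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def significance_weighted_kd(student_logits, teacher_logits, labels,
                             significance, tau=4.0, alpha=0.5):
    """`significance` is a per-sample tensor in [0, 1] (produced by the
    teacher's self-regulation step in the paper; here just given) that
    reweights the soft-target term."""
    ce = F.cross_entropy(student_logits, labels)
    kd_per_sample = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                             F.softmax(teacher_logits.detach() / tau, dim=1),
                             reduction="none").sum(dim=1) * tau * tau
    kd = (significance * kd_per_sample).mean()
    return (1 - alpha) * ce + alpha * kd
```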
arXiv Detail & Related papers (2021-02-14T10:43:13Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
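A minimal two-student sketch of the idea, instance-level mutual soft-label distillation plus matching of instance-relation (pairwise similarity) matrices, is shown below; the two-student setup, equal loss weights, and cosine-similarity relations are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_relations(z: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity matrix of a batch of embeddings."""
    z = F.normalize(z, dim=1)
    return z @ z.t()

def collaborative_loss(logits_a, logits_b, emb_a, emb_b, labels, tau=3.0):
    """Two peer students teach each other: mutual soft-label distillation on
    individual instances plus alignment of instance-relation matrices."""
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    mutual = (F.kl_div(F.log_softmax(logits_a / tau, dim=1),
                       F.softmax(logits_b.detach() / tau, dim=1), reduction="batchmean")
              + F.kl_div(F.log_softmax(logits_b / tau, dim=1),
                         F.softmax(logits_a.detach() / tau, dim=1), reduction="batchmean")) * tau * tau
    relation = F.mse_loss(pairwise_relations(emb_a), pairwise_relations(emb_b).detach())
    return ce + mutual + relation
```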
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
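The similarity-transfer idea can be sketched as aligning the student's batch similarity structure with the teacher's, as below; the paper constructs its self-supervision signals differently, so treat this purely as an illustration.

```python
import torch
import torch.nn.functional as F

def self_supervision_transfer(student_emb, teacher_emb, tau=0.5):
    """Build each network's similarity matrix over a batch of (original +
    augmented) embeddings and align the student's to the teacher's with KL
    divergence; the diagonal (trivial self-similarity) is masked out."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    sim_s = (s @ s.t()) / tau
    sim_t = (t @ t.t()) / tau
    mask = ~torch.eye(s.size(0), dtype=torch.bool, device=s.device)
    sim_s = sim_s[mask].view(s.size(0), -1)
    sim_t = sim_t[mask].view(s.size(0), -1)
    return F.kl_div(F.log_softmax(sim_s, dim=1),
                    F.softmax(sim_t.detach(), dim=1), reduction="batchmean")
```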
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
- Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant model (A).
In this way, the student (S) is trained to mimic the feature maps of the teacher (T), and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
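A minimal sketch of the residual idea follows: the student mimics the teacher's feature maps while an assistant regresses the remaining residual. Matching feature shapes are assumed, and the MSE losses and names are illustrative rather than RKD's exact formulation.

```python
import torch
import torch.nn.functional as F

def residual_kd_losses(student_feat, teacher_feat, assistant_pred):
    """The student (S) mimics the teacher's (T) feature maps, while the
    assistant (A) predicts the residual T - S; adding A's prediction back to
    S's features should approximately recover T's."""
    # S is trained to mimic T directly.
    mimic = F.mse_loss(student_feat, teacher_feat.detach())
    # A learns the residual error between T and S.
    residual_target = (teacher_feat - student_feat).detach()
    residual = F.mse_loss(assistant_pred, residual_target)
    return mimic, residual
```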
arXiv Detail & Related papers (2020-02-21T07:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.