Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
- URL: http://arxiv.org/abs/2103.08273v1
- Date: Mon, 15 Mar 2021 10:59:43 GMT
- Title: Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
- Authors: Mingi Ji, Seungjae Shin, Seunghyun Hwang, Gibeom Park, Il-Chul Moon
- Abstract summary: This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
Our proposed method, FRSKD, can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
- Score: 12.097302014936655
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge distillation is a method of transferring knowledge from a
pretrained, complex teacher model to a student model, so that a smaller network
can replace a large teacher network at the deployment stage. To reduce the
necessity of training a large teacher model, recent literature has introduced
self-knowledge distillation, which trains a student network progressively to
distill its own knowledge without a pretrained teacher network. Self-knowledge
distillation is largely divided into a data augmentation based approach and an
auxiliary network based approach; the data augmentation approach loses local
information in the augmentation process, which hinders its applicability to
diverse vision tasks such as semantic segmentation. Moreover, these
self-knowledge distillation approaches do not exploit refined feature maps,
which are prevalent in the object detection and semantic segmentation
communities. This paper proposes a novel self-knowledge distillation method,
Feature Refinement via Self-Knowledge Distillation (FRSKD), which utilizes an
auxiliary self-teacher network to transfer refined knowledge to the classifier
network. FRSKD can utilize both soft-label and feature-map distillation for
self-knowledge distillation. Therefore, FRSKD can be applied to classification
and to semantic segmentation, which emphasizes preserving local information. We
demonstrate the effectiveness of FRSKD through its performance improvements on
diverse tasks and benchmark datasets. The implemented code is available at
https://github.com/MingiJi/FRSKD.
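
As a reading aid, here is a minimal PyTorch-style sketch of how the two distillation terms named in the abstract, soft-label distillation and feature-map distillation against an auxiliary self-teacher, could enter a single training objective. The function name, loss weights, choice of an L2 feature loss, and detaching of the self-teacher targets are illustrative assumptions rather than the authors' implementation; see https://github.com/MingiJi/FRSKD for the reference code.

```python
# Sketch only: the loss weights, the L2 feature loss, and detaching the
# self-teacher targets are assumptions, not the FRSKD reference code.
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits,
                           student_feats, refined_feats,
                           labels, temperature=4.0, alpha=1.0, beta=1.0):
    """Cross-entropy + soft-label distillation + feature-map distillation.

    student_logits: logits of the classifier network
    teacher_logits: logits of the auxiliary self-teacher
    student_feats:  intermediate feature maps of the classifier (list of tensors)
    refined_feats:  refined feature maps from the self-teacher (matching shapes)
    """
    # Standard supervised loss on the classifier's own predictions.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label distillation: KL divergence between temperature-softened
    # classifier and self-teacher distributions.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits.detach() / t, dim=1),
        reduction="batchmean",
    ) * (t * t)

    # Feature-map distillation: pull classifier features toward the
    # refined features produced by the self-teacher.
    feat = sum(
        F.mse_loss(s, r.detach()) for s, r in zip(student_feats, refined_feats)
    ) / len(student_feats)

    return ce + alpha * kd + beta * feat
```

The point the abstract makes is that the refined logits and feature maps come from an auxiliary self-teacher built on the classifier's own intermediate features, so no separate pretrained teacher is required; the sketch above only illustrates how such targets would be combined into one loss.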
Related papers
- LAKD-Activation Mapping Distillation Based on Local Learning [12.230042188890838]
This paper proposes a novel knowledge distillation framework, Local Attention Knowledge Distillation (LAKD).
LAKD more efficiently utilizes the distilled information from teacher networks, achieving higher interpretability and competitive performance.
We conducted experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets, and the results show that our LAKD method significantly outperforms existing methods.
arXiv Detail & Related papers (2024-08-21T09:43:27Z) - AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation [12.92102548320001]
This paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for the purpose of knowledge distillation.
The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs.
Experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-08-08T13:17:20Z) - Distribution Shift Matters for Knowledge Distillation with Webly
Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher network and the student network.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z) - A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition [6.554259611868312]
Knowledge distillation is an effective transfer of knowledge from a heavy network (teacher) to a small network (student) to boost the student's performance.
This paper introduces a novel Self-knowledge distillation approach via Siamese representation learning.
arXiv Detail & Related papers (2022-09-03T01:56:58Z) - Normalized Feature Distillation for Semantic Segmentation [6.882655287146012]
We propose a simple yet effective feature distillation method called normalized feature distillation (NFD).
Our method achieves state-of-the-art distillation results for semantic segmentation on Cityscapes, VOC 2012, and ADE20K datasets.
arXiv Detail & Related papers (2022-07-12T01:54:25Z) - Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z) - Hierarchical Self-supervised Augmented Knowledge Distillation [1.9355744690301404]
We propose an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and self-supervised auxiliary task.
This is demonstrated to provide richer knowledge that improves representation power without losing normal classification capability.
Our method significantly surpasses the previous SOTA SSKD with an average improvement of 2.56% on CIFAR-100 and an improvement of 0.77% on ImageNet.
arXiv Detail & Related papers (2021-07-29T02:57:21Z) - Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z) - Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z) - Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant network (A).
In this way, the student (S) is trained to mimic the feature maps of the teacher (T), and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
arXiv Detail & Related papers (2020-02-21T07:49:26Z)