Knowledge Distillation By Sparse Representation Matching
- URL: http://arxiv.org/abs/2103.17012v1
- Date: Wed, 31 Mar 2021 11:47:47 GMT
- Title: Knowledge Distillation By Sparse Representation Matching
- Authors: Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis
- Abstract summary: We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation learning.
We formulate SRM as a neural processing block, which can be efficiently optimized using stochastic gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
- Score: 107.87219371697063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Distillation refers to a class of methods that transfers the
knowledge from a teacher network to a student network. In this paper, we
propose Sparse Representation Matching (SRM), a method to transfer intermediate
knowledge obtained from one Convolutional Neural Network (CNN) to another by
utilizing sparse representation learning. SRM first extracts sparse
representations of the hidden features of the teacher CNN, which are then used
to generate both pixel-level and image-level labels for training intermediate
feature maps of the student network. We formulate SRM as a neural processing
block, which can be efficiently optimized using stochastic gradient descent and
integrated into any CNN in a plug-and-play manner. Our experiments demonstrate
that SRM is robust to architectural differences between the teacher and student
networks, and outperforms other KD techniques across several datasets.
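The abstract describes SRM only at a high level. A minimal, hypothetical PyTorch sketch of the general idea follows: a small sparse-coding block (here, a few unrolled non-negative ISTA steps over a learned dictionary) produces sparse codes of the teacher's hidden features, which are then converted into pixel-level and image-level targets for the student's intermediate feature maps. The dictionary size, number of iterations, thresholds, and loss forms below are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseCodingBlock(nn.Module):
    """Hypothetical sparse-coding block: a few unrolled non-negative ISTA steps
    over a learned dictionary (an assumption, not the paper's exact design)."""

    def __init__(self, in_channels, num_atoms=64, num_iters=3, lam=0.1, step=0.1):
        super().__init__()
        self.dictionary = nn.Parameter(torch.randn(num_atoms, in_channels) * 0.01)
        self.num_iters, self.lam, self.step = num_iters, lam, step

    def forward(self, feats):
        # feats: (N, C, H, W); treat every spatial location as a C-dim signal.
        n, c, h, w = feats.shape
        x = feats.permute(0, 2, 3, 1).reshape(-1, c)              # (N*H*W, C)
        z = x.new_zeros(x.size(0), self.dictionary.size(0))       # sparse codes
        for _ in range(self.num_iters):
            residual = x - z @ self.dictionary                    # reconstruction error
            z = z + self.step * residual @ self.dictionary.t()    # gradient step
            z = F.relu(z - self.step * self.lam)                  # non-negative soft-threshold
        return z.reshape(n, h, w, -1).permute(0, 3, 1, 2)         # (N, K, H, W)


def srm_style_loss(student_feats, teacher_feats, student_block, teacher_block):
    """Sketch of pixel-level and image-level matching losses derived from the
    teacher's sparse codes; the exact label construction is an assumption."""
    with torch.no_grad():
        t_codes = teacher_block(teacher_feats)                    # fixed targets
    s_codes = student_block(student_feats)

    # Pixel-level labels: dominant dictionary atom at each spatial location.
    # (Student codes are used directly as logits here for simplicity.)
    pixel_labels = t_codes.argmax(dim=1)                          # (N, H, W)
    pixel_loss = F.cross_entropy(s_codes, pixel_labels)

    # Image-level labels: which atoms are active anywhere in the image.
    image_labels = (t_codes.amax(dim=(2, 3)) > 0).float()         # (N, K)
    image_loss = F.binary_cross_entropy_with_logits(
        s_codes.amax(dim=(2, 3)), image_labels)

    return pixel_loss + image_loss
```

In practice, one such block would presumably be attached to each chosen teacher/student layer pair and the resulting loss added to the usual task loss; since SRM is described as plug-and-play, the block can be dropped after distillation.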
Related papers
- Linking in Style: Understanding learned features in deep learning models [0.0]
Convolutional neural networks (CNNs) learn abstract features to perform object classification.
We propose an automatic method to visualize and systematically analyze learned features in CNNs.
arXiv Detail & Related papers (2024-09-25T12:28:48Z) - Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z) - Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between the teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMCreID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z) - Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely used datasets, CK+, Oulu-CASIA, and MMI, as well as on the challenging in-the-wild dataset AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z) - Learning with Privileged Information for Efficient Image Super-Resolution [35.599731963795875]
This paper introduces a novel distillation framework, consisting of teacher and student networks, that substantially boosts the performance of FSRCNN.
The encoder in the teacher learns the degradation process (subsampling of HR images) using an imitation loss.
The student and the decoder in the teacher, having the same network architecture as FSRCNN, try to reconstruct HR images.
arXiv Detail & Related papers (2020-07-15T07:44:18Z) - Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z) - Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing (low-pass) filters; a minimal sketch of this idea follows after this list.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
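The Curriculum By Smoothing entry above describes low-pass filtering of a CNN's feature embeddings with a strength that decreases as training progresses. Below is a minimal sketch of that idea under stated assumptions: a depthwise Gaussian filter applied to a feature map, with its standard deviation annealed toward zero over training. The kernel size, annealing schedule, and placement of the filter are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel(sigma: float, size: int = 5) -> torch.Tensor:
    # 2D Gaussian kernel, normalized to sum to 1.
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()


def smooth_features(feats: torch.Tensor, sigma: float, size: int = 5) -> torch.Tensor:
    """Low-pass filter every channel of a feature map via a depthwise convolution."""
    if sigma <= 0:
        return feats                                   # curriculum finished: no smoothing
    c = feats.size(1)
    k = gaussian_kernel(sigma, size).to(feats.device, feats.dtype)
    weight = k.view(1, 1, size, size).repeat(c, 1, 1, 1)
    return F.conv2d(feats, weight, padding=size // 2, groups=c)


# A simple (assumed) curriculum: decay sigma linearly to zero over training, e.g.
# sigma_t = sigma_0 * max(0.0, 1.0 - epoch / num_epochs)
```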