Understanding the Effects of Projectors in Knowledge Distillation
- URL: http://arxiv.org/abs/2310.17183v1
- Date: Thu, 26 Oct 2023 06:30:39 GMT
- Title: Understanding the Effects of Projectors in Knowledge Distillation
- Authors: Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Brano
Kusy, Zi Huang
- Abstract summary: Even if the student and the teacher have the same feature dimensions, adding a projector still helps to improve the distillation performance.
This paper investigates the implicit role that projectors play, which has so far been overlooked.
Motivated by the positive effects of projectors, we propose a projector ensemble-based feature distillation method to further improve distillation performance.
- Score: 31.882356225974632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventionally, during the knowledge distillation process (e.g. feature
distillation), an additional projector is often required to perform feature
transformation due to the dimension mismatch between the teacher and the
student networks. Interestingly, we discovered that even if the student and the
teacher have the same feature dimensions, adding a projector still helps to
improve the distillation performance. In addition, projectors even improve
logit distillation if we add them to the architecture. Inspired by these
surprising findings and the general lack of understanding of projectors in
the knowledge distillation process in the existing literature, this paper
investigates the implicit role that projectors play but that has so far been
overlooked. Our empirical study shows that the student with a projector (1)
obtains a better trade-off between the training accuracy and the testing
accuracy compared to the student without a projector when it has the same
feature dimensions as the teacher, (2) better preserves its similarity to the
teacher beyond shallow and numeric resemblance, from the view of Centered
Kernel Alignment (CKA), and (3) avoids becoming over-confident at the testing
phase in the way the teacher does. Motivated by the positive effects of projectors, we
propose a projector ensemble-based feature distillation method to further
improve distillation performance. Despite the simplicity of the proposed
strategy, empirical results from the evaluation of classification tasks on
benchmark datasets demonstrate the superior classification performance of our
method on a broad range of teacher-student pairs and verify from the aspects of
CKA and model calibration that the student's features are of improved quality
with the projector ensemble design.
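To make the mechanisms described in the abstract concrete, below is a minimal PyTorch-style sketch (not the authors' released code) of feature distillation through an ensemble of projectors, together with a linear-CKA function of the kind used to compare student and teacher features. The two-layer MLP projector, the ensemble size, the L2 feature-matching loss, and the feature shapes are illustrative assumptions and may differ from the paper's exact configuration.

```python
# Illustrative sketch of projector-ensemble feature distillation plus a
# linear-CKA similarity check. All hyperparameters here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_projector(s_dim: int, t_dim: int) -> nn.Module:
    """One projector: maps student features into the teacher's feature space."""
    return nn.Sequential(nn.Linear(s_dim, t_dim), nn.ReLU(), nn.Linear(t_dim, t_dim))


class ProjectorEnsembleFD(nn.Module):
    """Feature distillation with an ensemble of projectors.

    Each projector transforms the student's penultimate features; their
    outputs are averaged before being matched to the teacher's features,
    so the student is regularised by several feature transformations at once.
    """

    def __init__(self, s_dim: int, t_dim: int, num_projectors: int = 3):
        super().__init__()
        self.projectors = nn.ModuleList(
            [make_projector(s_dim, t_dim) for _ in range(num_projectors)]
        )

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # Average the ensemble's projections of the student features.
        projected = torch.stack([p(f_student) for p in self.projectors]).mean(dim=0)
        # Match the (detached) teacher features; an L2 loss is used here for illustration.
        return F.mse_loss(projected, f_teacher.detach())


def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between two feature matrices of shape (batch, dim)."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    hsic = ((y.T @ x) ** 2).sum()      # ||Y^T X||_F^2
    norm_x = (x.T @ x).norm()          # ||X^T X||_F
    norm_y = (y.T @ y).norm()          # ||Y^T Y||_F
    return hsic / (norm_x * norm_y)


if __name__ == "__main__":
    s_feat = torch.randn(32, 256)      # hypothetical student features
    t_feat = torch.randn(32, 512)      # hypothetical teacher features
    fd = ProjectorEnsembleFD(s_dim=256, t_dim=512)
    loss = fd(s_feat, t_feat)          # would be added to the task loss with a weight
    sim = linear_cka(fd.projectors[0](s_feat), t_feat)
    print(loss.item(), sim.item())
```

In practice, the feature-distillation loss would be combined with the student's task loss via a weighting coefficient, and CKA would be computed on held-out batches to compare students trained with and without the projector ensemble.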
Related papers
- Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - Understanding the Role of the Projector in Knowledge Distillation [22.698845243751293]
We revisit the efficacy of knowledge distillation as a function matching and metric learning problem.
We verify three important design decisions, namely the normalisation, soft maximum function, and projection layers.
We attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet.
arXiv Detail & Related papers (2023-03-20T13:33:31Z) - Improved Feature Distillation via Projector Ensemble [40.86679028635297]
We propose a new feature distillation method based on a projector ensemble for further performance improvement.
We observe that the student network benefits from a projector even if the feature dimensions of the student and the teacher are the same.
We propose an ensemble of projectors to further improve the quality of student features.
arXiv Detail & Related papers (2022-10-27T09:08:40Z) - Cross-Architecture Knowledge Distillation [32.689574589575244]
It is natural to distill complementary knowledge from a Transformer to a convolutional neural network (CNN).
To deal with this problem, a novel cross-architecture knowledge distillation method is proposed.
The proposed method outperforms 14 state-of-the-arts on both small-scale and large-scale datasets.
arXiv Detail & Related papers (2022-07-12T02:50:48Z) - Knowledge Distillation with the Reused Teacher Classifier [31.22117343316628]
We show that a simple knowledge distillation technique is enough to significantly narrow down the teacher-student performance gap.
Our technique achieves state-of-the-art results at a modest cost in compression ratio due to the added projector.
arXiv Detail & Related papers (2022-03-26T06:28:46Z) - Delta Distillation for Efficient Video Processing [68.81730245303591]
We propose a novel knowledge distillation schema coined as Delta Distillation.
We demonstrate that these temporal variations can be effectively distilled due to the temporal redundancies within video frames.
As a by-product, delta distillation improves the temporal consistency of the teacher model.
arXiv Detail & Related papers (2022-03-17T20:13:30Z) - Distilling Image Classifiers in Object Detectors [81.63849985128527]
We study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework.
In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance.
arXiv Detail & Related papers (2021-06-09T16:50:10Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.