PKD: General Distillation Framework for Object Detectors via Pearson
Correlation Coefficient
- URL: http://arxiv.org/abs/2207.02039v1
- Date: Tue, 5 Jul 2022 13:37:34 GMT
- Title: PKD: General Distillation Framework for Object Detectors via Pearson
Correlation Coefficient
- Authors: Weihan Cao, Yifan Zhang, Jianfei Gao, Anda Cheng, Ke Cheng, Jian Cheng
- Abstract summary: This paper empirically finds that better FPN features from a heterogeneous teacher detector can help the student.
We propose to imitate features with Pearson Correlation Coefficient to focus on the relational information from the teacher.
Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs.
- Score: 18.782520279344553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) is a widely used technique to train compact models
in object detection. However, there is still a lack of study on how to distill
between heterogeneous detectors. In this paper, we empirically find that better
FPN features from a heterogeneous teacher detector can help the student, even
though their detection heads and label assignments are different. However,
directly aligning the feature maps to distill detectors suffers from two
problems. First, the difference in feature magnitude between the teacher and
the student could enforce overly strict constraints on the student. Second, the
FPN stages and channels with large feature magnitude from the teacher model
could dominate the gradient of distillation loss, which will overwhelm the
effects of other features in KD and introduce much noise. To address the above
issues, we propose to imitate features with Pearson Correlation Coefficient to
focus on the relational information from the teacher and relax constraints on
the magnitude of the features. Our method consistently outperforms the existing
detection KD methods and works for both homogeneous and heterogeneous
student-teacher pairs. Furthermore, it converges faster. With a powerful
MaskRCNN-Swin detector as the teacher, ResNet-50 based RetinaNet and FCOS
achieve 41.5% and 43.9% mAP on COCO2017, which are 4.1% and 4.8% higher than
the baseline, respectively.
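To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of Pearson-correlation feature imitation, not the authors' released code. It assumes the student and teacher FPN features at each level already share the same shape (FPN channel widths usually match; otherwise a 1x1 conv adapter on the student would be an additional, assumed component).

```python
# Hypothetical sketch of Pearson-correlation feature imitation (not the authors' code).
import torch
import torch.nn.functional as F


def pearson_feature_loss(feat_s: torch.Tensor, feat_t: torch.Tensor,
                         eps: float = 1e-6) -> torch.Tensor:
    """feat_s / feat_t: student / teacher FPN features of one level, shape (N, C, H, W)."""
    n, c, h, w = feat_s.shape
    # Treat every (sample, channel) feature map as one variable over its H*W positions.
    feat_s = feat_s.reshape(n * c, h * w)
    feat_t = feat_t.reshape(n * c, h * w)
    # Standardize to zero mean and unit variance: this discards the magnitude of the
    # features and keeps only their spatial pattern (the relational information).
    feat_s = (feat_s - feat_s.mean(dim=1, keepdim=True)) / (feat_s.std(dim=1, keepdim=True) + eps)
    feat_t = (feat_t - feat_t.mean(dim=1, keepdim=True)) / (feat_t.std(dim=1, keepdim=True) + eps)
    # For standardized maps, the MSE is (up to a constant) 1 minus the Pearson
    # correlation, so minimizing it maximizes the correlation with the teacher.
    return F.mse_loss(feat_s, feat_t)


# Illustrative usage: sum over pyramid levels and add to the detection loss with a weight.
student_fpn = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]
teacher_fpn = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]
kd_loss = sum(pearson_feature_loss(fs, ft) for fs, ft in zip(student_fpn, teacher_fpn))
```

Because each feature map is standardized independently, FPN stages or channels with unusually large activations cannot dominate the distillation gradient, which is exactly the failure mode of direct feature alignment described in the abstract.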
Related papers
- Improving Knowledge Distillation via Regularizing Feature Norm and
Direction [16.98806338782858]
Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task.
Treating teacher features as knowledge, prevailing methods of knowledge distillation train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features (a minimal sketch of these standard objectives appears after this list).
While it is natural to believe that better alignment of student features to the teacher better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance.
arXiv Detail & Related papers (2023-05-26T15:05:19Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Function-Consistent Feature Distillation [99.0460424124249]
Feature distillation makes the student mimic the intermediate features of the teacher.
We propose Function-Consistent Feature Distillation (FCFD), which explicitly optimizes the functional similarity between teacher and student features.
arXiv Detail & Related papers (2023-04-24T05:43:29Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors [34.90279031067575]
We investigate KD among heterogeneous teacher-student pairs to broaden its applicability.
We propose the HEtero-Assists Distillation (HEAD) framework, leveraging heterogeneous detection heads as assistants.
Our method has achieved significant improvement compared to current detection KD methods.
arXiv Detail & Related papers (2022-07-12T07:01:34Z) - Knowledge Distillation for Object Detection via Rank Mimicking and
Prediction-guided Feature Imitation [34.441349114336994]
We propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors.
RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill.
PFI attempts to correlate feature differences with prediction differences, making feature imitation directly help to improve the student's accuracy.
arXiv Detail & Related papers (2021-12-09T11:19:15Z) - Focal and Global Knowledge Distillation for Detectors [23.315649744061982]
We propose Focal and Global Distillation (FGD) for object detection.
FGD separates the foreground and background, forcing the student to focus on the teacher's critical pixels and channels.
As our method only needs to calculate the loss on the feature map, FGD can be applied to various detectors.
arXiv Detail & Related papers (2021-11-23T13:04:40Z) - G-DetKD: Towards General Distillation Framework for Object Detectors via
Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms the existing detection KD techniques, and works whether the components in the framework are used separately or in conjunction.
arXiv Detail & Related papers (2021-08-17T07:44:27Z) - Distilling Object Detectors via Decoupled Features [69.62967325617632]
We present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
Experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection.
arXiv Detail & Related papers (2021-03-26T13:58:49Z) - Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
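For context on the entries above that distill by direct alignment (e.g. the Regularizing Feature Norm and Direction summary), here is a hedged sketch of the two standard objectives they refer to: KL-divergence between softened logits and L2 distance between intermediate features. The temperature and loss weights are illustrative assumptions, not values from any of the papers listed.

```python
# Hedged illustration of the classic alignment objectives (Hinton-style logit KD and
# plain L2 feature imitation); temperature and weights are arbitrary assumptions.
import torch
import torch.nn.functional as F


def logit_kd_loss(z_s: torch.Tensor, z_t: torch.Tensor, tau: float = 4.0) -> torch.Tensor:
    """KL-divergence between temperature-softened teacher and student class logits, (N, K)."""
    log_p_s = F.log_softmax(z_s / tau, dim=1)
    p_t = F.softmax(z_t / tau, dim=1)
    # The tau**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau


def feature_l2_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Plain L2 distance between intermediate features of matching shape."""
    return F.mse_loss(f_s, f_t)


# Illustrative usage alongside the ordinary task loss.
z_s, z_t = torch.randn(8, 80), torch.randn(8, 80)
f_s, f_t = torch.randn(8, 256, 32, 32), torch.randn(8, 256, 32, 32)
total_kd = logit_kd_loss(z_s, z_t) + 0.5 * feature_l2_loss(f_s, f_t)
```

Read against the PKD sketch earlier, the Pearson-based loss can be seen as replacing the plain feature_l2_loss here with a magnitude-invariant alternative.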