Focal and Global Knowledge Distillation for Detectors
- URL: http://arxiv.org/abs/2111.11837v1
- Date: Tue, 23 Nov 2021 13:04:40 GMT
- Title: Focal and Global Knowledge Distillation for Detectors
- Authors: Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei
Zhao, Chun Yuan
- Abstract summary: We propose Focal and Global Distillation (FGD) for object detection.
FGD separates the foreground and background, forcing the student to focus on the teacher's critical pixels and channels.
As our method only needs to calculate the loss on the feature map, FGD can be applied to various detectors.
- Score: 23.315649744061982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation has been applied to image classification successfully.
However, object detection is much more sophisticated and most knowledge
distillation methods have failed on it. In this paper, we point out that in
object detection, the features of the teacher and student vary greatly in
different areas, especially in the foreground and background. If we distill
them equally, the uneven differences between feature maps will negatively
affect the distillation. Thus, we propose Focal and Global Distillation (FGD).
Focal distillation separates the foreground and background, forcing the student
to focus on the teacher's critical pixels and channels. Global distillation
rebuilds the relation between different pixels and transfers it from teachers
to students, compensating for missing global information in focal distillation.
As our method only needs to calculate the loss on the feature map, FGD can be
applied to various detectors. We experiment on various detectors with different
backbones and the results show that the student detector achieves excellent mAP
improvement. For example, ResNet-50 based RetinaNet, Faster RCNN, RepPoints and
Mask RCNN with our distillation method achieve 40.7%, 42.0%, 42.0% and 42.1%
mAP on COCO2017, which are 3.3, 3.6, 3.4 and 2.9 mAP higher than their baselines,
respectively. Our codes are available at https://github.com/yzd-v/FGD.
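For intuition, here is a minimal PyTorch-style sketch of a focal-plus-global feature distillation loss. It is a simplification, not the released FGD implementation: the binary foreground mask `fg_mask` (built from ground-truth boxes), the loss weights, and the plain pixel-relation matrix used in place of FGD's global-context block are all assumptions.

```python
import torch
import torch.nn.functional as F

def attention_weights(feat, temp=0.5):
    """Teacher-derived spatial and channel attention: softmax over the mean
    absolute activation, rescaled by H*W and C. feat: (N, C, H, W)."""
    n, c, h, w = feat.shape
    spatial = F.softmax(feat.abs().mean(1).view(n, -1) / temp, dim=1).view(n, 1, h, w) * h * w
    channel = F.softmax(feat.abs().mean((2, 3)) / temp, dim=1).view(n, c, 1, 1) * c
    return spatial, channel

def focal_global_loss(f_s, f_t, fg_mask, alpha=1.0, beta=0.5, gamma=1e-3):
    """f_s, f_t: student/teacher feature maps (N, C, H, W);
    fg_mask: (N, 1, H, W) binary foreground mask from ground-truth boxes (assumed given)."""
    fg = fg_mask.float()
    # Focal part: attention-weighted MSE, with foreground and background weighted separately.
    spatial, channel = attention_weights(f_t.detach())
    diff = (f_s - f_t.detach()) ** 2 * spatial * channel
    focal = alpha * (diff * fg).sum() + beta * (diff * (1.0 - fg)).sum()

    # Global part (simplified): match pixel-to-pixel relation matrices of the two maps.
    def relation(f):
        x = F.normalize(f.flatten(2), dim=1)        # (N, C, H*W), unit-norm per pixel
        return torch.bmm(x.transpose(1, 2), x)      # (N, H*W, H*W) pixel relations
    global_term = gamma * F.mse_loss(relation(f_s), relation(f_t.detach()), reduction='sum')

    return (focal + global_term) / f_s.shape[0]
```

In FGD itself the focal term also imitates the teacher's attention maps and the global term uses a global-context block; the sketch only conveys the foreground/background split and the relation transfer.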
Related papers
- What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias [1.03590082373586]
As many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy.
This study highlights the uneven effects of Knowledge Distillation on certain classes and its potentially significant role in fairness.
arXiv Detail & Related papers (2024-10-10T22:43:00Z)
- Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation [0.8136541584281987]
Distill-DKP is a novel cross-modal knowledge distillation framework for keypoint detection in a self-supervised setting.
During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model.
Experiments show that Distill-DKP significantly outperforms previous unsupervised methods.
arXiv Detail & Related papers (2024-10-04T22:14:08Z)
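As a rough illustration of the embedding-level guidance described in the Distill-DKP entry above, a cosine-alignment loss between a frozen depth-based teacher's embeddings and the image-based student's embeddings might look like the sketch below; the shapes and the cosine objective are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def embedding_distill_loss(student_emb, teacher_emb):
    """student_emb: embeddings from the image-based student, (batch, num_points, dim);
    teacher_emb: embeddings from the frozen depth-based teacher, same shape.
    Pushes each student embedding toward its depth-derived counterpart."""
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb.detach(), dim=-1)   # teacher provides targets only
    return (1.0 - (s * t).sum(dim=-1)).mean()       # 1 - cosine similarity, averaged
```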
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
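The sequential scheme in the multi-teacher entry above can be pictured as a loop that distills from one teacher at a time; the ordering, schedule, and `distill_loss` callable below are placeholders, not the paper's recipe.

```python
import torch

def progressive_distill(student, teachers, loader, distill_loss,
                        epochs_per_teacher=12, lr=0.01):
    """Distill a lightweight student against an ordered list of teacher detectors,
    one teacher at a time. `distill_loss(student_out, teacher_out, batch)` is an
    assumed user-supplied callable returning a scalar loss."""
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for teacher in teachers:                          # move through the teacher sequence
        teacher.eval()
        for _ in range(epochs_per_teacher):
            for batch in loader:
                with torch.no_grad():
                    t_out = teacher(batch)            # teacher targets, no gradients
                loss = distill_loss(student(batch), t_out, batch)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return student
```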
- Dual Relation Knowledge Distillation for Object Detection [7.027174952925931]
The pixel-wise relation distillation embeds pixel-wise features in the graph space and applies graph convolution to capture the global pixel relation.
Instance-wise relation distillation calculates the similarity between different instances to obtain a relation matrix.
Our method achieves state-of-the-art performance, which improves Faster R-CNN based on ResNet50 from 38.4% to 41.6% mAP.
arXiv Detail & Related papers (2023-02-11T09:38:53Z)
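A compact sketch of the instance-wise relation idea from the dual-relation entry above: build an instance-to-instance similarity matrix for student and teacher and match the two. The pooled RoI features and the L1 matching are assumptions, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def instance_relation_loss(student_inst, teacher_inst):
    """student_inst, teacher_inst: (num_instances, dim) pooled instance features."""
    def relation(x):
        x = F.normalize(x, dim=1)       # cosine-style similarity between instances
        return x @ x.t()                # (num_instances, num_instances) relation matrix
    return F.l1_loss(relation(student_inst), relation(teacher_inst.detach()))
```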
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient [18.782520279344553]
This paper empirically finds that better FPN features from a heterogeneous teacher detector can help the student.
We propose to imitate features with Pearson Correlation Coefficient to focus on the relational information from the teacher.
Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs.
arXiv Detail & Related papers (2022-07-05T13:37:34Z)
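The Pearson-correlation imitation in the PKD entry above boils down to standardizing each channel of the FPN features before comparing them; a minimal sketch follows, with the reduction and any loss weighting left as assumptions.

```python
import torch
import torch.nn.functional as F

def pearson_feature_loss(f_s, f_t, eps=1e-6):
    """f_s, f_t: student/teacher FPN features (N, C, H, W). Standardizing each channel
    map to zero mean and unit variance before the MSE makes the loss depend on the
    correlation (relational information) rather than on raw feature magnitudes."""
    def standardize(f):
        mean = f.mean(dim=(2, 3), keepdim=True)
        std = f.std(dim=(2, 3), keepdim=True)
        return (f - mean) / (std + eps)
    return F.mse_loss(standardize(f_s), standardize(f_t.detach()))
```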
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms existing detection KD techniques and works whether its components are used separately or in conjunction.
arXiv Detail & Related papers (2021-08-17T07:44:27Z)
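The semantic-guided soft matching in the G-DetKD entry above can be sketched as similarity-weighted matching of a student region feature against teacher features pooled from every pyramid level; the shapes and the MSE imitation below are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_matched_imitation(student_feat, teacher_feats, temp=1.0):
    """student_feat: (R, D) region features from the student;
    teacher_feats: (L, R, D) the same regions pooled from each of the teacher's L pyramid levels."""
    s = F.normalize(student_feat, dim=-1)                 # (R, D)
    t = F.normalize(teacher_feats.detach(), dim=-1)       # (L, R, D)
    sim = (t * s.unsqueeze(0)).sum(-1) / temp             # (L, R) level-to-region similarity
    w = F.softmax(sim, dim=0).unsqueeze(-1)               # (L, R, 1) soft matching weights
    target = (w * teacher_feats.detach()).sum(0)          # (R, D) soft-matched teacher target
    return F.mse_loss(student_feat, target)
```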
- Distilling Object Detectors via Decoupled Features [69.62967325617632]
We present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
Experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection.
arXiv Detail & Related papers (2021-03-26T13:58:49Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
- Why distillation helps: a statistical perspective [69.90148901064747]
Knowledge distillation is a technique for improving the performance of a simple "student" model by training it to match the soft predictions of a more powerful "teacher" model.
While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help?
We show how distillation complements existing negative mining techniques for extreme multiclass retrieval.
arXiv Detail & Related papers (2020-05-21T01:49:51Z)
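The statistical analysis above concerns the standard soft-label distillation objective; for reference, a minimal sketch of that objective, where temperature `T` and mixing weight `alpha` are the usual hyper-parameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the ordinary cross-entropy on hard labels with a KL term that pushes the
    student's temperature-softened predictions toward the teacher's soft labels."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```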