Focal and Global Knowledge Distillation for Detectors
- URL: http://arxiv.org/abs/2111.11837v1
- Date: Tue, 23 Nov 2021 13:04:40 GMT
- Title: Focal and Global Knowledge Distillation for Detectors
- Authors: Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei
  Zhao, Chun Yuan
- Abstract summary: We propose Focal and Global Distillation (FGD) for object detection.
FGD separates the foreground and background, forcing the student to focus on the teacher's critical pixels and channels.
As our method only needs to calculate the loss on the feature map, FGD can be applied to various detectors.
- Score: 23.315649744061982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation has been applied to image classification successfully.
However, object detection is much more sophisticated and most knowledge
distillation methods have failed on it. In this paper, we point out that in
object detection, the features of the teacher and student vary greatly in
different areas, especially in the foreground and background. If we distill
them equally, the uneven differences between feature maps will negatively
affect the distillation. Thus, we propose Focal and Global Distillation (FGD).
Focal distillation separates the foreground and background, forcing the student
to focus on the teacher's critical pixels and channels. Global distillation
rebuilds the relation between different pixels and transfers it from teachers
to students, compensating for missing global information in focal distillation.
As our method only needs to calculate the loss on the feature map, FGD can be
applied to various detectors. We experiment on various detectors with different
backbones and the results show that the student detector achieves excellent mAP
improvement. For example, ResNet-50 based RetinaNet, Faster RCNN, RepPoints and
Mask RCNN with our distillation method achieve 40.7%, 42.0%, 42.0% and 42.1%
mAP on COCO2017, which are 3.3, 3.6, 3.4 and 2.9 points higher than their baselines,
respectively. Our codes are available at https://github.com/yzd-v/FGD.
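As a reading aid, the snippet below is a minimal PyTorch sketch of the two loss terms described in the abstract, not the authors' implementation (that is in the linked repository). The box-based foreground mask, the alpha/beta/gamma weights, and the Gram-matrix stand-in for the global relation term are illustrative assumptions.

```python
# Minimal sketch of focal + global feature distillation (illustrative only; see
# https://github.com/yzd-v/FGD for the official code). Assumes teacher and student
# feature maps of identical shape and ground-truth boxes in feature-map coordinates.
import torch
import torch.nn.functional as F


def foreground_mask(boxes, h, w, device):
    """Binary mask that is 1 inside any ground-truth box and 0 elsewhere."""
    mask = torch.zeros(h, w, device=device)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1):int(y2) + 1, int(x1):int(x2) + 1] = 1.0
    return mask


def focal_global_distill(feat_s, feat_t, boxes, alpha=1.0, beta=0.5, gamma=1.0):
    """feat_s, feat_t: (C, H, W) student / teacher features for one image."""
    c, h, w = feat_t.shape
    fg = foreground_mask(boxes, h, w, feat_t.device)              # (H, W)
    bg = 1.0 - fg

    # Focal term: distill foreground and background separately so the large
    # background area does not drown out the objects (alpha/beta are placeholders).
    diff = (feat_s - feat_t.detach()) ** 2                        # (C, H, W)
    loss_focal = alpha * (diff * fg).sum() / fg.sum().clamp(min=1) \
               + beta * (diff * bg).sum() / bg.sum().clamp(min=1)

    # Global term: match pairwise pixel relations; a Gram matrix over pixels is
    # used here as a simple proxy for the relation modelling in the abstract.
    fs = feat_s.flatten(1)                                        # (C, H*W)
    ft = feat_t.detach().flatten(1)
    loss_global = gamma * F.mse_loss(fs.T @ fs / c, ft.T @ ft / c)

    return loss_focal + loss_global
```

The published method additionally derives attention over pixels and channels from the teacher to reweight the focal term; that refinement is omitted here for brevity.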
 
      
        Related papers
- CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs [2.7624021966289605]
 This paper proposes Consistent Location-and-Context-aware Knowledge Distillation (CLoCKDistill) for DETR detectors.
We distill the transformer encoder output (i.e., memory) that contains valuable global context and long-range dependencies.
Our method boosts student detector performance by 2.2% to 6.4%.
 arXiv  Detail & Related papers  (2025-02-15T06:02:51Z)
- What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias [1.03590082373586]
 As many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy.
This study highlights the uneven effects of Knowledge Distillation on certain classes and its potentially significant role in fairness.
 arXiv  Detail & Related papers  (2024-10-10T22:43:00Z)
- Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation [0.8136541584281987]
 Distill-DKP is a novel cross-modal knowledge distillation framework for keypoint detection in a self-supervised setting.
During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model.
 Experiments show that Distill-DKP significantly outperforms previous unsupervised methods.
 arXiv  Detail & Related papers  (2024-10-04T22:14:08Z)
- Knowledge Distillation with Refined Logits [31.205248790623703]
 We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations.
 arXiv  Detail & Related papers  (2024-08-14T17:59:32Z)
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
 We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
 arXiv  Detail & Related papers  (2023-08-17T17:17:08Z)
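The entry above only names the sequential strategy; the sketch below is a hedged, assumption-level illustration of what progressively transferring knowledge from a set of teachers could look like in training code. The `get_features` and `task_loss_fn` hooks, the MSE imitation loss, and the per-teacher epoch budget are hypothetical placeholders, not the paper's recipe.

```python
# Hedged sketch of sequential ("progressive") multi-teacher distillation: the
# student imitates one teacher at a time instead of all teachers jointly.
# `get_features` and `task_loss_fn` are hypothetical hooks; the MSE imitation
# loss and fixed per-teacher epoch budget are illustrative assumptions.
import torch
import torch.nn.functional as F


def progressive_distill(student, teachers, loader, optimizer,
                        get_features, task_loss_fn, epochs_per_teacher=12):
    for teacher in teachers:                      # e.g. ordered weakest -> strongest
        teacher.eval()
        for _ in range(epochs_per_teacher):
            for images, targets in loader:
                feat_s = get_features(student, images)
                with torch.no_grad():
                    feat_t = get_features(teacher, images)
                loss = task_loss_fn(student, images, targets) + F.mse_loss(feat_s, feat_t)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```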
- Dual Relation Knowledge Distillation for Object Detection [7.027174952925931]
 The pixel-wise relation distillation embeds pixel-wise features in the graph space and applies graph convolution to capture the global pixel relation.
The instance-wise relation distillation calculates the similarity of different instances to obtain a relation matrix.
Our method achieves state-of-the-art performance, which improves Faster R-CNN based on ResNet50 from 38.4% to 41.6% mAP.
 arXiv  Detail & Related papers  (2023-02-11T09:38:53Z)
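To make the instance-wise relation idea in the entry above concrete, here is a small sketch that builds a similarity (relation) matrix over per-instance features for teacher and student and matches the two. The cosine similarity and L1 matching loss are assumptions rather than the paper's exact formulation, and the pixel-wise graph-convolution branch is not shown.

```python
# Sketch of instance-wise relation distillation: build a similarity (relation)
# matrix over per-instance features for teacher and student and match the two.
# Cosine similarity and the L1 matching loss are assumptions, not the paper's
# exact formulation; the pixel-wise graph branch is omitted.
import torch.nn.functional as F


def relation_matrix(inst_feats):
    """inst_feats: (N, D) per-instance (RoI) features -> (N, N) similarity matrix."""
    feats = F.normalize(inst_feats, dim=1)
    return feats @ feats.T


def instance_relation_loss(inst_s, inst_t):
    """Match the student's instance relation matrix to the teacher's."""
    return F.l1_loss(relation_matrix(inst_s), relation_matrix(inst_t.detach()))
```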
- KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling [52.11242317111469]
 We focus on the compression of DETR with knowledge distillation.
The main challenge in DETR distillation is the lack of consistent distillation points.
We propose the first general knowledge distillation paradigm for DETR with consistent distillation points sampling.
 arXiv  Detail & Related papers  (2022-11-15T11:52:30Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
 Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
 arXiv  Detail & Related papers  (2022-09-20T16:36:28Z)
- PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient [18.782520279344553]
 This paper empirically finds that better FPN features from a heterogeneous teacher detector can help the student.
We propose to imitate features with Pearson Correlation Coefficient to focus on the relational information from the teacher.
Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs.
 arXiv  Detail & Related papers  (2022-07-05T13:37:34Z)
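A common way to imitate features with the Pearson Correlation Coefficient, as described in the entry above, is to standardize both feature maps and compare them with MSE, which makes the loss insensitive to feature magnitude and dependent only on correlation. The sketch below follows that reading; the normalization axes are an assumption.

```python
# Sketch of Pearson-correlation feature imitation: standardize both feature maps
# (zero mean, unit variance) and compare with MSE, so the loss depends on the
# correlation between features rather than their scale. The normalization axes
# below are an assumption.
import torch.nn.functional as F


def standardize(feat, eps=1e-6):
    """feat: (N, C, H, W) -> zero mean, unit variance per channel."""
    mean = feat.mean(dim=(0, 2, 3), keepdim=True)
    std = feat.std(dim=(0, 2, 3), keepdim=True)
    return (feat - mean) / (std + eps)


def pearson_feature_loss(feat_s, feat_t):
    return F.mse_loss(standardize(feat_s), standardize(feat_t.detach()))
```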
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
 We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms existing detection KD techniques, and works whether its components are used separately or in conjunction.
 arXiv  Detail & Related papers  (2021-08-17T07:44:27Z)
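For the contrastive distillation component named in the entry above, a standard InfoNCE-style sketch is shown below: each student region feature is pulled toward its matched teacher feature and pushed away from the teacher features of other regions. The cosine similarity, the temperature value, and the assumption of pre-matched region pairs are illustrative choices, not details from the paper.

```python
# InfoNCE-style sketch of contrastive feature distillation: each student region
# feature is pulled toward its matched teacher feature (diagonal pairs) and
# pushed away from the teacher features of other regions. Cosine similarity,
# the temperature, and pre-matched region pairs are illustrative assumptions.
import torch
import torch.nn.functional as F


def contrastive_distill_loss(region_s, region_t, temperature=0.1):
    """region_s, region_t: (N, D) matched region features from student / teacher."""
    s = F.normalize(region_s, dim=1)
    t = F.normalize(region_t.detach(), dim=1)
    logits = s @ t.T / temperature                 # (N, N) pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)        # positives lie on the diagonal
```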
- Distilling Object Detectors via Decoupled Features [69.62967325617632]
 We present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
Experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection.
 arXiv  Detail & Related papers  (2021-03-26T13:58:49Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
 Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
 arXiv  Detail & Related papers  (2020-06-23T15:58:22Z)
- Why distillation helps: a statistical perspective [69.90148901064747]
 Knowledge distillation is a technique for improving the performance of a simple "student" model.
While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help?
We show how distillation complements existing negative mining techniques for extreme multiclass retrieval.
 arXiv  Detail & Related papers  (2020-05-21T01:49:51Z)
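For reference, the distillation objective this line of analysis studies is usually the classical temperature-softened formulation, sketched below; it is the textbook loss rather than anything specific to the paper above.

```python
# The classical distillation objective: temperature-softened KL between teacher
# and student predictions, mixed with the ordinary hard-label loss (textbook
# formulation, not specific to the paper above).
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                    # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```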
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.