Focal and Global Knowledge Distillation for Detectors
- URL: http://arxiv.org/abs/2111.11837v1
- Date: Tue, 23 Nov 2021 13:04:40 GMT
- Title: Focal and Global Knowledge Distillation for Detectors
- Authors: Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei
  Zhao, Chun Yuan
- Abstract summary: We propose Focal and Global Distillation (FGD) for object detection.
FGD separates the foreground and background, forcing the student to focus on the teacher's critical pixels and channels.
As our method only needs to calculate the loss on the feature map, FGD can be applied to various detectors.
- Score: 23.315649744061982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation has been applied to image classification successfully.
However, object detection is much more sophisticated and most knowledge
distillation methods have failed on it. In this paper, we point out that in
object detection, the features of the teacher and student vary greatly in
different areas, especially in the foreground and background. If we distill
them equally, the uneven differences between feature maps will negatively
affect the distillation. Thus, we propose Focal and Global Distillation (FGD).
Focal distillation separates the foreground and background, forcing the student
to focus on the teacher's critical pixels and channels. Global distillation
rebuilds the relation between different pixels and transfers it from teachers
to students, compensating for missing global information in focal distillation.
As our method only needs to calculate the loss on the feature map, FGD can be
applied to various detectors. We experiment on various detectors with different
backbones and the results show that the student detector achieves excellent mAP
improvement. For example, ResNet-50 based RetinaNet, Faster RCNN, RepPoints and
Mask RCNN with our distillation method achieve 40.7%, 42.0%, 42.0% and 42.1%
mAP on COCO2017, which are 3.3, 3.6, 3.4 and 2.9 points higher than their baselines,
respectively. Our codes are available at https://github.com/yzd-v/FGD.
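As a reading aid, the snippet below is a minimal PyTorch sketch of the two loss terms described in the abstract, not the authors' implementation (that is in the linked repository). The box-based foreground mask, the alpha/beta/gamma weights, and the Gram-matrix stand-in for the global relation term are illustrative assumptions.

```python
# Minimal sketch of focal + global feature distillation (illustrative only; see
# https://github.com/yzd-v/FGD for the official code). Assumes teacher and student
# feature maps of identical shape and ground-truth boxes in feature-map coordinates.
import torch
import torch.nn.functional as F


def foreground_mask(boxes, h, w, device):
    """Binary mask that is 1 inside any ground-truth box and 0 elsewhere."""
    mask = torch.zeros(h, w, device=device)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1):int(y2) + 1, int(x1):int(x2) + 1] = 1.0
    return mask


def focal_global_distill(feat_s, feat_t, boxes, alpha=1.0, beta=0.5, gamma=1.0):
    """feat_s, feat_t: (C, H, W) student / teacher features for one image."""
    c, h, w = feat_t.shape
    fg = foreground_mask(boxes, h, w, feat_t.device)              # (H, W)
    bg = 1.0 - fg

    # Focal term: distill foreground and background separately so the large
    # background area does not drown out the objects (alpha/beta are placeholders).
    diff = (feat_s - feat_t.detach()) ** 2                        # (C, H, W)
    loss_focal = alpha * (diff * fg).sum() / fg.sum().clamp(min=1) \
               + beta * (diff * bg).sum() / bg.sum().clamp(min=1)

    # Global term: match pairwise pixel relations; a Gram matrix over pixels is
    # used here as a simple proxy for the relation modelling in the abstract.
    fs = feat_s.flatten(1)                                        # (C, H*W)
    ft = feat_t.detach().flatten(1)
    loss_global = gamma * F.mse_loss(fs.T @ fs / c, ft.T @ ft / c)

    return loss_focal + loss_global
```

The published method additionally derives attention over pixels and channels from the teacher to reweight the focal term; that refinement is omitted here for brevity.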
 
      
        Related papers
- CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs [2.7624021966289605]
 This paper proposes Consistent Location-and-Context-aware Knowledge Distillation (CLoCKDistill) for DETR detectors.
We distill the transformer encoder output (i.e., memory) that contains valuable global context and long-range dependencies.
Our method boosts student detector performance by 2.2% to 6.4%.
 arXiv  Detail & Related papers  (2025-02-15T06:02:51Z)
- What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias [1.03590082373586]
 As many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy.
This study highlights the uneven effects of Knowledge Distillation on certain classes and its potentially significant role in fairness.
 arXiv  Detail & Related papers  (2024-10-10T22:43:00Z)
- Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation [0.8136541584281987]
 Distill-DKP is a novel cross-modal knowledge distillation framework for keypoint detection in a self-supervised setting.
During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model.
 Experiments show that Distill-DKP significantly outperforms previous unsupervised methods.
 arXiv  Detail & Related papers  (2024-10-04T22:14:08Z)
- Knowledge Distillation with Refined Logits [31.205248790623703]
 We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations.
 arXiv  Detail & Related papers  (2024-08-14T17:59:32Z)
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
 We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
 arXiv  Detail & Related papers  (2023-08-17T17:17:08Z)
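The entry above only names the sequential strategy; the sketch below is a hedged, assumption-level illustration of what progressively transferring knowledge from a set of teachers could look like in training code. The `get_features` and `task_loss_fn` hooks, the MSE imitation loss, and the per-teacher epoch budget are hypothetical placeholders, not the paper's recipe.

```python
# Hedged sketch of sequential ("progressive") multi-teacher distillation: the
# student imitates one teacher at a time instead of all teachers jointly.
# `get_features` and `task_loss_fn` are hypothetical hooks; the MSE imitation
# loss and fixed per-teacher epoch budget are illustrative assumptions.
import torch
import torch.nn.functional as F


def progressive_distill(student, teachers, loader, optimizer,
                        get_features, task_loss_fn, epochs_per_teacher=12):
    for teacher in teachers:                      # e.g. ordered weakest -> strongest
        teacher.eval()
        for _ in range(epochs_per_teacher):
            for images, targets in loader:
                feat_s = get_features(student, images)
                with torch.no_grad():
                    feat_t = get_features(teacher, images)
                loss = task_loss_fn(student, images, targets) + F.mse_loss(feat_s, feat_t)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```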
- Dual Relation Knowledge Distillation for Object Detection [7.027174952925931]
 The pixel-wise relation distillation embeds pixel-wise features in the graph space and applies graph convolution to capture the global pixel relation.
The instance-wise relation distillation calculates the similarity of different instances to obtain a relation matrix.
Our method achieves state-of-the-art performance, which improves Faster R-CNN based on ResNet50 from 38.4% to 41.6% mAP.
 arXiv  Detail & Related papers  (2023-02-11T09:38:53Z)
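To make the instance-wise relation idea in the entry above concrete, here is a small sketch that builds a similarity (relation) matrix over per-instance features for teacher and student and matches the two. The cosine similarity and L1 matching loss are assumptions rather than the paper's exact formulation, and the pixel-wise graph-convolution branch is not shown.

```python
# Sketch of instance-wise relation distillation: build a similarity (relation)
# matrix over per-instance features for teacher and student and match the two.
# Cosine similarity and the L1 matching loss are assumptions, not the paper's
# exact formulation; the pixel-wise graph branch is omitted.
import torch.nn.functional as F


def relation_matrix(inst_feats):
    """inst_feats: (N, D) per-instance (RoI) features -> (N, N) similarity matrix."""
    feats = F.normalize(inst_feats, dim=1)
    return feats @ feats.T


def instance_relation_loss(inst_s, inst_t):
    """Match the student's instance relation matrix to the teacher's."""
    return F.l1_loss(relation_matrix(inst_s), relation_matrix(inst_t.detach()))
```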
- KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling [52.11242317111469]
 We focus on the compression of DETR with knowledge distillation.
The main challenge in DETR distillation is the lack of consistent distillation points.
We propose the first general knowledge distillation paradigm for DETR with consistent distillation points sampling.
 arXiv  Detail & Related papers  (2022-11-15T11:52:30Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
 Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
 arXiv  Detail & Related papers  (2022-09-20T16:36:28Z)
- PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient [18.782520279344553]
 This paper empirically finds that better FPN features from a heterogeneous teacher detector can help the student.
We propose to imitate features with Pearson Correlation Coefficient to focus on the relational information from the teacher.
Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs.
 arXiv  Detail & Related papers  (2022-07-05T13:37:34Z)
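A common way to imitate features with the Pearson Correlation Coefficient, as described in the entry above, is to standardize both feature maps and compare them with MSE, which makes the loss insensitive to feature magnitude and dependent only on correlation. The sketch below follows that reading; the normalization axes are an assumption.

```python
# Sketch of Pearson-correlation feature imitation: standardize both feature maps
# (zero mean, unit variance) and compare with MSE, so the loss depends on the
# correlation between features rather than their scale. The normalization axes
# below are an assumption.
import torch.nn.functional as F


def standardize(feat, eps=1e-6):
    """feat: (N, C, H, W) -> zero mean, unit variance per channel."""
    mean = feat.mean(dim=(0, 2, 3), keepdim=True)
    std = feat.std(dim=(0, 2, 3), keepdim=True)
    return (feat - mean) / (std + eps)


def pearson_feature_loss(feat_s, feat_t):
    return F.mse_loss(standardize(feat_s), standardize(feat_t.detach()))
```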
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
 We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms existing detection KD techniques, and works whether its components are used separately or in conjunction.
 arXiv  Detail & Related papers  (2021-08-17T07:44:27Z)
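For the contrastive distillation component named in the entry above, a standard InfoNCE-style sketch is shown below: each student region feature is pulled toward its matched teacher feature and pushed away from the teacher features of other regions. The cosine similarity, the temperature value, and the assumption of pre-matched region pairs are illustrative choices, not details from the paper.

```python
# InfoNCE-style sketch of contrastive feature distillation: each student region
# feature is pulled toward its matched teacher feature (diagonal pairs) and
# pushed away from the teacher features of other regions. Cosine similarity,
# the temperature, and pre-matched region pairs are illustrative assumptions.
import torch
import torch.nn.functional as F


def contrastive_distill_loss(region_s, region_t, temperature=0.1):
    """region_s, region_t: (N, D) matched region features from student / teacher."""
    s = F.normalize(region_s, dim=1)
    t = F.normalize(region_t.detach(), dim=1)
    logits = s @ t.T / temperature                 # (N, N) pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)        # positives lie on the diagonal
```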
- Distilling Object Detectors via Decoupled Features [69.62967325617632]
 We present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
Experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection.
 arXiv  Detail & Related papers  (2021-03-26T13:58:49Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
 Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
 arXiv  Detail & Related papers  (2020-06-23T15:58:22Z)
- Why distillation helps: a statistical perspective [69.90148901064747]
 Knowledge distillation is a technique for improving the performance of a simple "student" model.
While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help?
We show how distillation complements existing negative mining techniques for extreme multiclass retrieval.
 arXiv  Detail & Related papers  (2020-05-21T01:49:51Z)
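For reference, the distillation objective this line of analysis studies is usually the classical temperature-softened formulation, sketched below; it is the textbook loss rather than anything specific to the paper above.

```python
# The classical distillation objective: temperature-softened KL between teacher
# and student predictions, mixed with the ordinary hard-label loss (textbook
# formulation, not specific to the paper above).
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                    # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```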
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.