Gradient-Guided Knowledge Distillation for Object Detectors
- URL: http://arxiv.org/abs/2303.04240v1
- Date: Tue, 7 Mar 2023 21:09:09 GMT
- Title: Gradient-Guided Knowledge Distillation for Object Detectors
- Authors: Qizhen Lan and Qing Tian
- Abstract summary: We propose a novel approach for knowledge distillation in object detection, named Gradient-guided Knowledge Distillation (GKD).
Our GKD uses gradient information to identify and assign more weights to features that significantly impact the detection loss, allowing the student to learn the most relevant features from the teacher.
Experiments on the KITTI and COCO-Traffic datasets demonstrate our method's efficacy in knowledge distillation for object detection.
- Score: 3.236217153362305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have demonstrated remarkable success in object
detection, yet their complexity and computational intensity pose a barrier to
deploying them in real-world applications (e.g., self-driving perception).
Knowledge Distillation (KD) is an effective way to derive efficient models.
However, only a small number of KD methods tackle object detection. Also, most
of them focus on mimicking the plain features of the teacher model but rarely
consider how the features contribute to the final detection. In this paper, we
propose a novel approach for knowledge distillation in object detection, named
Gradient-guided Knowledge Distillation (GKD). Our GKD uses gradient information
to identify and assign more weights to features that significantly impact the
detection loss, allowing the student to learn the most relevant features from
the teacher. Furthermore, we present bounding-box-aware multi-grained feature
imitation (BMFI) to further improve the KD performance. Experiments on the
KITTI and COCO-Traffic datasets demonstrate our method's efficacy in knowledge
distillation for object detection. On one-stage and two-stage detectors, our
GKD-BMFI leads to an average of 5.1% and 3.8% mAP improvement, respectively,
beating various state-of-the-art KD methods.
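Read literally, the abstract suggests a simple recipe: weight a feature-imitation loss by how strongly each feature location influences the detection loss. Below is a minimal PyTorch sketch of that idea; the per-location gradient-magnitude weighting and the function name are illustrative assumptions, not the paper's exact formulation, and BMFI's bounding-box weighting is omitted for brevity.

```python
# Hedged sketch of gradient-guided feature distillation (GKD-style).
# The saliency definition below (channel-averaged absolute gradient)
# is an assumption for illustration; the paper's weighting may differ.
import torch

def gradient_guided_kd_loss(teacher_feat, student_feat, detection_loss):
    """Weight feature imitation by each location's impact on the detection loss."""
    # Gradient of the detection loss w.r.t. the teacher's feature map.
    grads = torch.autograd.grad(detection_loss, teacher_feat,
                                retain_graph=True)[0]
    # Saliency map: absolute gradient averaged over channels -> (B, 1, H, W).
    weight = grads.abs().mean(dim=1, keepdim=True)
    weight = weight / (weight.sum(dim=(2, 3), keepdim=True) + 1e-8)
    # Weighted L2 imitation: the student mimics high-impact teacher features.
    return (weight * (student_feat - teacher_feat.detach()) ** 2).sum()

# Toy usage with random tensors and a dummy stand-in for the detection loss.
t = torch.randn(2, 64, 32, 32, requires_grad=True)
s = torch.randn(2, 64, 32, 32, requires_grad=True)
det_loss = (t ** 2).mean()
gradient_guided_kd_loss(t, s, det_loss).backward()
```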
Related papers
- Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection [13.255646312416532]
We propose a novel knowledge distillation framework for UAV-OD.
Specifically, a progressive distillation approach is designed to alleviate the feature gap between teacher and student models.
A new feature alignment method is provided to extract object-related features, improving the efficiency with which the student model absorbs the teacher's knowledge.
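The summary is high-level, but a stage-wise schedule is easy to picture. The sketch below is one hedged reading: a 1x1-conv adapter aligns student features to the teacher's channels, and distillation progresses from feature mimicking to prediction mimicking. The stage split, adapter, and loss mix are assumptions, not the paper's design.

```python
# Hypothetical two-stage "progressive" distillation step.
import torch
import torch.nn as nn
import torch.nn.functional as F

adapter = nn.Conv2d(128, 256, kernel_size=1)  # assumed channel alignment module

def progressive_kd_loss(t_feat, s_feat, t_logits, s_logits,
                        epoch, switch_epoch=10):
    # Stage 1: mimic aligned features to close the teacher-student feature gap.
    feat_loss = F.mse_loss(adapter(s_feat), t_feat.detach())
    if epoch < switch_epoch:
        return feat_loss
    # Stage 2: additionally mimic the teacher's soft predictions.
    pred_loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                         F.softmax(t_logits.detach(), dim=-1),
                         reduction="batchmean")
    return feat_loss + pred_loss
```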
arXiv Detail & Related papers (2024-08-21T08:05:03Z)
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
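One plausible reading of "relative difficulty" is a per-pixel comparison of how hard each pixel is for the student versus the teacher, used to weight a pixel-wise KD loss. The sketch below encodes that reading; defining difficulty as per-pixel cross-entropy and the weight as their clamped difference are assumptions, not the paper's formulation.

```python
# Hedged sketch of pixel-level relative-difficulty weighting for segmentation KD.
import torch
import torch.nn.functional as F

def rdd_loss(t_logits, s_logits, labels, temperature=2.0):
    # Per-pixel difficulty maps via unreduced cross-entropy -> (B, H, W).
    t_ce = F.cross_entropy(t_logits.detach(), labels, reduction="none")
    s_ce = F.cross_entropy(s_logits, labels, reduction="none")
    # Pixels relatively harder for the student get more distillation weight.
    weight = torch.clamp(s_ce - t_ce, min=0).detach()
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                  F.softmax(t_logits.detach() / temperature, dim=1),
                  reduction="none").sum(dim=1)          # (B, H, W)
    return (weight * kd).mean() * temperature ** 2
```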
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection [47.0507287491627]
We propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection.
By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model.
Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources.
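Monte Carlo dropout itself is standard: sample the teacher several times with dropout active and treat the variance of its predictions as uncertainty. The sketch below shows that sampling step; using the sample count of 8 and inverse-variance weighting of the KD loss are assumptions for illustration.

```python
# Sketch of estimating teacher knowledge uncertainty via Monte Carlo dropout.
import torch

@torch.no_grad()
def mc_dropout_stats(teacher, x, k=8):
    teacher.train()            # keep dropout layers active while sampling
    preds = torch.stack([teacher(x) for _ in range(k)])   # (K, B, ...)
    teacher.eval()
    return preds.mean(dim=0), preds.var(dim=0)

# Possible use (hypothetical): trust confident teacher outputs more.
#   mean, var = mc_dropout_stats(teacher, images)
#   weight = 1.0 / (1.0 + var)
#   kd_loss = (weight * (student(images) - mean) ** 2).mean()
```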
arXiv Detail & Related papers (2024-06-11T06:51:02Z)
- Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation [29.821082433621868]
We propose Attention-based Feature Distillation (AFD) for object detection.
We introduce a multi-instance attention mechanism that effectively distinguishes between background and foreground elements.
AFD matches the performance of other state-of-the-art models while remaining efficient.
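In the spirit of attention-based feature distillation, a spatial attention map derived from the teacher's activations can decide which (foreground-like) locations dominate the imitation loss. The softmax-temperature attention below is an assumption, not AFD's actual multi-instance attention module.

```python
# Hedged sketch of attention-masked feature imitation.
import torch
import torch.nn.functional as F

def attention_feature_loss(t_feat, s_feat, tau=0.5):
    b, _, h, w = t_feat.shape
    # Spatial attention: mean absolute teacher activation over channels.
    att = t_feat.detach().abs().mean(dim=1).reshape(b, -1)       # (B, H*W)
    att = F.softmax(att / tau, dim=1).reshape(b, 1, h, w) * h * w
    # High-attention (foreground-like) locations dominate the imitation loss.
    return (att * (s_feat - t_feat.detach()) ** 2).mean()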
arXiv Detail & Related papers (2023-10-28T11:15:37Z)
- CrossKD: Cross-Head Knowledge Distillation for Object Detection [69.16346256926842]
Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors.
We present a prediction-mimicking distillation scheme, called CrossKD, which delivers the intermediate features of the student's detection head to the teacher's detection head.
Our CrossKD boosts the average precision of GFL ResNet-50 with 1x training schedule from 40.2 to 43.7, outperforming all existing KD methods.
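The cross-head idea can be sketched with each head split into early ("stem") and late ("tail") layers: the student's intermediate features are finished by the teacher's tail, and the resulting cross-head predictions mimic the teacher's. The single-conv heads, the split point, and feeding both heads the same backbone feature are simplifying assumptions, not CrossKD's actual architecture.

```python
# Minimal sketch of cross-head prediction mimicking.
import torch
import torch.nn as nn
import torch.nn.functional as F

s_stem = nn.Conv2d(256, 256, 3, padding=1)   # student head, early layers (assumed)
t_stem = nn.Conv2d(256, 256, 3, padding=1)   # teacher head, early layers (assumed)
t_tail = nn.Conv2d(256, 80, 3, padding=1)    # teacher head, late layers (assumed)
for p in list(t_stem.parameters()) + list(t_tail.parameters()):
    p.requires_grad_(False)                  # teacher stays frozen

def crosskd_loss(feat):
    # Student's intermediate features are completed by the *teacher's* tail...
    cross_pred = t_tail(s_stem(feat))
    with torch.no_grad():                    # ...and must match the teacher's output.
        t_pred = t_tail(t_stem(feat))
    return F.mse_loss(cross_pred, t_pred)

loss = crosskd_loss(torch.randn(2, 256, 32, 32))
```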
arXiv Detail & Related papers (2023-06-20T08:19:51Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation [34.441349114336994]
We propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors.
RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill.
PFI attempts to correlate feature differences with prediction differences, making feature imitation directly help to improve the student's accuracy.
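Rank Mimicking can be pictured as distilling the teacher's ordering of the candidate boxes for one ground-truth instance. Treating that ordering as a softmax distribution over candidate scores and matching it with KL divergence, as below, is an assumption for illustration; PFI's prediction-guided weighting of feature imitation is omitted.

```python
# Hedged sketch of Rank Mimicking over one instance's candidate boxes.
import torch
import torch.nn.functional as F

def rank_mimicking_loss(t_scores, s_scores, temperature=1.0):
    # t_scores, s_scores: (num_candidates,) classification scores for the
    # candidate boxes assigned to a single ground-truth object.
    t_rank = F.softmax(t_scores.detach() / temperature, dim=0)
    s_rank = F.log_softmax(s_scores / temperature, dim=0)
    return F.kl_div(s_rank, t_rank, reduction="sum") * temperature ** 2
```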
arXiv Detail & Related papers (2021-12-09T11:19:15Z)
- Distilling Image Classifiers in Object Detectors [81.63849985128527]
We study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework.
In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance.
arXiv Detail & Related papers (2021-06-09T16:50:10Z)
- Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.