Towards Efficient 3D Object Detection with Knowledge Distillation
- URL: http://arxiv.org/abs/2205.15156v1
- Date: Mon, 30 May 2022 15:02:16 GMT
- Title: Towards Efficient 3D Object Detection with Knowledge Distillation
- Authors: Jihan Yang, Shaoshuai Shi, Runyu Ding, Zhe Wang, Xiaojuan Qi
- Abstract summary: We explore the potential of knowledge distillation for developing efficient 3D object detectors.
Our best-performing model achieves $65.75\%$ LEVEL 2 mAPH, surpassing its teacher model while requiring only $44\%$ of the teacher's FLOPs.
Our most efficient model runs at 51 FPS on an NVIDIA A100, which is $2.2\times$ faster than PointPillar with even higher accuracy.
- Score: 38.89710768280703
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite substantial progress in 3D object detection, advanced 3D detectors
often suffer from heavy computation overheads. To this end, we explore the
potential of knowledge distillation (KD) for developing efficient 3D object
detectors, focusing on popular pillar- and voxel-based detectors. Without
well-developed teacher-student pairs, we first study how to obtain student
models with good trade-offs between accuracy and efficiency from the
perspectives of model compression and input resolution reduction. Then, we
build a benchmark to assess existing KD methods developed in the 2D domain for
3D object detection upon six well-constructed teacher-student pairs. Further,
we propose an improved KD pipeline incorporating an enhanced logit KD method
that performs KD on only a few pivotal positions determined by teacher
classification response, and a teacher-guided student model initialization to
facilitate transferring teacher model's feature extraction ability to students
through weight inheritance. Finally, we conduct extensive experiments on the
Waymo dataset. Our best performing model achieves $65.75\%$ LEVEL 2 mAPH,
surpassing its teacher model and requiring only $44\%$ of teacher flops. Our
most efficient model runs 51 FPS on an NVIDIA A100, which is $2.2\times$ faster
than PointPillar with even higher accuracy. Code will be available.
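The abstract only names the two key ingredients of the improved KD pipeline, so here is a minimal PyTorch-style sketch of how they could look: logit KD restricted to a few pivotal positions selected from the teacher's classification response, and teacher-guided student initialization through weight inheritance. All function names, tensor shapes, and hyper-parameters (`top_ratio`, `tau`) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def pivotal_logit_kd_loss(student_logits, teacher_logits, top_ratio=0.01, tau=1.0):
    """Distill classification logits only at 'pivotal' positions, i.e. the
    dense-prediction locations where the teacher's classification response
    is highest. Shapes: (B, C, H, W) class logits for both models."""
    B, C, H, W = teacher_logits.shape
    # Teacher confidence per position: maximum class probability.
    teacher_conf = teacher_logits.sigmoid().amax(dim=1).flatten(1)       # (B, H*W)
    k = max(1, int(top_ratio * H * W))
    _, pivot_idx = teacher_conf.topk(k, dim=1)                           # (B, k)

    # Gather both models' logits at the pivotal positions.
    s = student_logits.flatten(2).transpose(1, 2)                        # (B, H*W, C)
    t = teacher_logits.flatten(2).transpose(1, 2)
    idx = pivot_idx.unsqueeze(-1).expand(-1, -1, C)
    s_piv, t_piv = s.gather(1, idx), t.gather(1, idx)                    # (B, k, C)

    # Soft-target KL divergence computed only on the selected positions.
    return F.kl_div(F.log_softmax(s_piv / tau, dim=-1),
                    F.softmax(t_piv / tau, dim=-1),
                    reduction="batchmean") * tau ** 2


def inherit_teacher_weights(student, teacher):
    """Teacher-guided initialization: copy every teacher parameter whose name
    and shape match a student parameter, so the student starts from the
    teacher's feature-extraction ability."""
    t_state = teacher.state_dict()
    s_state = student.state_dict()
    inherited = {k: v for k, v in t_state.items()
                 if k in s_state and v.shape == s_state[k].shape}
    s_state.update(inherited)
    student.load_state_dict(s_state)
    return student
```

The top-k selection on the teacher's maximum class score is just one plausible instantiation of "pivotal positions determined by teacher classification response"; the paper's exact criterion and loss weighting may differ.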
Related papers
- Representation Disparity-aware Distillation for 3D Object Detection [44.17712259352281] (arXiv, 2023-08-20)
This paper presents a novel representation disparity-aware distillation (RDD) method to address the representation disparity issue.
Our RDD increases the mAP of CP-Voxel-S to 57.1% on the nuScenes dataset, which even surpasses the teacher's performance while requiring only 42% of its FLOPs.
- CrossKD: Cross-Head Knowledge Distillation for Object Detection [69.16346256926842] (arXiv, 2023-06-20)
Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors.
We present a prediction mimicking distillation scheme, called CrossKD, which delivers the intermediate features of the student's detection head to the teacher's detection head.
Our CrossKD boosts the average precision of GFL ResNet-50 with a 1x training schedule from 40.2 to 43.7, outperforming all existing KD methods.
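A rough sketch of the cross-head scheme described above, assuming each detection head can be split into a feature `stem` and a final `predictor`; that split, the toy head, and the plain L2 mimicking loss are simplifications for illustration, not the CrossKD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetHead(nn.Module):
    """Toy detection head split into a feature 'stem' and a final 'predictor';
    this split is an assumption used to show where the cross-head tap happens."""
    def __init__(self, channels=256, num_classes=80):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.predictor = nn.Conv2d(channels, num_classes, 3, padding=1)

    def forward(self, x):
        return self.predictor(self.stem(x))


def crosskd_loss(s_neck_feat, t_neck_feat, student_head, teacher_head):
    """Deliver the student's intermediate head features to the (frozen) teacher
    predictor and make the resulting cross-head predictions mimic the teacher's
    own predictions. Teacher parameters are assumed frozen (requires_grad=False),
    so the mimicking gradient flows back only into the student's stem."""
    s_mid = student_head.stem(s_neck_feat)          # student intermediate features
    cross_pred = teacher_head.predictor(s_mid)      # cross-head predictions
    with torch.no_grad():
        t_pred = teacher_head(t_neck_feat)          # teacher's own predictions
    # CrossKD uses detection-specific losses; plain L2 is used here for brevity.
    return F.mse_loss(cross_pred, t_pred.detach())
```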
- Gradient-Guided Knowledge Distillation for Object Detectors [3.236217153362305] (arXiv, 2023-03-07)
We propose a novel approach for knowledge distillation in object detection, named Gradient-guided Knowledge Distillation (GKD).
Our GKD uses gradient information to identify and assign more weights to features that significantly impact the detection loss, allowing the student to learn the most relevant features from the teacher.
Experiments on the KITTI and COCO-Traffic datasets demonstrate our method's efficacy in knowledge distillation for object detection.
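A minimal sketch of the gradient-guided weighting idea, under the assumption that the per-location importance is taken from the gradient of the detection loss with respect to the student's feature map; the names and the L2 imitation loss are illustrative rather than GKD's exact formulation.

```python
import torch

def gradient_guided_kd_loss(student_feat, teacher_feat, det_loss):
    """Weight feature imitation by how strongly each location influences the
    detection loss, estimated from the gradient of `det_loss` w.r.t. the
    student feature map (one plausible reading of GKD).

    student_feat: (B, C, H, W) feature map that was used to compute `det_loss`.
    """
    # Per-location importance from the detection-loss gradient (no new graph).
    grad = torch.autograd.grad(det_loss, student_feat,
                               retain_graph=True, create_graph=False)[0]
    weight = grad.abs().mean(dim=1, keepdim=True)                    # (B, 1, H, W)
    weight = weight / (weight.sum(dim=(2, 3), keepdim=True) + 1e-6)

    # Gradient-weighted feature imitation against the frozen teacher features.
    diff = (student_feat - teacher_feat.detach()).pow(2).mean(dim=1, keepdim=True)
    return (weight * diff).sum()
```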
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463] (arXiv, 2022-09-20)
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314] (arXiv, 2022-06-13)
We propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
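A small sketch of the dynamic-prior idea as summarized, assuming the prior is injected by pasting a random fraction of teacher feature values into the student's (already channel-aligned) features before the feature-distillation loss; the masking scheme and `prior_ratio` are assumptions, not DPK's actual design.

```python
import torch
import torch.nn.functional as F

def dpk_feature_kd_loss(student_feat, teacher_feat, prior_ratio=0.5):
    """Replace a random fraction of the student's feature values with the
    teacher's before feature distillation, so part of the teacher's features
    act as prior knowledge. Assumes matching (B, C, H, W) shapes."""
    # Binary mask per spatial location: 1 -> keep student, 0 -> inject teacher prior.
    keep = (torch.rand_like(teacher_feat[:, :1]) > prior_ratio).float()   # (B, 1, H, W)
    mixed = keep * student_feat + (1.0 - keep) * teacher_feat.detach()
    # Injected positions already equal the teacher, so they contribute zero loss;
    # gradients concentrate on the positions the student produced itself.
    return F.mse_loss(mixed, teacher_feat.detach())
```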
- Delving into the Pre-training Paradigm of Monocular 3D Object Detection [10.07932482761621] (arXiv, 2022-06-08)
We study the pre-training paradigm for monocular 3D object detection (M3OD).
We propose several strategies to further improve this baseline, which mainly include target guided semi-dense depth estimation, keypoint-aware 2D object detection, and class-level loss adjustment.
Combining all the developed techniques, the obtained pre-training framework produces pre-trained backbones that improve M3OD performance significantly on the KITTI-3D and nuScenes benchmarks.
- Prediction-Guided Distillation for Dense Object Detection [7.5320132424481505] (arXiv, 2022-03-10)
We show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance.
We propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher.
Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures.
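A simplified sketch of prediction-guided weighting, assuming the "key predictive regions" are approximated by the positions with the highest teacher classification scores; PGD's actual quality measure and weighting are more refined than this top-k mask.

```python
import torch

def prediction_guided_kd_loss(student_feat, teacher_feat, teacher_cls_logits,
                              top_ratio=0.05):
    """Focus feature distillation on the teacher's most predictive positions.

    student_feat / teacher_feat: (B, C, H, W); teacher_cls_logits: (B, K, H, W).
    `top_ratio` is an illustrative hyper-parameter.
    """
    B, _, H, W = teacher_feat.shape
    # Teacher prediction score per position (max class probability).
    score_map = teacher_cls_logits.sigmoid().amax(dim=1, keepdim=True)   # (B, 1, H, W)
    k = max(1, int(top_ratio * H * W))
    topk_val, _ = score_map.flatten(1).topk(k, dim=1)
    thresh = topk_val[:, -1].view(B, 1, 1, 1)                            # per-image cut-off
    mask = (score_map >= thresh).float()

    # Masked feature imitation, normalized by the number of selected positions.
    diff = (student_feat - teacher_feat.detach()).pow(2).mean(dim=1, keepdim=True)
    return (mask * diff).sum() / mask.sum().clamp(min=1.0)
```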
- How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457] (arXiv, 2021-10-22)
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
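A compact sketch of KD with input gradient alignment as summarized, assuming the aligned quantity is the gradient of each model's classification loss with respect to the input; the specific loss choices and the weight `alpha` are illustrative.

```python
import torch
import torch.nn.functional as F

def kdiga_loss(student, teacher, x, y, alpha=1.0):
    """Standard soft-label KD plus a term that pulls the student's input
    gradient towards the teacher's, so robustness can transfer.

    x: input batch, y: (B,) integer labels.
    """
    x = x.clone().requires_grad_(True)
    s_logits = student(x)
    t_logits = teacher(x)

    # Input gradients of each model's classification loss.
    s_grad = torch.autograd.grad(F.cross_entropy(s_logits, y), x, create_graph=True)[0]
    t_grad = torch.autograd.grad(F.cross_entropy(t_logits, y), x, create_graph=False)[0]

    # Soft-label KD term plus the input-gradient alignment term.
    kd = F.kl_div(F.log_softmax(s_logits, dim=1),
                  F.softmax(t_logits.detach(), dim=1), reduction="batchmean")
    align = F.mse_loss(s_grad, t_grad.detach())
    return kd + alpha * align
```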
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179] (arXiv, 2020-06-23)
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.