HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
- URL: http://arxiv.org/abs/2207.05345v1
- Date: Tue, 12 Jul 2022 07:01:34 GMT
- Title: HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
- Authors: Luting Wang, Xiaojie Li, Yue Liao, Zeren Jiang, Jianlong Wu, Fei Wang,
Chen Qian, Si Liu
- Abstract summary: We investigate KD among heterogeneous teacher-student pairs for broader applicability.
We propose the HEtero-Assists Distillation (HEAD) framework, leveraging heterogeneous detection heads as assistants.
Our method has achieved significant improvement compared to current detection KD methods.
- Score: 34.90279031067575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional knowledge distillation (KD) methods for object detection mainly
concentrate on homogeneous teacher-student detectors. However, the design of a
lightweight detector for deployment is often significantly different from a
high-capacity detector. Thus, we investigate KD among heterogeneous
teacher-student pairs for broader applicability. We observe that the core
difficulty for heterogeneous KD (hetero-KD) is the significant semantic gap
between the backbone features of heterogeneous detectors due to the different
optimization manners. Conventional homogeneous KD (homo-KD) methods suffer from
this gap and struggle to directly achieve satisfactory performance on
hetero-KD. In this paper, we propose the HEtero-Assists Distillation (HEAD)
framework, leveraging heterogeneous detection heads as assistants to guide the
optimization of the student detector to reduce this gap. In HEAD, the assistant
is an additional detection head, homogeneous in architecture to the teacher head,
that is attached to the student backbone. Thus, a hetero-KD is transformed
into a homo-KD, allowing efficient knowledge transfer from the teacher to the
student. Moreover, we extend HEAD into a Teacher-Free HEAD (TF-HEAD) framework
when a well-trained teacher detector is unavailable. Our method has achieved
significant improvement compared to current detection KD methods. For example,
on the MS-COCO dataset, TF-HEAD helps R18 RetinaNet achieve 33.9 mAP (+2.2),
while HEAD further pushes the limit to 36.2 mAP (+4.5).
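To make the mechanism concrete, the sketch below illustrates the core idea in PyTorch: an assistant head whose architecture mirrors the teacher's head is attached to the student backbone, so the teacher-to-assistant distillation becomes homogeneous while the student's own head keeps the task loss. All module names (TinyBackbone, RetinaStyleHead) and the MSE placeholder loss are hypothetical stand-ins chosen for illustration, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a student or teacher backbone (e.g., ResNet + FPN)."""
    def __init__(self, out_channels: int = 256):
        super().__init__()
        self.stem = nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.stem(x)

class RetinaStyleHead(nn.Module):
    """Hypothetical dense detection head; the assistant copies the teacher's head design."""
    def __init__(self, in_channels: int = 256, num_outputs: int = 80):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.cls = nn.Conv2d(in_channels, num_outputs, 3, padding=1)

    def forward(self, feat):
        return self.cls(self.tower(feat))

# The student keeps its own (possibly heterogeneous) head for the detection loss;
# the assistant head, homogeneous to the teacher's head, receives the KD signal.
student_backbone = TinyBackbone()
student_head = RetinaStyleHead()        # the student's own head (could be a different family)
assistant_head = RetinaStyleHead()      # mirrors the teacher head architecture
teacher_backbone, teacher_head = TinyBackbone(), RetinaStyleHead()

images = torch.randn(2, 3, 64, 64)
student_feat = student_backbone(images)
with torch.no_grad():                   # the teacher is frozen
    teacher_pred = teacher_head(teacher_backbone(images))

assistant_pred = assistant_head(student_feat)          # homo-KD: same head family as teacher
kd_loss = F.mse_loss(assistant_pred, teacher_pred)     # placeholder distillation loss
task_loss = student_head(student_feat).abs().mean()    # placeholder for the real detection loss
(task_loss + kd_loss).backward()                       # gradients reach the student backbone
```
The point of the assistant is that gradients from the homo-KD loss still flow into the student backbone, which is what reduces the semantic gap; the full HEAD framework also specifies feature-level distillation and how the assistant interacts with the student head, which this sketch does not cover.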
Related papers
- TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant [52.0297393822012]
We introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students.
Within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module functions.
Our proposed method is evaluated across homogeneous model pairs and arbitrary heterogeneous combinations of CNNs, ViTs, and spatial KDs.
arXiv Detail & Related papers (2024-10-16T08:02:49Z)
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- CrossKD: Cross-Head Knowledge Distillation for Object Detection [69.16346256926842]
Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors.
We present a prediction mimicking distillation scheme, called CrossKD, which delivers the intermediate features of the student's detection head to the teacher's detection head.
Our CrossKD boosts the average precision of GFL ResNet-50 with 1x training schedule from 40.2 to 43.7, outperforming all existing KD methods.
arXiv Detail & Related papers (2023-06-20T08:19:51Z)
- Gradient-Guided Knowledge Distillation for Object Detectors [3.236217153362305]
We propose a novel approach for knowledge distillation in object detection, named Gradient-guided Knowledge Distillation (GKD).
Our GKD uses gradient information to identify and assign more weights to features that significantly impact the detection loss, allowing the student to learn the most relevant features from the teacher.
Experiments on the KITTI and COCO-Traffic datasets demonstrate our method's efficacy in knowledge distillation for object detection.
arXiv Detail & Related papers (2023-03-07T21:09:09Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient [18.782520279344553]
This paper empirically finds that better FPN features from a heterogeneous teacher detector can help the student.
We propose to imitate features with the Pearson Correlation Coefficient to focus on the relational information from the teacher (a minimal sketch of this idea appears after this list).
Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs.
arXiv Detail & Related papers (2022-07-05T13:37:34Z)
- Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation [34.441349114336994]
We propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors.
RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill.
PFI attempts to correlate feature differences with prediction differences, making feature imitation directly help to improve the student's accuracy.
arXiv Detail & Related papers (2021-12-09T11:19:15Z)
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms the existing detection KD techniques, and works whether the components of the framework are used separately or in conjunction.
arXiv Detail & Related papers (2021-08-17T07:44:27Z)
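Several entries above distill intermediate features rather than predictions. For the PKD-style idea of imitating features via the Pearson Correlation Coefficient, a minimal sketch is given below: maximizing the Pearson correlation between student and teacher feature maps is equivalent, up to constants, to minimizing an MSE between standardized features. The per-channel normalization granularity and the function name pearson_imitation_loss are assumptions for illustration, not necessarily the exact formulation in the paper.
```python
import torch
import torch.nn.functional as F

def standardize(feat: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Zero-mean, unit-variance normalization of each channel's spatial map."""
    n, c, h, w = feat.shape
    flat = feat.view(n, c, -1)
    mean = flat.mean(dim=-1, keepdim=True)
    std = flat.std(dim=-1, keepdim=True)
    return ((flat - mean) / (std + eps)).view(n, c, h, w)

def pearson_imitation_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """MSE between standardized features; minimizing it maximizes the
    Pearson correlation between the corresponding feature maps."""
    return F.mse_loss(standardize(student_feat), standardize(teacher_feat))

# Example with dummy FPN-level features of matching shape.
student_fpn = torch.randn(2, 256, 32, 32, requires_grad=True)
teacher_fpn = torch.randn(2, 256, 32, 32)
loss = pearson_imitation_loss(student_fpn, teacher_fpn)
loss.backward()
```
Standardizing the features discards per-channel magnitude and offset, so the student is only asked to match the relational (spatial) pattern of the teacher's features, which is the stated motivation of the PKD entry.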
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.