Knowledge Distillation for Detection Transformer with Consistent
Distillation Points Sampling
- URL: http://arxiv.org/abs/2211.08071v2
- Date: Wed, 16 Nov 2022 03:01:40 GMT
- Title: Knowledge Distillation for Detection Transformer with Consistent
Distillation Points Sampling
- Authors: Yu Wang, Xin Li, Shengzhao Wen, Fukui Yang, Wanping Zhang, Gang Zhang,
Haocheng Feng, Junyu Han, Errui Ding
- Abstract summary: We propose a knowledge distillation paradigm for DETR (KD-DETR) with consistent distillation points sampling.
KD-DETR boosts the performance of DAB-DETR with ResNet-18 and ResNet-50 backbones to 41.4% and 45.7% mAP, and the ResNet-50 model even surpasses the teacher by 2.2%.
- Score: 38.60121990752897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DETR is a novel end-to-end transformer-based object detector that
significantly outperforms classic detectors when the model size is scaled up.
In this paper, we focus on compressing DETR with knowledge distillation. While
knowledge distillation has been well studied for classic detectors, there is
little research on how to make it work effectively on DETR. We first provide
experimental and theoretical analysis showing that the main challenge in DETR
distillation is the lack of consistent distillation points. Distillation
points are the inputs corresponding to the predictions that the student
mimics, and reliable distillation requires sufficient distillation points that
are consistent between teacher and student. Based on this observation, we
propose a general knowledge distillation paradigm for DETR (KD-DETR) with
consistent distillation points sampling. Specifically, we decouple the
detection and distillation tasks by introducing a set of specialized object
queries to construct distillation points. Within this paradigm, we further
propose a general-to-specific distillation points sampling strategy to explore
the extensibility of KD-DETR. Extensive experiments on different DETR
architectures with various backbone and transformer-layer scales validate the
effectiveness and generalization of KD-DETR. KD-DETR boosts the performance of
DAB-DETR with ResNet-18 and ResNet-50 backbones to 41.4% and 45.7% mAP,
respectively, which is 5.2% and 3.5% higher than the baseline, and the
ResNet-50 model even surpasses the teacher by 2.2%.
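To make the idea of consistent distillation points concrete, the sketch below
shows one possible reading of the paradigm: a single shared set of sampled
object queries is fed to both the teacher and the student decoder, so their
predictions correspond to the same distillation points before a distillation
loss is computed. This is a minimal, hypothetical sketch rather than the
authors' implementation; the teacher/student decoder interfaces, the random
query sampler, and the loss choices and weights are all assumptions.

    # Hypothetical sketch of consistent distillation points sampling for
    # DETR distillation (not the authors' code). Assumes `teacher` and
    # `student` are DETR-like models whose `decoder(queries, features)`
    # returns per-query class logits and box predictions.
    import torch
    import torch.nn.functional as F

    def sample_distillation_points(num_points: int, dim: int) -> torch.Tensor:
        # A shared set of object queries used purely as distillation points,
        # decoupled from the student's own detection queries.
        return torch.randn(num_points, dim)

    def kd_detr_loss(student, teacher, features,
                     num_points=300, dim=256, temp=2.0):
        queries = sample_distillation_points(num_points, dim)

        # Feeding the SAME sampled queries to both decoders makes teacher
        # and student predictions correspond to identical distillation points.
        with torch.no_grad():
            t_logits, t_boxes = teacher.decoder(queries, features)
        s_logits, s_boxes = student.decoder(queries, features)

        # Soft-label distillation on classification plus L1 on boxes; the
        # exact losses and weights in KD-DETR may differ.
        cls_kd = F.kl_div(
            F.log_softmax(s_logits / temp, dim=-1),
            F.softmax(t_logits / temp, dim=-1),
            reduction="batchmean",
        ) * (temp ** 2)
        box_kd = F.l1_loss(s_boxes, t_boxes)
        return cls_kd + box_kd

During training, this distillation term would be added to the student's usual
detection loss computed on its own detection queries, keeping the detection
and distillation tasks decoupled.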
Related papers
- Knowledge Distillation via Query Selection for Detection Transformer [25.512519971607237]
This paper addresses the challenge of compressing DETR by leveraging knowledge distillation.
A critical aspect of DETRs' performance is their reliance on queries to interpret object representations accurately.
Our visual analysis indicates that hard-negative queries, focusing on foreground elements, are crucial for enhancing distillation outcomes.
arXiv Detail & Related papers (2024-09-10T11:49:28Z) - Dual Knowledge Distillation for Efficient Sound Event Detection [20.236008919003083]
Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals.
We introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems.
arXiv Detail & Related papers (2024-02-05T07:30:32Z) - Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - Continual Detection Transformer for Incremental Object Detection [154.8345288298059]
Incremental object detection (IOD) aims to train an object detector in phases, each with annotations for new object categories.
As in other incremental settings, IOD is subject to catastrophic forgetting, which is often addressed by techniques such as knowledge distillation (KD) and exemplar replay (ER).
We propose a new method for transformer-based IOD which enables effective usage of KD and ER in this context.
arXiv Detail & Related papers (2023-04-06T14:38:40Z) - Q-DETR: An Efficient Low-Bit Quantized Detection Transformer [50.00784028552792]
Through empirical analyses, we find that the bottlenecks of Q-DETR come from query information distortion.
We formulate our DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR.
We introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.
arXiv Detail & Related papers (2023-04-01T08:05:14Z) - StereoDistill: Pick the Cream from LiDAR for Distilling Stereo-based 3D
Object Detection [93.10989714186788]
We propose a cross-modal distillation method named StereoDistill to narrow the gap between the stereo and LiDAR-based approaches.
Key designs of StereoDistill are: the X-component Guided Distillation (XGD) for regression and the Cross-anchor Logit Distillation (CLD) for classification.
arXiv Detail & Related papers (2023-01-04T13:38:48Z) - DETRDistill: A Universal Knowledge Distillation Framework for
DETR-families [11.9748352746424]
Transformer-based detectors (DETRs) have attracted great attention due to their sparse training paradigm and the removal of post-processing operations.
Knowledge distillation (KD) can be employed to compress the huge model by constructing a universal teacher-student learning framework.
arXiv Detail & Related papers (2022-11-17T13:35:11Z) - Adaptive Instance Distillation for Object Detection in Autonomous
Driving [3.236217153362305]
We propose Adaptive Instance Distillation (AID) to selectively impart teacher's knowledge to the student to improve the performance of knowledge distillation.
Our AID is also shown to be useful for self-distillation to improve the teacher model's performance.
arXiv Detail & Related papers (2022-01-26T18:06:33Z) - G-DetKD: Towards General Distillation Framework for Object Detectors via
Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms existing detection KD techniques and works whether the components in the framework are used separately or in conjunction.
arXiv Detail & Related papers (2021-08-17T07:44:27Z) - General Instance Distillation for Object Detection [12.720908566642812]
With GID, RetinaNet with ResNet-50 achieves 39.1% mAP, surpassing the 36.2% baseline by 2.9% and even exceeding the ResNet-101-based teacher model's 38.1% AP.
arXiv Detail & Related papers (2021-03-03T11:41:26Z)