CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs
- URL: http://arxiv.org/abs/2502.10683v1
- Date: Sat, 15 Feb 2025 06:02:51 GMT
- Title: CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs
- Authors: Qizhen Lan, Qing Tian
- Abstract summary: This paper proposes Consistent Location-and-Context-aware Knowledge Distillation (CLoCKDistill) for DETR detectors.
We distill the transformer encoder output (i.e., memory) that contains valuable global context and long-range dependencies.
Our method boosts student detector performance by 2.2% to 6.4%.
- Abstract: Object detection has advanced significantly with Detection Transformers (DETRs). However, these models are computationally demanding, posing challenges for deployment in resource-constrained environments (e.g., self-driving cars). Knowledge distillation (KD) is an effective compression method widely applied to CNN detectors, but its application to DETR models has been limited. Most KD methods for DETRs fail to distill transformer-specific global context. Also, they blindly believe in the teacher model, which can sometimes be misleading. To bridge the gaps, this paper proposes Consistent Location-and-Context-aware Knowledge Distillation (CLoCKDistill) for DETR detectors, which includes both feature distillation and logit distillation components. For feature distillation, instead of distilling backbone features like existing KD methods, we distill the transformer encoder output (i.e., memory) that contains valuable global context and long-range dependencies. Also, we enrich this memory with object location details during feature distillation so that the student model can prioritize relevant regions while effectively capturing the global context. To facilitate logit distillation, we create target-aware queries based on the ground truth, allowing both the student and teacher decoders to attend to consistent and accurate parts of encoder memory. Experiments on the KITTI and COCO datasets show our CLoCKDistill method's efficacy across various DETRs, e.g., single-scale DAB-DETR, multi-scale deformable DETR, and denoising-based DINO. Our method boosts student detector performance by 2.2% to 6.4%.
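The abstract describes two distillation components: a feature loss on the transformer encoder output ("memory") that is weighted by ground-truth object locations, and a logit loss computed after feeding both decoders the same target-aware queries built from the ground truth. The PyTorch sketch below only illustrates the general shape of such losses under stated assumptions: the function names, the Gaussian location prior, the MSE/KL/L1 choices, and the temperature are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def location_prior(gt_boxes, h, w, device):
    """Soft foreground prior over the h*w encoder-memory tokens, built from
    ground-truth boxes in normalized (cx, cy, w, h) format.
    Hypothetical helper: the Gaussian form is an assumption, not the paper's formulation."""
    ys = (torch.arange(h, device=device) + 0.5) / h
    xs = (torch.arange(w, device=device) + 0.5) / w
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    prior = torch.zeros(h, w, device=device)
    for cx, cy, bw, bh in gt_boxes.tolist():
        # Gaussian bump centered on each object; larger boxes produce wider bumps.
        g = torch.exp(-(((xx - cx) / (bw + 1e-6)) ** 2 + ((yy - cy) / (bh + 1e-6)) ** 2))
        prior = torch.maximum(prior, g)
    return prior.flatten()  # (h*w,)


def feature_distill_loss(student_memory, teacher_memory, gt_boxes, h, w):
    """Location-aware distillation of the encoder memory.
    student_memory, teacher_memory: (h*w, d) tokens for one image."""
    weight = location_prior(gt_boxes, h, w, student_memory.device).unsqueeze(-1)  # (h*w, 1)
    return (weight * (student_memory - teacher_memory) ** 2).mean()


def logit_distill_loss(student_cls, teacher_cls, student_box, teacher_box, tau=2.0):
    """Logit distillation assuming both decoders were run on the same target-aware
    queries built from ground truth, so their outputs are index-aligned.
    student_cls, teacher_cls: (num_queries, num_classes); *_box: (num_queries, 4)."""
    kl = F.kl_div(
        F.log_softmax(student_cls / tau, dim=-1),
        F.softmax(teacher_cls / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return kl + F.l1_loss(student_box, teacher_box)
```

In practice, terms like these would be added, with tunable weights, to the student's standard DETR training loss.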
Related papers
- Knowledge Distillation via Query Selection for Detection Transformer [25.512519971607237]
This paper addresses the challenge of compressing DETR by leveraging knowledge distillation.
A critical aspect of DETRs' performance is their reliance on queries to interpret object representations accurately.
Our visual analysis indicates that hard-negative queries, focusing on foreground elements, are crucial for enhancing distillation outcomes.
arXiv Detail & Related papers (2024-09-10T11:49:28Z) - Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation [29.821082433621868]
We propose Attention-based Feature Distillation (AFD) for object detection.
We introduce a multi-instance attention mechanism that effectively distinguishes between background and foreground elements.
AFD attains the performance of other state-of-the-art models while being efficient.
arXiv Detail & Related papers (2023-10-28T11:15:37Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We argue that the essence of these methods is to discard the noisy information and distill the valuable information in the features.
We propose a novel KD method, dubbed DiffKD, that explicitly denoises and matches features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Continual Detection Transformer for Incremental Object Detection [154.8345288298059]
Incremental object detection (IOD) aims to train an object detector in phases, each with annotations for new object categories.
As in other incremental settings, IOD is subject to catastrophic forgetting, which is often addressed by techniques such as knowledge distillation (KD) and exemplar replay (ER).
We propose a new method for transformer-based IOD which enables effective usage of KD and ER in this context.
arXiv Detail & Related papers (2023-04-06T14:38:40Z) - DETRDistill: A Universal Knowledge Distillation Framework for DETR-families [11.9748352746424]
Transformer-based detectors (DETRs) have attracted great attention due to their sparse training paradigm and the removal of post-processing operations.
Knowledge distillation (KD) can be employed to compress these huge models by constructing a universal teacher-student learning framework.
arXiv Detail & Related papers (2022-11-17T13:35:11Z) - Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling [38.60121990752897]
We propose a knowledge distillation paradigm for DETR (KD-DETR) with consistent distillation points sampling.
KD-DETR boosts the performance of DAB-DETR with ResNet-18 and ResNet-50 backbones to 41.4% and 45.7% mAP, respectively, and the ResNet-50 student even surpasses the teacher model by 2.2%.
arXiv Detail & Related papers (2022-11-15T11:52:30Z) - IDa-Det: An Information Discrepancy-aware Distillation for 1-bit
Detectors [30.452449805950593]
Knowledge distillation (KD) is useful for training compact object detection models.
KD is often effective when the teacher model and student counterpart share similar proposal information.
This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors.
arXiv Detail & Related papers (2022-10-07T12:04:14Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - Localization Distillation for Object Detection [134.12664548771534]
Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the classification logits.
We present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student.
We show that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a key reason why logit mimicking has underperformed for years.
arXiv Detail & Related papers (2022-04-12T17:14:34Z) - TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation [49.794142076551026]
The Transformer-based Knowledge Distillation (TransKD) framework learns compact student transformers by distilling both the feature maps and patch embeddings of large teacher transformers.
Experiments on Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks.
arXiv Detail & Related papers (2022-02-27T16:34:10Z) - Localization Distillation for Object Detection [79.78619050578997]
We propose localization distillation (LD) for object detection.
Our LD can be formulated as standard KD by adopting the general localization representation of the bounding box (a minimal sketch of this idea appears after this list).
We suggest a teacher assistant (TA) strategy to bridge the possible gap between the teacher and student models.
arXiv Detail & Related papers (2021-02-24T12:26:21Z)
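The two Localization Distillation entries above share the same core idea: the box regressor outputs a distribution over discrete positions for each box edge, and the student mimics the teacher's edge distributions rather than only its classification logits. Below is a minimal, hedged sketch of such a loss; the function name, tensor shapes, and temperature value are assumptions for illustration, not taken from either paper.

```python
import torch.nn.functional as F


def localization_distill_loss(student_edge_logits, teacher_edge_logits, tau=10.0):
    """Sketch of localization distillation (LD): each box edge is predicted as a
    distribution over discrete bins (as in GFL-style heads), and the student mimics
    the teacher's edge distributions via a temperature-scaled KL divergence.
    Shapes: (num_boxes, 4, num_bins). The temperature value is an assumption."""
    log_p_student = F.log_softmax(student_edge_logits / tau, dim=-1)
    p_teacher = F.softmax(teacher_edge_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2
```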