SAMKD: Spatial-aware Adaptive Masking Knowledge Distillation for Object Detection
- URL: http://arxiv.org/abs/2501.07101v2
- Date: Mon, 24 Mar 2025 12:47:59 GMT
- Title: SAMKD: Spatial-aware Adaptive Masking Knowledge Distillation for Object Detection
- Authors: Zhourui Zhang, Jun Li, Jiayan Li, Jianhua Xu,
- Abstract summary: We propose a spatial-aware Adaptive Masking Knowledge Distillation framework for accurate object detection.<n>Our method improves the student network from 35.3% to 38.8% mAP, outperforming state-of-the-art distillation methods.
- Score: 4.33169417430713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of recent attention-guided feature masking distillation methods perform knowledge transfer via global teacher attention maps without delving into fine-grained clues. Instead, performing distillation at finer granularity is conducive to uncovering local details supplementary to global knowledge transfer and reconstructing comprehensive student features. In this study, we propose a Spatial-aware Adaptive Masking Knowledge Distillation (SAMKD) framework for accurate object detection. Different from previous feature distillation methods which mainly perform single-scale feature masking, we develop spatially hierarchical feature masking distillation scheme, such that the object-aware locality is encoded during coarse-to-fine distillation process for improved feature reconstruction. In addition, our spatial-aware feature distillation strategy is combined with a masking logit distillation scheme in which region-specific feature difference between teacher and student networks is utilized to adaptively guide the distillation process. Thus, it can help the student model to better learn from the teacher counterpart with improved knowledge transfer and reduced gap. Extensive experiments for detection task demonstrate the superiority of our method. For example, when FCOS is used as teacher detector with ResNet101 backbone, our method improves the student network from 35.3\% to 38.8\% mAP, outperforming state-of-the-art distillation methods including MGD, FreeKD and DMKD.
Related papers
- DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection [6.371066478190595]
A novel dual feature-masking heterogeneous distillation framework termed DFMSD is proposed for object detection.
A masking enhancement strategy is combined with stage-wise learning to improve feature-masking reconstruction.
Experiments for the object detection task demonstrate the promise of our approach.
arXiv Detail & Related papers (2024-07-18T04:19:14Z) - Object-centric Cross-modal Feature Distillation for Event-based Object
Detection [87.50272918262361]
RGB detectors still outperform event-based detectors due to sparsity of the event data and missing visual details.
We develop a novel knowledge distillation approach to shrink the performance gap between these two modalities.
We show that object-centric distillation allows to significantly improve the performance of the event-based student object detector.
arXiv Detail & Related papers (2023-11-09T16:33:08Z) - DMKD: Improving Feature-based Knowledge Distillation for Object
Detection Via Dual Masking Augmentation [10.437237606721222]
We devise a Dual Masked Knowledge Distillation (DMKD) framework which can capture both spatially important and channel-wise informative clues.
Our experiments on object detection task demonstrate that the student networks achieve performance gains of 4.1% and 4.3% with the help of our method.
arXiv Detail & Related papers (2023-09-06T05:08:51Z) - Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD)
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Gradient-Guided Knowledge Distillation for Object Detectors [3.236217153362305]
We propose a novel approach for knowledge distillation in object detection, named Gradient-guided Knowledge Distillation (GKD)
Our GKD uses gradient information to identify and assign more weights to features that significantly impact the detection loss, allowing the student to learn the most relevant features from the teacher.
Experiments on the KITTI and COCO-Traffic datasets demonstrate our method's efficacy in knowledge distillation for object detection.
arXiv Detail & Related papers (2023-03-07T21:09:09Z) - AMD: Adaptive Masked Distillation for Object [8.668808292258706]
We propose a spatial-channel adaptive masked distillation (AMD) network for object detection.
We employ a simple and efficient module to allow the student network channel to be adaptive.
With the help of our proposed distillation method, the student networks report 41.3%, 42.4%, and 42.7% mAP scores.
arXiv Detail & Related papers (2023-01-31T10:32:13Z) - Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method, that extents the logit distillation in both instance-level and class-level.
CLKD enables the student model mimic higher semantic information from the teacher model, hence improving the distillation performance.
arXiv Detail & Related papers (2022-11-27T09:27:50Z) - Localization Distillation for Object Detection [134.12664548771534]
Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the classification logits.
We present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student.
We show that logit mimicking can outperform feature imitation and the absence of localization distillation is a critical reason for why logit mimicking underperforms for years.
arXiv Detail & Related papers (2022-04-12T17:14:34Z) - Distilling Image Classifiers in Object Detectors [81.63849985128527]
We study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework.
In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance.
arXiv Detail & Related papers (2021-06-09T16:50:10Z) - Distilling Object Detectors via Decoupled Features [69.62967325617632]
We present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
Experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection.
arXiv Detail & Related papers (2021-03-26T13:58:49Z) - Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge
Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD)
Our proposed method, FRSKD, can utilize both soft label and feature-map distillations for the self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
arXiv Detail & Related papers (2021-03-15T10:59:43Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical
Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z) - Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors are at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.