LGD: Label-guided Self-distillation for Object Detection
- URL: http://arxiv.org/abs/2109.11496v1
- Date: Thu, 23 Sep 2021 16:55:01 GMT
- Title: LGD: Label-guided Self-distillation for Object Detection
- Authors: Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng,
Jian Sun
- Abstract summary: We propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation).
Our framework involves sparse label-appearance encoding, inter-object relation adaptation and intra-object knowledge mapping to obtain the instructive knowledge.
Compared with the classical teacher-based method FGFI, LGD not only performs better without requiring a pretrained teacher but also incurs 51% lower training cost beyond inherent student learning.
- Score: 59.9972914042281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose the first self-distillation framework for general
object detection, termed LGD (Label-Guided self-Distillation). Previous studies
rely on a strong pretrained teacher to provide instructive knowledge for
distillation. However, this could be unavailable in real-world scenarios.
Instead, we generate instructive knowledge by inter- and intra-relation
modeling among objects, requiring only student representations and regular
labels. In detail, our framework involves sparse label-appearance encoding,
inter-object relation adaptation and intra-object knowledge mapping to obtain
the instructive knowledge. Modules in LGD are trained end-to-end with the
student detector and are discarded at inference. Empirically, LGD obtains
decent results on various detectors and datasets, and extends to tasks such
as instance segmentation. For example, on the MS-COCO dataset, LGD improves
RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0%
mAP (+2.8%). For much stronger detectors such as FCOS with ResNeXt-101 DCN v2
under 2x multi-scale training (46.1%), LGD achieves 47.9% (+1.8%). For
pedestrian detection on the CrowdHuman dataset, LGD improves mMR by 2.3% for
Faster R-CNN with ResNet-50. Compared with the classical teacher-based method
FGFI, LGD not only performs better without requiring a pretrained teacher but
also incurs 51% lower training cost beyond inherent student learning.
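To make the pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the idea: ground-truth labels are encoded into per-object tokens, cross-attended with student features to form an "instructive" target, and the student is regressed toward that target by an auxiliary loss, with the auxiliary modules dropped at inference. The module names, shapes, cross-attention design, and loss weighting are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (not the authors' code) of the LGD idea: auxiliary modules
# encode ground-truth labels together with student features into "instructive"
# features, the student is regressed toward them, and the auxiliary modules
# are discarded at inference. Shapes and module sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelEncoder(nn.Module):
    """Embed sparse per-object labels (class id + normalized box) into tokens."""
    def __init__(self, num_classes: int, dim: int = 256):
        super().__init__()
        self.cls_embed = nn.Embedding(num_classes, dim)
        self.box_proj = nn.Linear(4, dim)

    def forward(self, classes: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # classes: (N,), boxes: (N, 4) in [0, 1] -> (N, dim) label tokens
        return self.cls_embed(classes) + self.box_proj(boxes)

class InstructiveHead(nn.Module):
    """Cross-attend label tokens with student appearance features to produce
    an instructive feature map with the same shape as the student features."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, label_tokens: torch.Tensor) -> torch.Tensor:
        # student_feat: (1, C, H, W); label_tokens: (N, C)
        _, c, h, w = student_feat.shape
        pixels = student_feat.flatten(2).transpose(1, 2)   # (1, H*W, C)
        labels = label_tokens.unsqueeze(0)                  # (1, N, C)
        mapped, _ = self.attn(query=pixels, key=labels, value=labels)
        mapped = mapped.transpose(1, 2).reshape(1, c, h, w)
        return self.out(mapped + student_feat)

def lgd_style_loss(student_feat, classes, boxes, encoder, head, weight=1.0):
    """Distillation term only; in the full method the auxiliary modules are
    trained end-to-end together with the standard detection losses."""
    tokens = encoder(classes, boxes)
    instructive = head(student_feat, tokens)
    return weight * F.mse_loss(student_feat, instructive)

if __name__ == "__main__":
    enc, head = LabelEncoder(num_classes=80), InstructiveHead()
    feat = torch.randn(1, 256, 32, 32)                   # one student FPN level
    cls = torch.tensor([3, 17]); box = torch.rand(2, 4)  # two annotated objects
    print(lgd_style_loss(feat, cls, box, enc, head))
```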
Related papers
- A Teacher-Free Graph Knowledge Distillation Framework with Dual
Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that requires neither a teacher model nor GNNs during training or inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z)
- Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient
Semantic Segmentation [16.957139277317005]
Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD) is a new contrastive distillation learning paradigm.
Af-DCD trains compact and accurate deep neural networks for semantic segmentation applications.
arXiv Detail & Related papers (2023-12-07T09:37:28Z)
- Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
- Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular
Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation [1.3701366534590498]
Ocular Toxoplasmosis (OT) is a common eye infection caused by T. gondii that can cause vision problems.
This research seeks to provide a guide for future researchers looking to utilise DL techniques and develop a cheap, automated, easy-to-use, and accurate diagnostic method.
arXiv Detail & Related papers (2023-05-18T13:42:15Z)
- Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter
Encoders for Natural Language Understanding Systems [63.713297451300086]
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
arXiv Detail & Related papers (2022-06-15T20:44:23Z)
- Prediction-Guided Distillation for Dense Object Detection [7.5320132424481505]
We show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance.
We propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher.
Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures. A minimal sketch of this prediction-guided focusing appears after this list.
arXiv Detail & Related papers (2022-03-10T16:46:05Z)
- G-DetKD: Towards General Distillation Framework for Object Detectors via
Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms existing detection KD techniques, and works whether its components are used separately or in conjunction. A minimal sketch of the semantic-guided soft matching appears after this list.
arXiv Detail & Related papers (2021-08-17T07:44:27Z)
- General Instance Distillation for Object Detection [12.720908566642812]
RetinaNet with ResNet-50 achieves 39.1% mAP with GID on the COCO dataset, surpassing the 36.2% baseline by 2.9% and even outperforming the ResNet-101-based teacher model at 38.1% AP.
arXiv Detail & Related papers (2021-03-03T11:41:26Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors are at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
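The last entry above states the general teacher-student recipe that most of the listed detection methods build on. As a shared reference point, here is a minimal, generic sketch of response-based knowledge distillation (soft teacher targets blended with the usual hard-label loss); the temperature and weighting are common illustrative defaults, not values taken from any of the papers above.

```python
# Generic response-based knowledge distillation, shown only as a reference
# point for the detector-specific methods listed above. The temperature T and
# mixing weight alpha are illustrative defaults, not values from the papers.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            targets: torch.Tensor,
            T: float = 4.0,
            alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with KL to the teacher's softened outputs."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft

# Random tensors standing in for per-anchor class scores over 80 classes.
s = torch.randn(8, 80, requires_grad=True)
t = torch.randn(8, 80)
y = torch.randint(0, 80, (8,))
print(kd_loss(s, t, y))
```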
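The Prediction-Guided Distillation entry observes that only a small fraction of locations inside a ground-truth box drive the teacher's detections. The following is a hypothetical sketch of that idea, not the paper's implementation: the feature-imitation loss is weighted by the teacher's per-location classification confidence so that distillation concentrates on the teacher's key predictive regions; the shapes and the confidence cutoff are assumptions.

```python
# Hypothetical sketch of prediction-guided feature imitation: the per-location
# imitation loss is masked by the teacher's classification confidence, so the
# student mimics the teacher mainly where the teacher actually "looks".
# Shapes and the top-quantile cutoff are assumptions, not paper values.
import torch
import torch.nn.functional as F

def prediction_guided_imitation(student_feat: torch.Tensor,        # (B, C, H, W)
                                teacher_feat: torch.Tensor,        # (B, C, H, W)
                                teacher_cls_logits: torch.Tensor,  # (B, K, H, W)
                                keep_quantile: float = 0.9) -> torch.Tensor:
    # Per-location foreground confidence = teacher's max class probability.
    conf = teacher_cls_logits.sigmoid().amax(dim=1, keepdim=True)   # (B, 1, H, W)
    # Keep only the most confident locations (roughly the key predictive regions).
    thresh = torch.quantile(conf.flatten(1), keep_quantile, dim=1).view(-1, 1, 1, 1)
    mask = (conf >= thresh).float()
    per_loc = F.mse_loss(student_feat, teacher_feat.detach(),
                         reduction="none").mean(dim=1, keepdim=True)
    return (per_loc * mask).sum() / mask.sum().clamp(min=1.0)

s = torch.randn(2, 256, 32, 32, requires_grad=True)
t = torch.randn(2, 256, 32, 32)
logits = torch.randn(2, 80, 32, 32)
print(prediction_guided_imitation(s, t, logits))
```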
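The G-DetKD entry describes soft matching of feature pairs across all pyramid levels. Below is a rough sketch, under assumed pooled-RoI shapes, of what such semantic-guided soft matching could look like: each student region feature imitates a similarity-weighted mixture of the teacher's features for the same region drawn from every pyramid level, rather than a fixed level-to-level pairing.

```python
# Rough sketch (assumed, not the paper's code) of semantic-guided soft
# matching: each student RoI feature imitates a similarity-weighted mixture of
# the teacher's features for the same RoI taken from all pyramid levels.
import torch
import torch.nn.functional as F

def soft_matched_imitation(student_roi: torch.Tensor,           # (N, C) pooled student features
                           teacher_roi_per_level: torch.Tensor  # (L, N, C) same RoIs at L levels
                           ) -> torch.Tensor:
    s = F.normalize(student_roi, dim=-1)                         # (N, C)
    t = F.normalize(teacher_roi_per_level, dim=-1)               # (L, N, C)
    # Similarity of each student RoI to its teacher counterpart at every level.
    sim = torch.einsum("nc,lnc->nl", s, t)                       # (N, L)
    weights = sim.softmax(dim=1)                                 # soft level assignment
    # Similarity-weighted teacher target per RoI.
    target = torch.einsum("nl,lnc->nc", weights, teacher_roi_per_level)
    return F.mse_loss(student_roi, target.detach())

student = torch.randn(16, 256, requires_grad=True)
teacher = torch.randn(5, 16, 256)   # e.g. FPN levels P3-P7
print(soft_matched_imitation(student, teacher))
```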