Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation
- URL: http://arxiv.org/abs/2308.09105v1
- Date: Thu, 17 Aug 2023 17:17:08 GMT
- Title: Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation
- Authors: Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang,
Liang-Yan Gui
- Abstract summary: We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
- Score: 56.053397775016755
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Resource-constrained perception systems such as edge computing and
vision-for-robotics require vision models to be both accurate and lightweight
in computation and memory usage. While knowledge distillation is a proven
strategy to enhance the performance of lightweight classification models, its
application to structured outputs like object detection and instance
segmentation remains a complicated task, due to the variability in outputs and
complex internal network modules involved in the distillation process. In this
paper, we propose a simple yet surprisingly effective sequential approach to
knowledge distillation that progressively transfers the knowledge of a set of
teacher detectors to a given lightweight student. To distill knowledge from a
highly accurate but complex teacher model, we construct a sequence of teachers
to help the student gradually adapt. Our progressive strategy can be easily
combined with existing detection distillation mechanisms to consistently
maximize student performance in various settings. To the best of our knowledge,
we are the first to successfully distill knowledge from Transformer-based
teacher detectors to convolution-based students, and unprecedentedly boost the
performance of ResNet-50 based RetinaNet from 36.5% to 42.0% AP and Mask R-CNN
from 38.2% to 42.5% AP on the MS COCO benchmark.
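To make the progressive idea concrete, here is a minimal, illustrative sketch written as a PyTorch-style training loop. It only shows the stage-by-stage schedule over a sequence of teachers; the teacher ordering, the plain feature-imitation loss, the loss weighting, and the `backbone_features` / `detection_loss` helpers are assumptions for illustration, not the authors' exact implementation, which combines the progressive schedule with existing detection distillation mechanisms.

```python
import torch
import torch.nn.functional as F

def feature_distill_loss(student_feats, teacher_feats):
    # Hypothetical feature-imitation loss; real detection distillation methods
    # typically add task-specific terms (classification / box-regression mimicry).
    return sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))

def progressive_distillation(student, teachers, loader, optimizer,
                             epochs_per_stage=12, distill_weight=1.0):
    """Distill `student` from a sequence of `teachers`, ordered from the teacher
    closest to the student up to the most accurate (possibly Transformer-based) one."""
    for teacher in teachers:
        teacher.eval()
        for _ in range(epochs_per_stage):
            for images, targets in loader:
                with torch.no_grad():
                    t_feats = teacher.backbone_features(images)      # assumed feature hook
                s_feats = student.backbone_features(images)          # assumed feature hook
                task_loss = student.detection_loss(images, targets)  # standard detector loss
                loss = task_loss + distill_weight * feature_distill_loss(s_feats, t_feats)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return student
```

In practice, each stage can plug in whatever detection-specific distillation loss is preferred; the abstract's claim is that the progressive teacher schedule composes with such mechanisms rather than replacing them.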
Related papers
- CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination [28.061239778773423]
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks.
CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption of computational resources.
We introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model.
arXiv Detail & Related papers (2024-08-18T11:23:21Z)
- Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection [47.0507287491627]
We propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection.
By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model.
Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources (see the uncertainty-weighting sketch after this list).
arXiv Detail & Related papers (2024-06-11T06:51:02Z)
- TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation [6.856317526681759]
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods achieve high recognition accuracy by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
arXiv Detail & Related papers (2024-04-02T02:29:41Z)
- Instance-Conditional Knowledge Distillation for Object Detection [59.56780046291835]
We propose an instance-conditional distillation framework to find desired knowledge.
We use observed instances as condition information and formulate the retrieval process as an instance-conditional decoding process.
arXiv Detail & Related papers (2021-10-25T08:23:29Z)
- Distilling Image Classifiers in Object Detectors [81.63849985128527]
We study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework.
In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance.
arXiv Detail & Related papers (2021-06-09T16:50:10Z)
- Distilling Object Detectors via Decoupled Features [69.62967325617632]
We present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
Experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection.
arXiv Detail & Related papers (2021-03-26T13:58:49Z)
- General Instance Distillation for Object Detection [12.720908566642812]
RetinaNet with ResNet-50 achieves 39.1% mAP with GID on the COCO dataset, surpassing the 36.2% baseline by 2.9% and even outperforming the ResNet-101 based teacher model at 38.1% AP.
arXiv Detail & Related papers (2021-03-03T11:41:26Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
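For the uncertainty-based distillation entry above ("Teaching with Uncertainty"), here is a minimal sketch of how Monte Carlo dropout can turn teacher uncertainty into per-location weights for a feature-distillation loss. It assumes a single feature map per forward pass, a simple inverse-variance weighting, and a hypothetical `backbone_features` hook; the paper's exact formulation may differ.

```python
import torch

def mc_dropout_features(teacher, images, n_samples=5):
    """Run the teacher several times with dropout active to estimate feature uncertainty."""
    teacher.train()  # keep dropout layers stochastic during these forward passes
    with torch.no_grad():
        samples = torch.stack([teacher.backbone_features(images)  # assumed feature hook
                               for _ in range(n_samples)])
    teacher.eval()
    return samples.mean(dim=0), samples.var(dim=0)

def uncertainty_weighted_distill_loss(student_feats, teacher_mean, teacher_var, eps=1e-6):
    # Down-weight locations where the teacher's features vary across dropout samples,
    # so the student imitates confident regions more strongly than uncertain ones.
    weights = 1.0 / (teacher_var + eps)
    weights = weights / weights.mean()
    return (weights * (student_feats - teacher_mean) ** 2).mean()
```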