Distilling Object Detectors with Task Adaptive Regularization
- URL: http://arxiv.org/abs/2006.13108v1
- Date: Tue, 23 Jun 2020 15:58:22 GMT
- Title: Distilling Object Detectors with Task Adaptive Regularization
- Authors: Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian
- Abstract summary: Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
- Score: 97.52935611385179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art object detectors come at the expense of high
computational costs and are hard to deploy to low-end devices. Knowledge
distillation, which aims at training a smaller student network by transferring
knowledge from a larger teacher model, is one of the promising solutions for
model miniaturization. In this paper, we investigate each module of a typical
detector in depth, and propose a general distillation framework that adaptively
transfers knowledge from teacher to student according to task-specific
priors. The intuition is that simply distilling all information from teacher to
student is not advisable; instead, we should only borrow priors from the teacher
model where the student cannot perform well. Towards this goal, we propose a
region proposal sharing mechanism to exchange region responses between the
teacher and student models. Based on this, we adaptively transfer knowledge at
three levels, \emph{i.e.}, the feature backbone, the classification head, and
the bounding box regression head, according to which model performs better.
Furthermore, since minimizing the distillation loss and the detection loss
simultaneously introduces an optimization dilemma, we propose a distillation
decay strategy that improves model generalization by gradually reducing the
distillation penalty. Experiments on widely used detection
benchmarks demonstrate the effectiveness of our method. In particular, using
Faster R-CNN with FPN as an instantiation, we achieve $39.0\%$ mAP with
ResNet-50 on the COCO dataset, which surpasses the $36.3\%$ baseline by $2.7$
percentage points and even exceeds the teacher model's $38.5\%$ mAP.
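The following is a minimal sketch, not the authors' implementation, of how the two ideas in the abstract could fit together: per-level distillation terms that are applied only where the teacher outperforms the student on shared region proposals, scaled by a decaying weight. The gating criteria, the linear decay schedule, and all tensor shapes and names are illustrative assumptions.
```python
# Minimal sketch (assumptions, not the authors' code) of (1) distillation at the
# backbone, classification, and box regression levels, gated to proposals where
# the teacher beats the student, and (2) a decaying distillation penalty.
import torch
import torch.nn.functional as F


def distillation_decay(epoch: int, total_epochs: int) -> float:
    """Assumed linear decay of the distillation penalty over training."""
    return max(0.0, 1.0 - epoch / float(total_epochs))


def task_adaptive_distill_loss(
    s_feat, t_feat,          # pooled region features, shape (R, C)
    s_cls, t_cls,            # classification logits,  shape (R, K)
    s_box, t_box,            # box regression outputs, shape (R, 4)
    gt_labels, gt_boxes,     # ground truth per shared proposal: (R,), (R, 4)
    epoch: int, total_epochs: int,
) -> torch.Tensor:
    # Feature backbone level: plain imitation of the teacher's region features.
    feat_loss = F.mse_loss(s_feat, t_feat)

    # Classification level: distill only proposals where the teacher assigns
    # higher probability to the true class than the student does.
    t_prob = t_cls.softmax(dim=-1)
    s_logp = F.log_softmax(s_cls, dim=-1)
    t_true = t_prob.gather(1, gt_labels[:, None]).squeeze(1)
    s_true = s_logp.exp().gather(1, gt_labels[:, None]).squeeze(1)
    cls_gate = (t_true > s_true).float()
    kl = F.kl_div(s_logp, t_prob, reduction="none").sum(dim=-1)
    cls_loss = (kl * cls_gate).sum() / cls_gate.sum().clamp(min=1.0)

    # Box regression level: distill only proposals where the teacher's boxes
    # are closer to the ground truth than the student's.
    t_err = (t_box - gt_boxes).abs().sum(dim=-1)
    s_err = (s_box - gt_boxes).abs().sum(dim=-1)
    box_gate = (t_err < s_err).float()
    l1 = F.smooth_l1_loss(s_box, t_box, reduction="none").sum(dim=-1)
    box_loss = (l1 * box_gate).sum() / box_gate.sum().clamp(min=1.0)

    w = distillation_decay(epoch, total_epochs)
    return w * (feat_loss + cls_loss + box_loss)


# Toy usage with random tensors standing in for detector head outputs.
R, C, K = 16, 256, 81
loss = task_adaptive_distill_loss(
    torch.randn(R, C), torch.randn(R, C),
    torch.randn(R, K), torch.randn(R, K),
    torch.randn(R, 4), torch.randn(R, 4),
    torch.randint(0, K, (R,)), torch.randn(R, 4),
    epoch=3, total_epochs=12,
)
```
In training, this decayed term would be added to the standard detection loss, so the student leans on the teacher early and increasingly on the ground truth as training proceeds.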
Related papers
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that such a standard distillation paradigm incurs a serious bias issue: popular items are more heavily recommended after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Prediction-Guided Distillation for Dense Object Detection [7.5320132424481505]
We show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance.
We propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher.
Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures.
arXiv Detail & Related papers (2022-03-10T16:46:05Z)
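As a rough illustration of the Prediction-Guided Distillation entry above (not PGD's actual formulation), feature distillation can be restricted to a few high-quality teacher locations; the quality map, the top-k selection, and all tensor shapes below are assumptions.
```python
# Hypothetical sketch: weight feature imitation by a mask over the teacher's
# top-k "key predictive" locations. The quality map is assumed to be given;
# PGD's actual quality measure and weighting differ in detail.
import torch
import torch.nn.functional as F


def prediction_guided_feat_loss(s_feat, t_feat, t_quality, k: int = 64):
    """s_feat, t_feat: (N, C, H, W) feature maps; t_quality: (N, H, W) scores."""
    n, _, h, w = s_feat.shape
    flat_q = t_quality.reshape(n, -1)
    topk = flat_q.topk(k=min(k, h * w), dim=1).indices          # key locations
    mask = torch.zeros_like(flat_q).scatter_(1, topk, 1.0).reshape(n, 1, h, w)
    per_loc = F.mse_loss(s_feat, t_feat, reduction="none").mean(dim=1, keepdim=True)
    return (per_loc * mask).sum() / mask.sum().clamp(min=1.0)
```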
- Anomaly Detection via Reverse Distillation from One-Class Embedding [2.715884199292287]
We propose a novel T-S model consisting of a teacher encoder and a student decoder.
Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input.
In addition, we introduce a trainable one-class bottleneck embedding module in our T-S model.
arXiv Detail & Related papers (2022-01-26T01:48:37Z)
- General Instance Distillation for Object Detection [12.720908566642812]
RetinaNet with ResNet-50 achieves 39.1% mAP with GID on the COCO dataset, which surpasses the 36.2% baseline by 2.9 points and even exceeds the ResNet-101-based teacher model's 38.1% AP.
arXiv Detail & Related papers (2021-03-03T11:41:26Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- Knowledge distillation via adaptive instance normalization [52.91164959767517]
We propose a new knowledge distillation method based on transferring feature statistics from the teacher to the student.
Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher.
We show that our distillation method outperforms other state-of-the-art distillation methods over a large set of experimental settings.
arXiv Detail & Related papers (2020-03-09T17:50:12Z)
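For the last entry, the plain statistic-matching baseline it builds on (pulling the student's per-channel feature mean and standard deviation toward the teacher's) can be sketched in a few lines; this is an assumed illustration of that baseline, not the paper's full method, which its abstract says goes beyond mean/variance matching.
```python
# Minimal sketch of plain feature-statistics distillation: match the student's
# per-channel mean/std to the teacher's. Shapes and names are assumptions.
import torch


def feature_stats_loss(s_feat: torch.Tensor, t_feat: torch.Tensor) -> torch.Tensor:
    """s_feat, t_feat: feature maps of shape (N, C, H, W)."""
    s_mu, s_std = s_feat.mean(dim=(2, 3)), s_feat.std(dim=(2, 3))
    t_mu, t_std = t_feat.mean(dim=(2, 3)), t_feat.std(dim=(2, 3))
    return (s_mu - t_mu).pow(2).mean() + (s_std - t_std).pow(2).mean()


# Toy usage with random feature maps standing in for backbone outputs.
loss = feature_stats_loss(torch.randn(2, 128, 32, 32), torch.randn(2, 128, 32, 32))
```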