IDa-Det: An Information Discrepancy-aware Distillation for 1-bit
Detectors
- URL: http://arxiv.org/abs/2210.03477v1
- Date: Fri, 7 Oct 2022 12:04:14 GMT
- Title: IDa-Det: An Information Discrepancy-aware Distillation for 1-bit
Detectors
- Authors: Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin
Cao, Peng Gao, Jinhu Lv
- Abstract summary: Knowledge distillation (KD) is useful for training compact object detection models.
KD is often effective when the teacher model and student counterpart share similar proposal information.
This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors.
- Score: 30.452449805950593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) has been proven to be useful for training compact
object detection models. However, we observe that KD is often effective when
the teacher model and student counterpart share similar proposal information.
This explains why existing KD methods are less effective for 1-bit detectors,
caused by a significant information discrepancy between the real-valued teacher
and the 1-bit student. This paper presents an Information Discrepancy-aware
strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate
information discrepancies and significantly reduce the performance gap between
a 1-bit detector and its real-valued counterpart. We formulate the distillation
process as a bi-level optimization problem. At the inner level, we select
the representative proposals with maximum information discrepancy. We then
introduce a novel entropy distillation loss to reduce the disparity based on
the selected proposals. Extensive experiments demonstrate IDa-Det's superiority
over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and
COCO datasets. IDa-Det achieves a 76.9% mAP for a 1-bit Faster-RCNN with
ResNet-18 backbone. Our code is open-sourced at
https://github.com/SteveTsui/IDa-Det.
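To make the two levels described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: the inner step ranks teacher/student proposal pairs by a stand-in information-discrepancy measure (a symmetric KL divergence between softmax-normalized RoI features) and keeps the most discrepant ones, while the outer step applies a cross-entropy style "entropy distillation" loss only to the selected pairs. The function names, the choice of discrepancy proxy, and the exact loss form are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of IDa-Det-style selective distillation (not the authors' code).
import torch
import torch.nn.functional as F

def proposal_discrepancy(t_feat, s_feat):
    """Per-proposal information-discrepancy proxy.

    t_feat, s_feat: (num_proposals, feat_dim) pooled RoI features from the
    real-valued teacher and the 1-bit student. A symmetric KL divergence
    between softmax-normalized features is used as a stand-in measure.
    """
    p = F.log_softmax(t_feat, dim=1)
    q = F.log_softmax(s_feat, dim=1)
    kl_pq = F.kl_div(q, p, log_target=True, reduction="none").sum(dim=1)
    kl_qp = F.kl_div(p, q, log_target=True, reduction="none").sum(dim=1)
    return 0.5 * (kl_pq + kl_qp)

def entropy_distill_loss(t_feat, s_feat, k=64):
    """Inner step: keep the k proposals with maximum discrepancy.
    Outer step: minimize a cross-entropy style loss on the selected pairs."""
    disc = proposal_discrepancy(t_feat, s_feat)
    k = min(k, disc.numel())
    idx = torch.topk(disc, k).indices
    t_sel = F.softmax(t_feat[idx], dim=1)          # teacher targets (soft)
    s_sel = F.log_softmax(s_feat[idx], dim=1)      # student log-probabilities
    return -(t_sel * s_sel).sum(dim=1).mean()

# Usage example with random features standing in for RoI-pooled proposals.
teacher = torch.randn(512, 256)
student = torch.randn(512, 256)
print(entropy_distill_loss(teacher, student))
```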
Related papers
- CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs [2.7624021966289605]
This paper proposes Consistent Location-and-Context-aware Knowledge Distillation (CLoCKDistill) for DETR detectors.
We distill the transformer encoder output (i.e., memory) that contains valuable global context and long-range dependencies.
Our method boosts student detector performance by 2.2% to 6.4%.
arXiv Detail & Related papers (2025-02-15T06:02:51Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that such a standard distillation paradigm incurs a serious bias issue -- popular items are recommended more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Localization Distillation for Object Detection [134.12664548771534]
Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the classification logits.
We present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student.
We show that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.
arXiv Detail & Related papers (2022-04-12T17:14:34Z)
- Prediction-Guided Distillation for Dense Object Detection [7.5320132424481505]
We show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance.
We propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher.
Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures.
arXiv Detail & Related papers (2022-03-10T16:46:05Z)
- Dual Correction Strategy for Ranking Distillation in Top-N Recommender System [22.37864671297929]
This paper presents a Dual Correction strategy for Knowledge Distillation (DCD).
DCD transfers the ranking information from the teacher model to the student model in a more efficient manner.
Our experiments show that the proposed method outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-08T07:00:45Z)
- Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)