Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation
- URL: http://arxiv.org/abs/2209.09841v3
- Date: Wed, 21 Feb 2024 15:02:31 GMT
- Title: Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation
- Authors: Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun
Cao
- Abstract summary: Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
- Score: 66.25738680429463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Distillation (KD) for object detection aims to train a compact
detector by transferring knowledge from a teacher model. Since the teacher
model perceives data in a way different from humans, existing KD methods only
distill knowledge that is consistent with labels annotated by human experts,
while neglecting knowledge that is inconsistent with human perception, which
results in insufficient distillation and sub-optimal performance. In this
paper, we propose inconsistent knowledge distillation (IKD), which aims to
distill knowledge inherent in the teacher model's counter-intuitive
perceptions. We start by considering the teacher model's counter-intuitive
perceptions of frequency and non-robust features. Unlike previous works that
exploit fine-grained features or introduce additional regularizations, we
extract inconsistent knowledge by providing diverse input using data
augmentation. Specifically, we propose a sample-specific data augmentation to
transfer the teacher model's ability to capture distinct frequency components,
and suggest an adversarial feature augmentation to extract the teacher model's
perceptions of non-robust features in the data. Extensive experiments
demonstrate the effectiveness of our method, which outperforms state-of-the-art
KD baselines on one-stage, two-stage and anchor-free object detectors (by up to
+1.0 mAP). Our code will be made available at
\url{https://github.com/JWLiang007/IKD.git}.
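The abstract only names the two augmentations; as a rough illustration of the idea rather than the paper's implementation, the sketch below pairs a Gaussian-blur/unsharp-mask frequency augmentation with a small PGD-style input perturbation that widens the teacher-student feature gap, then distills features on every view with a plain MSE loss. The function names, the PGD stand-in for the adversarial feature augmentation, and all hyperparameters are assumptions; the repository linked above is the authoritative reference.

    # Minimal sketch (not the paper's implementation) of distilling "inconsistent"
    # knowledge by diversifying the input. Assumptions: teacher and student are
    # generic feature extractors returning feature maps of the same shape, a
    # Gaussian-blur / unsharp-mask pair stands in for the sample-specific frequency
    # augmentation, and a PGD-style perturbation stands in for the adversarial
    # feature augmentation. All hyperparameters are illustrative.
    import torch
    import torch.nn.functional as F
    from torchvision.transforms.functional import gaussian_blur

    def frequency_views(images, sigma=1.5):
        """Return a low-frequency (blurred) and a high-frequency-boosted view."""
        low = gaussian_blur(images, kernel_size=5, sigma=sigma)
        high = images + (images - low)  # unsharp masking, amount = 1
        return low, high

    def adversarial_view(images, teacher, student, steps=3, eps=4 / 255, alpha=1 / 255):
        """PGD-style perturbation that enlarges the teacher-student feature gap,
        exposing non-robust features that the teacher relies on."""
        delta = torch.zeros_like(images, requires_grad=True)
        for _ in range(steps):
            gap = F.mse_loss(student(images + delta), teacher(images + delta))
            grad = torch.autograd.grad(gap, delta)[0]
            with torch.no_grad():
                delta += alpha * grad.sign()
                delta.clamp_(-eps, eps)
        return (images + delta).detach()

    def ikd_style_feature_loss(images, teacher, student):
        """Distill teacher features on the clean view plus the augmented views."""
        low, high = frequency_views(images)
        adv = adversarial_view(images, teacher, student)
        loss = 0.0
        for view in (images, low, high, adv):
            with torch.no_grad():
                t_feat = teacher(view)
            loss = loss + F.mse_loss(student(view), t_feat)
        return loss / 4

In a full detector this matching would be applied to FPN feature maps alongside the usual detection losses; the released code contains the actual sample-specific and adversarial feature augmentations.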
Related papers
- Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed method, called adaptive explicit knowledge transfer (AEKT), achieves improved performance compared to state-of-the-art KD methods (a generic sketch of this target/non-target split appears after this list).
arXiv Detail & Related papers (2024-09-03T07:42:59Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We argue that the essence of these methods is to discard the noisy information and distill the valuable information in the features.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Hint-dynamic Knowledge Distillation [30.40008256306688]
Hint-dynamic Knowledge Distillation, dubbed HKD, excavates the knowledge from the teacher's hints in a dynamic scheme.
A meta-weight network is introduced to generate the instance-wise weight coefficients about knowledge hints.
Experiments on the standard CIFAR-100 and Tiny-ImageNet benchmarks show that the proposed HKD effectively boosts knowledge distillation.
arXiv Detail & Related papers (2022-11-30T15:03:53Z) - Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that such a standard distillation paradigm incurs a serious bias issue: popular items are more heavily recommended after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z) - Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge
Distillation [70.92135839545314]
We propose the dynamic prior knowledge (DPK), which integrates part of teacher's features as the prior knowledge before the feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of the student by using larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z) - Knowledge Distillation for Object Detection via Rank Mimicking and
Prediction-guided Feature Imitation [34.441349114336994]
We propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors.
RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill.
PFI attempts to correlate feature differences with prediction differences, making feature imitation directly help to improve the student's accuracy.
arXiv Detail & Related papers (2021-12-09T11:19:15Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z) - Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model into another.
We design data augmentation agents with distinct roles to facilitate knowledge distillation.
We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student.
arXiv Detail & Related papers (2020-04-19T14:22:17Z)
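Several of the entries above refine the same logit-based KD objective; in particular, AEKT stresses the probability distribution over the non-target classes. For orientation only, here is a minimal sketch of the vanilla temperature-scaled KD loss and of a target/non-target split in the spirit of decoupled KD; the function names, loss weights, and temperature are illustrative and do not reproduce any specific paper's formulation.

    import torch
    import torch.nn.functional as F

    def vanilla_kd_loss(student_logits, teacher_logits, T=4.0):
        """Hinton-style KD: temperature-scaled KL between teacher and student logits."""
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits / T, dim=1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)

    def target_nontarget_kd_loss(student_logits, teacher_logits, targets,
                                 T=4.0, alpha=1.0, beta=2.0):
        """Split the distillation signal into a target-class term and a
        non-target-class term (decoupled-KD style); weights are illustrative."""
        num_classes = student_logits.size(1)
        gt_mask = F.one_hot(targets, num_classes).bool()  # (B, C), one True per row

        p_s = F.softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits / T, dim=1)

        # Binary distributions over {target mass, non-target mass}.
        bin_s = torch.stack([p_s[gt_mask], 1.0 - p_s[gt_mask]], dim=1).clamp_min(1e-8)
        bin_t = torch.stack([p_t[gt_mask], 1.0 - p_t[gt_mask]], dim=1).clamp_min(1e-8)
        target_term = F.kl_div(bin_s.log(), bin_t, reduction="batchmean")

        # Distribution over the non-target classes only (target logit suppressed).
        masked_s = student_logits / T - 1000.0 * gt_mask
        masked_t = teacher_logits / T - 1000.0 * gt_mask
        nontarget_term = F.kl_div(F.log_softmax(masked_s, dim=1),
                                  F.softmax(masked_t, dim=1),
                                  reduction="batchmean")

        return (alpha * target_term + beta * nontarget_term) * (T ** 2)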