LumiNet: The Bright Side of Perceptual Knowledge Distillation
- URL: http://arxiv.org/abs/2310.03669v2
- Date: Sat, 9 Mar 2024 07:15:24 GMT
- Title: LumiNet: The Bright Side of Perceptual Knowledge Distillation
- Authors: Md. Ismail Hossain, M M Lutfe Elahi, Sameera Ramasinghe, Ali
Cheraghian, Fuad Rahman, Nabeel Mohammed, Shafin Rahman
- Abstract summary: We present LumiNet, a novel knowledge distillation algorithm designed to enhance logit-based distillation.
LumiNet addresses overconfidence issues in logit-based distillation methods while also introducing a novel way to distill knowledge from the teacher.
It excels on benchmarks like CIFAR-100, ImageNet, and MSCOCO, outperforming leading feature-based methods.
- Score: 18.126581058419713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In knowledge distillation literature, feature-based methods have dominated
due to their ability to effectively tap into extensive teacher models. In
contrast, logit-based approaches, which aim to distill 'dark knowledge' from
teachers, typically exhibit inferior performance compared to feature-based
methods. To bridge this gap, we present LumiNet, a novel knowledge distillation
algorithm designed to enhance logit-based distillation. We introduce the
concept of 'perception', aiming to calibrate logits based on the model's
representation capability. This concept addresses overconfidence issues in
logit-based distillation methods while also introducing a novel way to
distill knowledge from the teacher. LumiNet reconstructs the logits of each
instance by considering its relationships with the other samples in the batch.
LumiNet excels on benchmarks like CIFAR-100, ImageNet, and MSCOCO,
outperforming leading feature-based methods; for example, compared to KD with
ResNet18 and MobileNetV2 on ImageNet, it shows improvements of 1.5% and 2.05%,
respectively.
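The abstract describes 'perception' only at a high level: logits are calibrated against what the whole batch looks like to the model before distillation. Below is a minimal sketch of that idea, assuming a simple batch-statistics calibration followed by a standard temperature-scaled KL objective; the function names and the exact normalization are illustrative, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def perception(logits: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Re-express each instance's logits relative to the rest of the batch
    # by standardizing every class dimension with batch statistics.
    # (Assumed calibration, not necessarily the paper's exact transform.)
    mean = logits.mean(dim=0, keepdim=True)
    std = logits.std(dim=0, keepdim=True)
    return (logits - mean) / (std + eps)

def luminet_style_loss(student_logits, teacher_logits, temperature: float = 4.0):
    # Temperature-scaled KL divergence between batch-calibrated logits.
    s = F.log_softmax(perception(student_logits) / temperature, dim=1)
    t = F.softmax(perception(teacher_logits) / temperature, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```

In practice such a term would be combined with the usual cross-entropy loss on the ground-truth labels.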
Related papers
- Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment [6.223632538498386]
Knowledge Distillation (KD) transfers dark knowledge from the teacher to the student via logits or intermediate features.
Recent work has uncovered the potential of logit-based methods, bringing the simple KD form based on logits back into the limelight.
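For reference, the "simple KD form based on logits" is the classic Hinton-style objective: hard-label cross-entropy plus a temperature-softened KL term against the teacher. A compact version:

```python
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard-label cross-entropy plus temperature-softened KL to the teacher.
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T
    return alpha * ce + (1.0 - alpha) * kl
```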
arXiv Detail & Related papers (2024-11-03T12:42:16Z)
- Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed adaptive explicit knowledge transfer (AEKT) method achieves improved performance compared to state-of-the-art KD methods.
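The summary emphasizes delivering the teacher's distribution over the non-target classes but does not spell out how. One common way to isolate that distribution (used, for example, in decoupled KD) is to mask out the target class and renormalize before comparing teacher and student, roughly as below; this is illustrative, not the AEKT loss itself.

```python
import torch
import torch.nn.functional as F

def non_target_distribution(logits, labels, T=4.0):
    # Softmax over all classes, then drop the target class and renormalize,
    # leaving only the distribution over non-target classes.
    probs = F.softmax(logits / T, dim=1)
    mask = F.one_hot(labels, logits.size(1)).bool()
    probs = probs.masked_fill(mask, 0.0)
    return probs / probs.sum(dim=1, keepdim=True)

def non_target_kd(student_logits, teacher_logits, labels, T=4.0, eps=1e-8):
    # KL divergence between teacher and student non-target distributions.
    s = non_target_distribution(student_logits, labels, T).clamp_min(eps).log()
    t = non_target_distribution(teacher_logits, labels, T)
    return F.kl_div(s, t, reduction="batchmean") * T * T
```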
arXiv Detail & Related papers (2024-09-03T07:42:59Z)
- ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation [3.301728339780329]
We propose an innovative method to boost knowledge distillation efficiency without resource-heavy teacher models.
Our approach generates soft labels efficiently, thereby eliminating the need for a large teacher model.
Our experiments on various datasets, including CIFAR-100, Tiny Imagenet, and Fashion MNIST, demonstrate the superior resource efficiency of our approach.
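The summary only states that soft labels come from a compact autoencoder instead of a large teacher. One plausible reading, sketched below purely as an assumption, is to derive class-similarity soft labels from the autoencoder's latent space; the similarity measure and temperature are illustrative choices.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def soft_labels_from_latents(latents, labels, num_classes, T=0.1):
    # Class centroids in the autoencoder's latent space.
    # Assumes every class appears at least once among `latents`.
    centroids = torch.stack(
        [latents[labels == c].mean(dim=0) for c in range(num_classes)]
    )
    # Cosine similarity of each sample to every class centroid,
    # softened into a soft-label distribution.
    sims = F.normalize(latents, dim=1) @ F.normalize(centroids, dim=1).T
    return F.softmax(sims / T, dim=1)
```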
arXiv Detail & Related papers (2024-04-15T15:54:30Z)
- Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level.
CLKD enables the student model to mimic higher-level semantic information from the teacher model, hence improving distillation performance.
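The summary mentions extending logit distillation from the instance level to the class level without giving the formulation. A straightforward reading, sketched here as an assumption, is to match the teacher along both axes of the batch-by-class logit matrix: rows (per-instance distributions over classes) and columns (per-class distributions over the batch).

```python
import torch.nn.functional as F

def kd_along(student_logits, teacher_logits, dim, T=4.0):
    # Softened KL divergence along a chosen axis of the [batch, classes] matrix.
    s = F.log_softmax(student_logits / T, dim=dim)
    t = F.softmax(teacher_logits / T, dim=dim)
    return F.kl_div(s, t, reduction="batchmean") * T * T

def instance_and_class_level_kd(student_logits, teacher_logits, T=4.0):
    # Instance level: each row is a distribution over classes.
    # Class level: each column is a distribution over the batch.
    return (kd_along(student_logits, teacher_logits, dim=1, T=T)
            + kd_along(student_logits, teacher_logits, dim=0, T=T))
```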
arXiv Detail & Related papers (2022-11-27T09:27:50Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- Localization Distillation for Object Detection [134.12664548771534]
Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the classification logits.
We present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student.
We show that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.
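In the LD setting, each bounding-box edge is represented as a discrete probability distribution over bins (as in distribution-based regression heads), so localization knowledge can be distilled with the same softened-KL machinery used for classification logits. A minimal sketch, with the tensor layout and temperature chosen for illustration:

```python
import torch.nn.functional as F

def localization_distillation(student_box_logits, teacher_box_logits, T=10.0):
    # Box logits have shape [N, 4, bins]: one discrete distribution per box
    # edge (left/top/right/bottom). Distill each edge distribution with
    # temperature-softened KL, mirroring classification-logit KD.
    s = F.log_softmax(student_box_logits / T, dim=-1)
    t = F.softmax(teacher_box_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * T * T
```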
arXiv Detail & Related papers (2022-04-12T17:14:34Z)
- Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification [57.5041270212206]
We present BAtch Knowledge Ensembling (BAKE) to produce refined soft targets for anchor images.
BAKE achieves online knowledge ensembling across multiple samples with only a single network.
It requires minimal computational and memory overhead compared to existing knowledge ensembling methods.
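The summary says refined soft targets are produced by ensembling knowledge across the samples of a batch with a single network. A minimal sketch of that idea, assuming affinity-weighted propagation of the batch's own softened predictions; the weighting scheme and the mixing coefficient `omega` are illustrative, not BAKE's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def bake_style_soft_targets(embeddings, logits, T=4.0, omega=0.5):
    # Pairwise affinities between batch samples (self-affinity removed).
    feats = F.normalize(embeddings, dim=1)
    affinity = feats @ feats.T
    affinity.fill_diagonal_(float("-inf"))
    weights = F.softmax(affinity, dim=1)
    # Ensemble the batch's softened predictions into each anchor's target.
    probs = F.softmax(logits / T, dim=1)
    propagated = weights @ probs
    return omega * probs + (1.0 - omega) * propagated
```

The student is then trained against these refined soft targets in addition to the one-hot labels.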
arXiv Detail & Related papers (2021-04-27T16:11:45Z)
- Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup [91.1317510066954]
We study a little-explored but important question, i.e., knowledge distillation efficiency.
Our goal is to achieve a performance comparable to conventional knowledge distillation with a lower computation cost during training.
We show that the UNcertainty-aware mIXup (UNIX) can serve as a clean yet effective solution.
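The summary gives only the name UNcertainty-aware mIXup. One plausible reading, sketched below purely as an assumption, is to rank samples by the entropy of the student's predictions and mix confident samples into uncertain ones so that only the mixed subset needs teacher forward passes; none of this is claimed to be the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def uncertainty_aware_mixup(images, student_logits, keep_ratio=0.5, lam=0.7):
    # Rank samples by the entropy of the student's predictions.
    probs = F.softmax(student_logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    order = entropy.argsort(descending=True)
    k = max(1, int(keep_ratio * images.size(0)))
    uncertain, confident = order[:k], order[-k:]
    # Mix confident samples into uncertain ones, so only k mixed images
    # need a teacher forward pass.
    mixed = lam * images[uncertain] + (1.0 - lam) * images[confident]
    return mixed, uncertain, confident
```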
arXiv Detail & Related papers (2020-12-17T06:52:16Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
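Under this reading, the self-supervision signal acts as a carrier of hidden knowledge: the student mimics the view-to-view similarity structure computed in the teacher's embedding space. A minimal contrastive-similarity sketch, with projection heads and loss weighting omitted and all names illustrative:

```python
import torch
import torch.nn.functional as F

def similarity_matrix(embeddings, T=0.5):
    # Pairwise cosine similarities between the augmented views in a batch,
    # softened into row-wise distributions (self-similarity removed).
    z = F.normalize(embeddings, dim=1)
    sims = z @ z.T
    sims.fill_diagonal_(float("-inf"))
    return F.softmax(sims / T, dim=1)

def self_supervised_kd_loss(student_emb, teacher_emb, T=0.5):
    # The student mimics the teacher's view-to-view similarity structure.
    s = similarity_matrix(student_emb, T).clamp_min(1e-8).log()
    t = similarity_matrix(teacher_emb, T)
    return F.kl_div(s, t, reduction="batchmean")
```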
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.