Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined
Classification
- URL: http://arxiv.org/abs/2202.11384v2
- Date: Thu, 24 Feb 2022 07:10:56 GMT
- Title: Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined
Classification
- Authors: Longhui Yu, Zhenyu Weng, Yuqing Wang, Yuesheng Zhu
- Abstract summary: We propose a novel Multi-Teacher Knowledge Distillation (MTKD) strategy for incremental learning.
To preserve the superclass knowledge, we use the initial model as a superclass teacher to distill the superclass knowledge for the student model.
We propose a post-processing mechanism, called Top-k prediction restriction, to reduce redundant predictions.
- Score: 37.14755431285735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Incremental learning methods can learn new classes continually by distilling
knowledge from the last model (as a teacher model) to the current model (as a
student model) in the sequentially learning process. However, these methods
cannot work for Incremental Implicitly-Refined Classification (IIRC), an
incremental learning extension where the incoming classes could have two
granularity levels, a superclass label and a subclass label. This is because
the previously learned superclass knowledge may be occupied by the subclass
knowledge learned sequentially. To solve this problem, we propose a novel
Multi-Teacher Knowledge Distillation (MTKD) strategy. To preserve the subclass
knowledge, we use the last model as a general teacher to distill the previous
knowledge for the student model. To preserve the superclass knowledge, we use
the initial model as a superclass teacher to distill the superclass knowledge
as the initial model contains abundant superclass knowledge. However,
distilling knowledge from two teacher models could result in the student model
making some redundant predictions. We further propose a post-processing
mechanism, called Top-k prediction restriction, to reduce the redundant
predictions. Our experimental results on IIRC-ImageNet120 and IIRC-CIFAR100
show that the proposed method can achieve better classification accuracy
compared with existing state-of-the-art methods.
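As a rough illustration of the strategy described in the abstract (not the authors' released code), the sketch below combines a task loss with two distillation terms, one from the last model (the general teacher) and one from the initial model (the superclass teacher), and applies a Top-k restriction to the final scores. The names (mtkd_loss, topk_prediction_restriction, superclass_idx), the multi-label task loss, the temperature T, and the weights alpha/beta are assumptions made for the sketch, not details taken from the paper.

```python
# Minimal PyTorch sketch of a two-teacher distillation loss in the spirit of MTKD.
# Assumptions: all three models output logits over the classes seen so far,
# the last model covers the first `general_logits.size(1)` outputs, and
# `superclass_idx` selects the superclass outputs the initial model was trained on.
import torch
import torch.nn.functional as F

def mtkd_loss(student_logits, general_logits, super_logits, targets,
              superclass_idx, T=2.0, alpha=1.0, beta=1.0):
    # Multi-label task loss on the current data (IIRC samples can carry both
    # a superclass and a subclass label, hence multi-hot `targets`).
    task = F.binary_cross_entropy_with_logits(student_logits, targets)

    # Distill previous (subclass) knowledge from the last model (general teacher).
    old_dim = general_logits.size(1)
    kd_general = F.kl_div(
        F.log_softmax(student_logits[:, :old_dim] / T, dim=1),
        F.softmax(general_logits / T, dim=1),
        reduction="batchmean") * T * T

    # Distill superclass knowledge from the initial model (superclass teacher),
    # restricted to the superclass output units.
    kd_super = F.kl_div(
        F.log_softmax(student_logits[:, superclass_idx] / T, dim=1),
        F.softmax(super_logits / T, dim=1),
        reduction="batchmean") * T * T

    return task + alpha * kd_general + beta * kd_super

def topk_prediction_restriction(scores, k=2, threshold=0.5):
    # Post-processing: keep at most the k highest-scoring labels per sample
    # (and only those above a threshold) to suppress redundant predictions.
    topk_vals, topk_idx = scores.topk(k, dim=1)
    restricted = torch.zeros_like(scores)
    restricted.scatter_(1, topk_idx, topk_vals)
    return restricted > threshold
```

In an actual IIRC setup the student's output layer grows as new subclasses arrive, so the index bookkeeping above would have to follow the task schedule.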
Related papers
- Towards Non-Exemplar Semi-Supervised Class-Incremental Learning [33.560003528712414]
Class-incremental learning aims to gradually recognize new classes while maintaining the discriminability of old ones.
We propose a non-exemplar semi-supervised CIL framework with contrastive learning and semi-supervised incremental prototype classifier (Semi-IPC)
Semi-IPC learns a prototype for each class with unsupervised regularization, enabling the model to incrementally learn from partially labeled new data.
arXiv Detail & Related papers (2024-03-27T06:28:19Z)
- AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression [26.474962405945316]
We present a novel attribution-driven knowledge distillation approach to compress pre-trained language models.
To enhance the knowledge transfer of model reasoning and generalization, we explore multi-view attribution distillation on all potential decisions of the teacher.
arXiv Detail & Related papers (2023-05-17T07:40:12Z)
- Self-distilled Knowledge Delegator for Exemplar-free Class Incremental Learning [39.69318045176051]
We exploit the knowledge encoded in a previously trained classification model to handle the catastrophic forgetting problem in continual learning.
Specifically, we introduce a so-called knowledge delegator, which is capable of transferring knowledge from the trained model to a randomly re-initialized new model by generating informative samples.
This simple incremental learning framework surpasses existing exemplar-free methods by a large margin on four widely used class incremental benchmarks.
arXiv Detail & Related papers (2022-05-23T06:31:13Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- Discriminative Distillation to Reduce Class Confusion in Continual Learning [57.715862676788156]
Class confusion may play a role in downgrading the classification performance during continual learning.
We propose a discriminative distillation strategy to help the classifier learn the discriminative features between confusing classes.
arXiv Detail & Related papers (2021-08-11T12:46:43Z)
- Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay [49.691610143011566]
We propose two novel knowledge transfer techniques for class-incremental learning (CIL)
First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model.
Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student.
arXiv Detail & Related papers (2021-06-17T22:13:15Z)
- Learning Adaptive Embedding Considering Incremental Class [55.21855842960139]
Class-Incremental Learning (CIL) aims to train a reliable model on streaming data in which unknown classes emerge sequentially.
Different from traditional closed set learning, CIL has two main challenges: 1) Novel class detection.
After the novel classes are detected, the model needs to be updated without re-training on the entire previous data.
arXiv Detail & Related papers (2020-08-31T04:11:24Z)
- Towards Class-incremental Object Detection with Nearest Mean of Exemplars [5.546052390414686]
Incremental learning can modify the parameters and structure of the deep learning model so that the model does not forget the old knowledge while learning new knowledge.
This paper proposes an incremental method that adjusts the parameters of the model by identifying prototype vectors and increasing the distance between them.
arXiv Detail & Related papers (2020-08-19T08:56:04Z)
- Subclass Distillation [94.18870689772544]
We show that it is possible to transfer most of the generalization ability of a teacher to a student.
For datasets where there are known, natural subclasses we demonstrate that the teacher learns similar subclasses.
For clickthrough datasets where the subclasses are unknown we demonstrate that subclass distillation allows the student to learn faster and better.
arXiv Detail & Related papers (2020-02-10T16:45:30Z)