CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation
- URL: http://arxiv.org/abs/2209.07606v1
- Date: Thu, 15 Sep 2022 21:02:57 GMT
- Title: CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation
- Authors: Ibtihel Amara, Maryam Ziaeefard, Brett H. Meyer, Warren Gross and James J. Clark
- Abstract summary: This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD).
CES-KD is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum.
Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty in classifying the image.
- Score: 4.182345120164705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) is an effective tool for compressing deep
classification models for edge devices. However, the performance of KD is
affected by the large capacity gap between the teacher and student networks.
Recent methods have resorted to a multiple teacher assistant (TA) setting for
KD, which sequentially decreases the size of the teacher model to progressively
bridge the size gap between these models. This paper proposes a new technique
called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to
efficiently enhance the learning of a compact student under the capacity gap
problem. This technique is built upon the hypothesis that a student network
should be guided gradually using a stratified teaching curriculum, as it learns
easy (hard) data samples better and faster from a lower (higher) capacity
teacher network. Specifically, our method is a gradual TA-based KD technique
that selects a single teacher per input image based on a curriculum driven by
the difficulty in classifying the image. In this work, we empirically verify
our hypothesis and rigorously experiment with CIFAR-10, CIFAR-100, CINIC-10,
and ImageNet datasets and show improved accuracy on VGG-like, ResNet, and
WideResNet architectures.
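The per-image teacher selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The difficulty score (1 minus the probability the largest teacher assigns to the true label), the equal-width difficulty strata, the use of a Hinton-style KD loss, the function names (softmax, difficulty, select_teacher, kd_loss, ces_kd_loss), and the hyperparameters (temperature=4.0, alpha=0.9) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def difficulty(ref_logits: Sequence[float], label: int) -> float:
    """Hypothetical difficulty score: 1 - p(true class) under a reference model."""
    return 1.0 - softmax(ref_logits)[label]


def select_teacher(score: float, num_teachers: int) -> int:
    """Map a difficulty score in [0, 1] to a teacher index.

    Teachers are assumed to be sorted from lowest to highest capacity, so
    easy images (low score) go to small TAs and hard images to the large
    teacher. Equal-width strata are an illustrative assumption.
    """
    return min(int(score * num_teachers), num_teachers - 1)


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 4.0, alpha: float = 0.9) -> float:
    """Hinton-style KD loss: weighted hard-label CE plus soft-target KL."""
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, temperature)
    q_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / qs) for pt, qs in zip(p_t, q_s) if pt > 0)
    # The T^2 factor keeps soft-target gradients on the same scale as CE.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


def ces_kd_loss(student_logits: Sequence[float],
                teachers_logits: Sequence[Sequence[float]],
                label: int, **kw) -> Tuple[int, float]:
    """Per-image loss: pick a single teacher by difficulty, then distill from it."""
    # Hypothetical choice: the highest-capacity teacher (last) rates difficulty.
    score = difficulty(teachers_logits[-1], label)
    k = select_teacher(score, len(teachers_logits))
    return k, kd_loss(student_logits, teachers_logits[k], label, **kw)


# Example: logits from three teachers for one 3-class image, small -> large.
teachers = [
    [1.0, 0.5, 0.2],  # smallest TA
    [2.0, 0.3, 0.1],  # intermediate TA
    [4.0, 0.1, 0.0],  # full-size teacher
]
student = [1.5, 0.4, 0.3]
easy_k, easy_loss = ces_kd_loss(student, teachers, label=0)  # easy image
hard_k, hard_loss = ces_kd_loss(student, teachers, label=2)  # hard image
print(easy_k, hard_k)  # prints: 0 2
```

Because the teachers are ordered from lowest to highest capacity, a low difficulty score routes an image to a smaller TA and a high score to the full-size teacher, mirroring the hypothesis that easy (hard) samples are learned better from a lower (higher) capacity teacher.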