Knowledge Distillation via Instance-level Sequence Learning
- URL: http://arxiv.org/abs/2106.10885v1
- Date: Mon, 21 Jun 2021 06:58:26 GMT
- Title: Knowledge Distillation via Instance-level Sequence Learning
- Authors: Haoran Zhao, Xin Sun, Junyu Dong, Zihe Dong and Qiong Li
- Abstract summary: We provide a curriculum learning knowledge distillation framework via instance-level sequence learning.
It employs the student network from an early epoch as a snapshot to create a curriculum for the student network's next training phase.
Compared with several state-of-the-art methods, our framework achieves the best performance with fewer iterations.
- Score: 25.411142312584698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, distillation approaches have been proposed to extract general
knowledge from a teacher network to guide a student network. Most existing methods
transfer knowledge from the teacher network to the student by feeding
sequences of random mini-batches sampled uniformly from the data. Instead, we
argue that the compact student network should be guided gradually using samples
ordered in a meaningful sequence, so that the student can bridge the gap in feature
representation between the teacher and student networks step by step. In this
work, we provide a curriculum learning knowledge distillation framework via
instance-level sequence learning. It employs the student network from an early
epoch as a snapshot to create a curriculum for the student network's next
training phase. We carry out extensive experiments on CIFAR-10, CIFAR-100, SVHN
and CINIC-10 datasets. Compared with several state-of-the-art methods, our
framework achieves the best performance with fewer iterations.
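Below is a minimal PyTorch-style sketch of the procedure the abstract describes: an early-epoch snapshot of the student scores every training instance, the instances are reordered from easy to hard, and the next training phase distills from the teacher over that ordered sequence. The difficulty measure (snapshot cross-entropy), the plain Hinton-style KD loss, and all function names are illustrative assumptions, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset


def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Standard Hinton-style distillation loss (assumed; not paper-specific).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard


@torch.no_grad()
def rank_instances(snapshot, dataset, device, batch_size=256):
    # Score every instance with the early-epoch snapshot; lower loss = "easier".
    snapshot.eval()
    losses = []
    for x, y in DataLoader(dataset, batch_size=batch_size, shuffle=False):
        logits = snapshot(x.to(device))
        losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.argsort(torch.cat(losses))  # instance indices ordered easy -> hard


def train_phase(student, teacher, dataset, order, device, batch_size=128, lr=0.05):
    # One curriculum phase: mini-batches follow the instance-level sequence.
    loader = DataLoader(Subset(dataset, order.tolist()), batch_size=batch_size, shuffle=False)
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    student.train()
    teacher.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            teacher_logits = teacher(x)
        loss = kd_loss(student(x), teacher_logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()


def curriculum_kd(student, teacher, dataset, device, num_phases=4):
    # Re-rank instances with a fresh student snapshot before every phase.
    for _ in range(num_phases):
        snapshot = copy.deepcopy(student)
        order = rank_instances(snapshot, dataset, device)
        train_phase(student, teacher, dataset, order, device)
    return student
```

The number of phases, temperature, and loss weighting would be tuned per dataset; the sketch only shows how an instance-level ordering slots into an otherwise standard distillation loop.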
Related papers
- Distribution Shift Matters for Knowledge Distillation with Webly
Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$)
We first dynamically select useful training instances from the webly collected data according to the combined predictions of teacher network and student network.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning inspired framework by treating the tasks of Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge of previously encountered samples) as meta-train and meta-test, respectively.
arXiv Detail & Related papers (2023-02-28T03:50:56Z)
- Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
To be specific, we design a neural network-based data augmentation module with a priori bias, which helps find samples that align with the teacher's strengths but expose the student's weaknesses.
arXiv Detail & Related papers (2022-12-11T06:22:14Z)
- Hypernetworks for Continual Semi-Supervised Learning [37.109190308781244]
We propose a framework for semi-supervised continual learning called Meta-Consolidation for Continual Semi-Supervised Learning (MCSSL)
Our framework has a hypernetwork that learns the meta-distribution that generates the weights of a semi-supervised auxiliary classifier generative adversarial network (Semi-ACGAN) as the base network.
We present Semi-Split CIFAR-10, a new benchmark for continual semi-supervised learning, obtained by modifying the Split CIFAR-10 dataset.
arXiv Detail & Related papers (2021-10-05T07:42:38Z)
- Students are the Best Teacher: Exit-Ensemble Distillation with
Multi-Exits [25.140055086630838]
This paper proposes a novel knowledge distillation-based learning method to improve the classification performance of convolutional neural networks (CNNs)
Unlike the conventional notion of distillation where teachers only teach students, we show that students can also help other students and even the teacher to learn better.
arXiv Detail & Related papers (2021-04-01T07:10:36Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle the catastrophic forgetting problem in traditional classification networks.
We propose a novel class-incremental method for embedding networks, named zero-shot translation class-incremental method (ZSTCI)
In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)
- Progressive Network Grafting for Few-Shot Knowledge Distillation [60.38608462158474]
We introduce a principled dual-stage distillation scheme tailored for few-shot data.
In the first step, we graft the student blocks one by one onto the teacher, and learn the parameters of the grafted block intertwined with those of the other teacher blocks.
Experiments demonstrate that our approach, with only a few unlabeled samples, achieves gratifying results on CIFAR10, CIFAR100, and ILSVRC-2012.
arXiv Detail & Related papers (2020-12-09T08:34:36Z)
- Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$\times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME)
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
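For the LFME entry above, the following is a minimal sketch of how knowledge from several 'Expert' models might be aggregated into soft targets for a unified student. Uniform averaging over experts, a fixed temperature, and full-label-space expert outputs are simplifying assumptions here; the actual LFME method uses self-paced expert and instance weighting schedules.

```python
import torch
import torch.nn.functional as F


def aggregate_experts(expert_logits_list, T=2.0):
    # Average the experts' softened predictions into a single soft target.
    # (Uniform expert weights are an assumption; LFME learns self-paced weights.)
    probs = [F.softmax(z / T, dim=1) for z in expert_logits_list]
    return torch.stack(probs).mean(dim=0)


def multi_expert_kd_loss(student_logits, expert_logits_list, targets, T=2.0, alpha=0.5):
    # Distill the aggregated expert knowledge into the unified student,
    # mixed with the ordinary cross-entropy on the ground-truth labels.
    soft_target = aggregate_experts(expert_logits_list, T)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1), soft_target, reduction="batchmean"
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```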