Meta Learning for Knowledge Distillation
- URL: http://arxiv.org/abs/2106.04570v1
- Date: Tue, 8 Jun 2021 17:59:03 GMT
- Title: Meta Learning for Knowledge Distillation
- Authors: Wangchunshu Zhou and Canwen Xu and Julian McAuley
- Abstract summary: We show the teacher network can learn to better transfer knowledge to the student network.
We introduce a pilot update mechanism to improve the alignment between the inner-learner and meta-learner.
- Score: 12.716258111815312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Meta Learning for Knowledge Distillation (MetaDistil), a simple
yet effective alternative to traditional knowledge distillation (KD) methods
where the teacher model is fixed during training. We show the teacher network
can learn to better transfer knowledge to the student network (i.e., learning
to teach) with the feedback from the performance of the distilled student
network in a meta learning framework. Moreover, we introduce a pilot update
mechanism to improve the alignment between the inner-learner and meta-learner
in meta learning algorithms that focus on an improved inner-learner.
Experiments on various benchmarks show that MetaDistil can yield significant
improvements compared with traditional KD algorithms and is less sensitive to
the choice of student capacity and hyperparameters, facilitating the
use of KD on different tasks and models. The code is available at
https://github.com/JetRunner/MetaDistil
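The abstract describes a bi-level "learning to teach" loop: the student is distilled from the teacher, the teacher is then updated using feedback from the distilled student's performance on held-out data, and a pilot update keeps the real student aligned with the teacher it is ultimately taught by. Below is a minimal PyTorch sketch of one such step, written with the `higher` library for differentiable inner updates; the model, optimizer, and batch names (`teacher`, `student`, `quiz_batch`, ...) and all hyperparameters are illustrative assumptions rather than the authors' implementation (see the repository above for the official code).

```python
import torch
import torch.nn.functional as F
import higher  # third-party library for differentiable inner-loop updates


def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: softened KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def metadistil_step(teacher, student, teacher_opt, student_opt,
                    train_batch, quiz_batch):
    """One hypothetical MetaDistil-style step: pilot update -> teacher
    meta-update -> real student update with the improved teacher."""
    x, y = train_batch
    x_quiz, y_quiz = quiz_batch

    # Pilot update: distill into a differentiable copy of the student so the
    # teacher's parameters stay in the computation graph.
    with higher.innerloop_ctx(student, student_opt) as (pilot, diffopt):
        inner_loss = kd_loss(pilot(x), teacher(x), y)
        diffopt.step(inner_loss)

        # Meta update: the teacher is optimized to reduce the pilot student's
        # loss on held-out "quiz" data (feedback from the distilled student).
        quiz_loss = F.cross_entropy(pilot(x_quiz), y_quiz)
        teacher_opt.zero_grad()
        quiz_loss.backward()
        teacher_opt.step()

    # Real student update, taught by the freshly updated teacher.
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_opt.zero_grad()
    kd_loss(student(x), teacher_logits, y).backward()
    student_opt.step()
```

In this sketch the pilot update appears as the split between the throwaway copy used to compute teacher feedback and the real student, which is only updated after the teacher has improved, keeping the inner-learner and meta-learner aligned.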
Related papers
- On effects of Knowledge Distillation on Transfer Learning [0.0]
We propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning.
We show that, using guidance and knowledge from a larger teacher network during fine-tuning, we can improve the student network's validation performance on metrics such as accuracy.
arXiv Detail & Related papers (2022-10-18T08:11:52Z)
- Knowledge Condensation Distillation [38.446333274732126]
Existing methods focus on mining knowledge hints from the teacher and transferring all of that knowledge to the student.
In this paper, we propose Knowledge Condensation Distillation (KCD).
Our approach is easy to build on top of the off-the-shelf KD methods, with no extra training parameters and negligible overhead.
arXiv Detail & Related papers (2022-07-12T09:17:34Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the role of cross-level connection paths between teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our finally designed nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Students are the Best Teacher: Exit-Ensemble Distillation with Multi-Exits [25.140055086630838]
This paper proposes a novel knowledge distillation-based learning method to improve the classification performance of convolutional neural networks (CNNs).
Unlike the conventional notion of distillation where teachers only teach students, we show that students can also help other students and even the teacher to learn better.
arXiv Detail & Related papers (2021-04-01T07:10:36Z)
- Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains [31.66937407833244]
We propose a Meta-Knowledge Distillation (Meta-KD) framework to build a meta-teacher model that captures transferable knowledge across domains.
Specifically, we first leverage a cross-domain learning process to train the meta-teacher on multiple domains, and then propose a meta-distillation algorithm to learn single-domain student models with guidance from the meta-teacher.
arXiv Detail & Related papers (2020-12-02T15:18:37Z)
- Online Structured Meta-learning [137.48138166279313]
Current online meta-learning algorithms are limited to learning a globally shared meta-learner.
We propose an online structured meta-learning (OSML) framework to overcome this limitation.
Experiments on three datasets demonstrate the effectiveness and interpretability of our proposed framework.
arXiv Detail & Related papers (2020-10-22T09:10:31Z)
- Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches depend only on the current task's information during adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach (a generic sketch of the underlying inner/outer-loop pattern follows this list).
arXiv Detail & Related papers (2020-10-19T08:06:47Z)
- Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? [72.00712736992618]
We show that a simple baseline, learning a supervised or self-supervised representation on the meta-training set, outperforms state-of-the-art few-shot learning methods.
An additional boost can be achieved through the use of self-distillation.
We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.
arXiv Detail & Related papers (2020-03-25T17:58:42Z)
- Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks [55.66438591090072]
We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models trained classically.
We develop a regularizer which boosts the performance of standard training routines for few-shot classification.
arXiv Detail & Related papers (2020-02-17T03:18:45Z)
- Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)
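Several of the meta-learning entries above (for example the 'Path-aware' model-agnostic approach and online structured meta-learning) build on the same gradient-based inner/outer-loop pattern. The sketch below shows a generic first-order MAML-style outer step in PyTorch, included only to make that pattern concrete; the task format (support/query splits), model, and learning rates are assumptions for illustration and are not taken from any of the listed papers.

```python
import torch
import torch.nn.functional as F


def fomaml_outer_step(model, meta_opt, tasks, inner_lr=0.01, inner_steps=1):
    """One outer-loop step of first-order MAML over a batch of tasks.

    Each task is assumed to be a tuple (support_x, support_y, query_x, query_y).
    """
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: adapt a temporary copy of the parameters on the support set.
        fast_weights = {name: p.clone() for name, p in model.named_parameters()}
        for _ in range(inner_steps):
            logits = torch.func.functional_call(model, fast_weights, (support_x,))
            inner_loss = F.cross_entropy(logits, support_y)
            grads = torch.autograd.grad(inner_loss, list(fast_weights.values()))
            fast_weights = {
                name: w - inner_lr * g
                for (name, w), g in zip(fast_weights.items(), grads)
            }
        # Outer loop: evaluate the adapted weights on the query set and
        # accumulate (first-order) gradients into the shared initialization.
        query_logits = torch.func.functional_call(model, fast_weights, (query_x,))
        query_loss = F.cross_entropy(query_logits, query_y) / len(tasks)
        query_loss.backward()
    meta_opt.step()
```

Second-order variants additionally backpropagate through the inner updates (e.g. by passing create_graph=True to autograd.grad), at higher memory cost.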
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.