Few-Shot Learning of Compact Models via Task-Specific Meta Distillation
- URL: http://arxiv.org/abs/2210.09922v1
- Date: Tue, 18 Oct 2022 15:06:47 GMT
- Title: Few-Shot Learning of Compact Models via Task-Specific Meta Distillation
- Authors: Yong Wu, Shekhor Chanda, Mehrdad Hosseinzadeh, Zhi Liu, Yang Wang
- Abstract summary: We consider a new problem of few-shot learning of compact models.
We propose task-specific meta distillation that simultaneously learns two models in meta-learning.
- Score: 16.683801607142257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a new problem of few-shot learning of compact models.
Meta-learning is a popular approach for few-shot learning. Previous work in
meta-learning typically assumes that the model architecture during
meta-training is the same as the model architecture used for final deployment.
In this paper, we challenge this basic assumption. For final deployment, we
often need the model to be small. But small models usually do not have enough
capacity to adapt effectively to new tasks. Meanwhile, we often have access to
large datasets and extensive computing power during meta-training, since
meta-training is typically performed on a server. In this paper, we
propose task-specific meta distillation that simultaneously learns two models
in meta-learning: a large teacher model and a small student model. These two
models are jointly learned during meta-training. Given a new task during
meta-testing, the teacher model is first adapted to this task, then the adapted
teacher model is used to guide the adaptation of the student model. The adapted
student model is used for final deployment. We demonstrate the effectiveness of
our approach in few-shot image classification using model-agnostic
meta-learning (MAML). Our proposed method outperforms other alternatives on
several benchmark datasets.
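As a rough illustration of the two-stage adaptation described in the abstract, the sketch below shows what the meta-testing procedure could look like with a MAML-style inner loop and a KL-divergence distillation term. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the single support set, the distill_weight hyperparameter, and the number and form of adaptation steps are all illustrative.

import torch
import torch.nn.functional as F

def adapt(model, loss_fn, x, y, lr=0.01, steps=5):
    # MAML-style inner loop: plain gradient descent on a copy of the parameters.
    params = {k: v.detach().clone().requires_grad_(True)
              for k, v in model.named_parameters()}
    for _ in range(steps):
        logits = torch.func.functional_call(model, params, (x,))
        grads = torch.autograd.grad(loss_fn(logits, y), list(params.values()))
        params = {k: p - lr * g for (k, p), g in zip(params.items(), grads)}
    return params

def meta_test_task(teacher, student, support_x, support_y, distill_weight=1.0):
    # Step 1: adapt the large teacher to the new task on the support set.
    teacher_params = adapt(teacher, F.cross_entropy, support_x, support_y)
    with torch.no_grad():
        teacher_logits = torch.func.functional_call(
            teacher, teacher_params, (support_x,))

    # Step 2: adapt the small student, guided by the adapted teacher
    # (cross-entropy on the labels plus a distillation term; the KL form
    # and weighting are assumptions for this sketch).
    def distill_loss(logits, y):
        ce = F.cross_entropy(logits, y)
        kd = F.kl_div(F.log_softmax(logits, dim=-1),
                      F.softmax(teacher_logits, dim=-1),
                      reduction="batchmean")
        return ce + distill_weight * kd

    student_params = adapt(student, distill_loss, support_x, support_y)
    return student_params  # the adapted student is what gets deployed

The sketch only covers the meta-testing side; in the paper's setting the teacher and student are learned jointly during meta-training, so that the student's initialization is already suited to this teacher-guided adaptation.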
Related papers
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-Improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z) - Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning [9.176056742068813]
We present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models.
Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task.
We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models.
arXiv Detail & Related papers (2022-06-03T17:09:26Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much bigger models with nearly 8x more parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z) - Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of approximately 36% on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z) - BI-MAML: Balanced Incremental Approach for Meta Learning [9.245355087256314]
We present a novel Balanced Incremental Model Agnostic Meta Learning system (BI-MAML) for learning multiple tasks.
Our method implements a meta-update rule to incrementally adapt its model to new tasks without forgetting old tasks.
Our system performs these meta-updates successfully using only a few shots.
arXiv Detail & Related papers (2020-06-12T18:28:48Z) - Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning [79.25478727351604]
We explore a simple process: meta-learning over a whole-classification pre-trained model on its evaluation metric.
We observe this simple method achieves competitive performance to state-of-the-art methods on standard benchmarks.
arXiv Detail & Related papers (2020-03-09T20:06:36Z) - Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks [55.66438591090072]
We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models trained classically.
We develop a regularizer which boosts the performance of standard training routines for few-shot classification.
arXiv Detail & Related papers (2020-02-17T03:18:45Z) - Incremental Meta-Learning via Indirect Discriminant Alignment [118.61152684795178]
We develop a notion of incremental learning during the meta-training phase of meta-learning.
Our approach performs favorably at test time as compared to training a model with the full meta-training set.
arXiv Detail & Related papers (2020-02-11T01:39:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.