Meta-KD: A Meta Knowledge Distillation Framework for Language Model
Compression across Domains
- URL: http://arxiv.org/abs/2012.01266v1
- Date: Wed, 2 Dec 2020 15:18:37 GMT
- Title: Meta-KD: A Meta Knowledge Distillation Framework for Language Model
Compression across Domains
- Authors: Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun
Huang
- Abstract summary: We propose a Meta-Knowledge Distillation (Meta-KD) framework to build a meta-teacher model that captures transferable knowledge across domains.
Specifically, we first leverage a cross-domain learning process to train the meta-teacher on multiple domains, and then propose a meta-distillation algorithm to learn single-domain student models with guidance from the meta-teacher.
- Score: 31.66937407833244
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pre-trained language models have been applied to various NLP tasks with
considerable performance gains. However, the large model sizes, together with
the long inference time, limit the deployment of such models in real-time
applications. Typical approaches apply knowledge distillation to compress
large teacher models into small student models. However, most of these studies
focus on a single domain only, ignoring transferable knowledge from other
domains. We argue that a teacher trained on knowledge digested across domains
generalizes better and can thus provide stronger guidance for knowledge
distillation. To this end, inspired by meta-learning, we propose a
Meta-Knowledge Distillation (Meta-KD) framework that builds a meta-teacher
model capturing transferable knowledge across domains and uses it to pass
knowledge to students. Specifically, we first leverage a cross-domain learning
process to
train the meta-teacher on multiple domains, and then propose a
meta-distillation algorithm to learn single-domain student models with guidance
from the meta-teacher. Experiments on two public multi-domain NLP tasks show
the effectiveness and superiority of the proposed Meta-KD framework. We also
demonstrate the capability of Meta-KD in both few-shot and zero-shot learning
settings.
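As a rough orientation only, the sketch below shows the standard temperature-scaled distillation loss that student training of this kind typically builds on: a weighted mix of hard-label cross-entropy and a softened teacher/student KL term. It is not the paper's meta-distillation algorithm; the meta-teacher's domain-aware guidance is not modeled here, and the `temperature` and `alpha` values are illustrative assumptions.
```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over the last axis, numerically stabilized.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Weighted sum of (1) hard-label cross-entropy on the gold labels and
    # (2) KL(teacher || student) over temperature-softened distributions.
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)

    # Hard-label cross-entropy against integer class ids.
    hard = softmax(student_logits)
    ce = -np.log(hard[np.arange(len(labels)), labels] + 1e-12).mean()

    # Soft-label KL term, scaled by T^2 as is conventional in distillation.
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=-1).mean()
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl

# Toy usage: 4 examples, 3 classes, teacher logits standing in for the
# (meta-)teacher's predictions on one target domain.
student_logits = np.random.randn(4, 3)
teacher_logits = np.random.randn(4, 3)
labels = np.array([0, 2, 1, 1])
print(distillation_loss(student_logits, teacher_logits, labels))
```
In the Meta-KD setting described above, each single-domain student would be trained with a loss of this general shape against the shared meta-teacher rather than a per-domain teacher.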
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning [38.37682598345653]
We introduce a multimodal meta-learning approach to bridge the gap between vision and language models.
We define a meta-mapper network, acting as a meta-learner, to efficiently bridge frozen large-scale vision and language models.
We evaluate our approach on recently proposed multimodal few-shot benchmarks, measuring how rapidly the model can bind novel visual concepts to words.
arXiv Detail & Related papers (2023-02-28T17:46:18Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown superior performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- Improving the Generalization of Meta-learning on Unseen Domains via Adversarial Shift [3.1219977244201056]
We propose a model-agnostic shift layer to learn how to simulate the domain shift and generate pseudo tasks.
Based on the pseudo tasks, the meta-learning model can learn cross-domain meta-knowledge, which can generalize well on unseen domains.
arXiv Detail & Related papers (2021-07-23T07:29:30Z)
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD).
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
- Meta Learning for Knowledge Distillation [12.716258111815312]
We show the teacher network can learn to better transfer knowledge to the student network.
We introduce a pilot update mechanism to improve the alignment between the inner-learner and meta-learner.
arXiv Detail & Related papers (2021-06-08T17:59:03Z)
- Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification [59.326456778057384]
We propose the Memory-based Multi-Source Meta-Learning framework to train a generalizable model for unseen domains.
We also present a meta batch normalization layer (MetaBN) to diversify meta-test features.
Experiments demonstrate that our M$^3$L can effectively enhance the generalization ability of the model for unseen domains.
arXiv Detail & Related papers (2020-12-01T11:38:16Z)
- Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)