Improved Knowledge Distillation for Pre-trained Language Models via
Knowledge Selection
- URL: http://arxiv.org/abs/2302.00444v1
- Date: Wed, 1 Feb 2023 13:40:19 GMT
- Title: Improved Knowledge Distillation for Pre-trained Language Models via
Knowledge Selection
- Authors: Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao and Jingbo Zhu
- Abstract summary: We propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation.
Experimental results on the GLUE datasets show that our method outperforms several strong knowledge distillation baselines significantly.
- Score: 35.515135913846386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation addresses the problem of transferring knowledge from a
teacher model to a student model. In this process, we typically have multiple
types of knowledge extracted from the teacher model. The problem is to make
full use of them to train the student model. Our preliminary study shows that:
(1) not all of the knowledge is necessary for learning a good student model,
and (2) knowledge distillation can benefit from certain knowledge at different
training steps. In response to these findings, we propose an actor-critic approach to
selecting appropriate knowledge to transfer during the process of knowledge
distillation. In addition, we offer a refinement of the training algorithm to
ease the computational burden. Experimental results on the GLUE datasets show
that our method outperforms several strong knowledge distillation baselines
significantly.
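
To make the idea concrete, here is a minimal sketch of knowledge selection in distillation, assuming a PyTorch setup with three common knowledge signals (soft logits, hidden states, attention maps) and a small actor network that decides which signals to transfer at the current step. All names, shapes, and the state/reward design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeSelector(nn.Module):
    """Actor: maps a training-state feature vector to per-knowledge selection probabilities."""

    def __init__(self, state_dim: int, num_knowledge: int):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_knowledge),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Probability of selecting each knowledge type at this training step.
        return torch.sigmoid(self.policy(state))


def distillation_losses(t_logits, s_logits, t_hidden, s_hidden, t_attn, s_attn, T=2.0):
    """Per-knowledge losses: soft-label KL divergence, hidden-state MSE, attention MSE."""
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hid = F.mse_loss(s_hidden, t_hidden)
    attn = F.mse_loss(s_attn, t_attn)
    return torch.stack([kd, hid, attn])


# Toy tensors standing in for teacher/student outputs on one batch (assumed shapes).
t_logits, s_logits = torch.randn(8, 30522), torch.randn(8, 30522)
t_hidden, s_hidden = torch.randn(8, 128, 768), torch.randn(8, 128, 768)
t_attn, s_attn = torch.randn(8, 12, 128, 128), torch.randn(8, 12, 128, 128)

losses = distillation_losses(t_logits, s_logits, t_hidden, s_hidden, t_attn, s_attn)

# The state could encode training progress and recent losses; here it is just the
# current per-knowledge losses, detached so the actor does not backprop through them.
selector = KnowledgeSelector(state_dim=3, num_knowledge=3)
probs = selector(losses.detach())      # actor's selection probabilities
mask = torch.bernoulli(probs)          # sample which knowledge types to transfer now
student_loss = (mask * losses).sum()   # only the selected knowledge trains the student

# A critic (omitted here) would estimate the value of the selection and supply a
# reward signal, e.g. the improvement in the student's validation loss, which is
# then used to update the actor with a policy gradient.
print(float(student_loss))
```

The key design point the abstract argues for is that the mask is step-dependent: which knowledge helps the student changes over training, so a fixed weighting of all losses is suboptimal compared to a learned, adaptive selection.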