Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting
- URL: http://arxiv.org/abs/2004.12651v1
- Date: Mon, 27 Apr 2020 08:59:57 GMT
- Title: Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting
- Authors: Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan
Yu
- Abstract summary: We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam, to facilitate adoption by the NLP community.
- Score: 66.45372974713189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep pretrained language models have achieved great success by
pretraining first and then fine-tuning. But such a sequential transfer learning
paradigm often confronts the catastrophic forgetting problem and leads to
sub-optimal performance. To fine-tune with less forgetting, we propose a recall
and learn mechanism, which adopts the idea of multi-task learning and jointly
learns pretraining tasks and downstream tasks. Specifically, we propose a
Pretraining Simulation mechanism to recall the knowledge from pretraining tasks
without data, and an Objective Shifting mechanism to focus the learning on
downstream tasks gradually. Experiments show that our method achieves
state-of-the-art performance on the GLUE benchmark. Our method also enables
BERT-base to achieve better performance than direct fine-tuning of
BERT-large. Further, we provide the open-source RecAdam optimizer, which
integrates the proposed mechanisms into the Adam optimizer, to facilitate
adoption by the NLP community.
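The abstract names the two mechanisms but gives no formulas. The following is a minimal sketch of how such an objective could be combined. It assumes, beyond what the abstract states, that Pretraining Simulation can be approximated by a quadratic penalty pulling parameters toward their pretrained values, and that Objective Shifting can be modeled as a sigmoid-annealed weight moving from the recall term to the task loss. The function names (`anneal_weight`, `recall_penalty`, `combined_loss`) and the hyperparameters `k`, `t0`, and `gamma` are hypothetical and are not the RecAdam API.

```python
import math


def anneal_weight(step, k=0.1, t0=1000):
    """Sigmoid weight in (0, 1) that grows with the training step.

    Assumed form: 1 / (1 + exp(-k * (step - t0))), computed in a
    numerically stable way. k sets the annealing rate, t0 the midpoint.
    """
    x = k * (step - t0)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def recall_penalty(params, pretrained_params, gamma=5000.0):
    """Assumed stand-in for the pretraining objective: a quadratic
    penalty on the distance from the pretrained parameter values."""
    return 0.5 * gamma * sum(
        (p - q) ** 2 for p, q in zip(params, pretrained_params)
    )


def combined_loss(task_loss, params, pretrained_params, step,
                  k=0.1, t0=1000, gamma=5000.0):
    """Blend the downstream task loss with the recall penalty.

    Early steps weight the recall penalty; later steps weight the
    downstream task loss.
    """
    lam = anneal_weight(step, k, t0)
    penalty = recall_penalty(params, pretrained_params, gamma)
    return lam * task_loss + (1.0 - lam) * penalty


if __name__ == "__main__":
    pretrained = [0.0, 0.0]
    current = [0.01, -0.02]
    for step in (0, 1000, 2000):
        print(step, combined_loss(1.0, current, pretrained, step))
```

With these assumed defaults, the weight on the task loss is close to 0 at step 0, exactly 0.5 at `t0`, and close to 1 well after `t0`, so the objective shifts gradually from recalling pretrained knowledge to learning the downstream task.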
Related papers
- Instruction Pre-Training: Language Models are Supervised Multitask Learners [115.95022434390181]
In this paper, we propose a framework that augments massive raw corpora with instruction-response pairs to pre-train language models (LMs).
In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training.
arXiv Detail & Related papers (2024-06-20T16:55:33Z)
- Machine Unlearning of Pre-trained Large Language Models [17.40601262379265]
This study investigates the concept of the 'right to be forgotten' within the context of large language models (LLMs).
We explore machine unlearning as a pivotal solution, with a focus on pre-trained models.
arXiv Detail & Related papers (2024-02-23T07:43:26Z)
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts in 3 downstream tasks and 9 downstream datasets requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Meta-learning for downstream aware and agnostic pretraining [7.2051162210119495]
We propose using meta-learning to select tasks that provide the most informative learning signals in each episode of pretraining.
We discuss the algorithm of the method and its two variants, downstream-aware and downstream-agnostic pretraining.
arXiv Detail & Related papers (2021-06-06T23:08:09Z)
- Incremental Learning for End-to-End Automatic Speech Recognition [41.297106772785206]
We propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR).
We design a novel explainability-based knowledge distillation for ASR models, which is combined with a response-based knowledge distillation to maintain the original model's predictions and the "reason" for the predictions.
Results on a multi-stage sequential training task show that our method outperforms existing ones in mitigating forgetting.
arXiv Detail & Related papers (2020-05-11T08:18:08Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)