Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
- URL: http://arxiv.org/abs/2004.10964v3
- Date: Tue, 5 May 2020 22:00:44 GMT
- Title: Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
- Authors: Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
- Abstract summary: We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
- Score: 81.99843216550306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models pretrained on text from a wide variety of sources form the
foundation of today's NLP. In light of the success of these broad-coverage
models, we investigate whether it is still helpful to tailor a pretrained model
to the domain of a target task. We present a study across four domains
(biomedical and computer science publications, news, and reviews) and eight
classification tasks, showing that a second phase of pretraining in-domain
(domain-adaptive pretraining) leads to performance gains, under both high- and
low-resource settings. Moreover, adapting to the task's unlabeled data
(task-adaptive pretraining) improves performance even after domain-adaptive
pretraining. Finally, we show that adapting to a task corpus augmented using
simple data selection strategies is an effective alternative, especially when
resources for domain-adaptive pretraining might be unavailable. Overall, we
consistently find that multi-phase adaptive pretraining offers large gains in
task performance.
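In practice, both domain-adaptive and task-adaptive pretraining amount to a second phase of masked-LM training on unlabeled text before the usual fine-tuning step. A minimal sketch of how such a phase might be run with the Hugging Face transformers and datasets libraries follows; the corpus file, hyperparameters, and training length are illustrative placeholders rather than the paper's settings.

# Minimal sketch: second-phase masked-LM pretraining (DAPT or TAPT).
# The corpus path, batch size, and epoch count are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "roberta-base"                     # the paper's experiments start from RoBERTa
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled in-domain (DAPT) or task (TAPT) text, one document per line (hypothetical file).
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="dapt-roberta",
    per_device_train_batch_size=8,
    num_train_epochs=1,                          # one pass over the unlabeled corpus
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()

# The adapted checkpoint in "dapt-roberta" is then fine-tuned on the labeled task data as usual.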
Related papers
- Towards Simple and Efficient Task-Adaptive Pre-training for Text Classification [0.7874708385247353]
We study the impact of training only the embedding layer on the model's performance during TAPT and task-specific finetuning.
We show that training only the BERT embedding layer during TAPT is sufficient to adapt to the vocabulary of the target domain and achieve comparable performance.
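A minimal sketch of the freezing step described above, assuming a BERT-style masked-LM from Hugging Face transformers (the parameter-name prefix is that of bert-base-uncased; the TAPT training loop itself is omitted):

from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Freeze everything except the embedding layer so that only the embeddings
# adapt to the target-domain vocabulary during TAPT.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("bert.embeddings.")

# Sanity check: only embedding parameters should remain trainable.
print([n for n, p in model.named_parameters() if p.requires_grad])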
arXiv Detail & Related papers (2022-09-26T18:29:12Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
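A rough sketch of what a layerwise noise-stability penalty could look like; the exact form of the regularizer here is a guess, not LNSR's formulation:

import torch
from torch import nn

def noise_stability_penalty(upper_layers: nn.Module, hidden: torch.Tensor, sigma: float = 0.1):
    # Perturb a hidden representation with Gaussian noise and penalise how much
    # the layers above it change their output in response.
    clean_out = upper_layers(hidden)
    noisy_out = upper_layers(hidden + sigma * torch.randn_like(hidden))
    return ((clean_out - noisy_out) ** 2).mean()

# total_loss = fine_tuning_loss + reg_weight * noise_stability_penalty(upper_layers, hidden)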
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Efficient Domain Adaptation of Language Models via Adaptive Tokenization [5.058301279065432]
We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora.
Our approach produces smaller models and requires less training and inference time than other approaches using tokenizer augmentation.
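As a simplified illustration of the idea (unigram frequency ratios stand in for the conditional distributions the paper uses; names and thresholds are hypothetical):

import math
from collections import Counter

def domain_specific_tokens(domain_tokens, base_tokens, k=100, min_count=20):
    # Score candidate tokens by how much more probable they are in the domain
    # corpus than in the base corpus; keep the top-k as new vocabulary items.
    d, b = Counter(domain_tokens), Counter(base_tokens)
    d_total, b_total = sum(d.values()), sum(b.values())
    scores = {
        tok: math.log((d[tok] / d_total) / ((b[tok] + 1) / b_total))
        for tok in d if d[tok] >= min_count
    }
    return [tok for tok, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

# new_tokens = domain_specific_tokens(domain_corpus_tokens, base_corpus_tokens)
# tokenizer.add_tokens(new_tokens); model.resize_token_embeddings(len(tokenizer))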
arXiv Detail & Related papers (2021-09-15T17:51:27Z)
- Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative [88.11465517304515]
In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned.
We show that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance.
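A hypothetical sketch of such multi-tasking with a shared encoder, where the end-task loss and an auxiliary masked-LM-style loss are optimized in a single objective (the architecture and pooling choices here are assumptions, not the paper's method):

import torch
from torch import nn

class JointModel(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int, vocab_size: int):
        super().__init__()
        self.encoder = encoder                        # shared representation
        self.task_head = nn.Linear(hidden_size, num_labels)
        self.aux_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, task_inputs, task_labels, aux_inputs, aux_labels, aux_weight=0.5):
        # End-task loss from the first-token ([CLS]-style) representation.
        task_loss = nn.functional.cross_entropy(
            self.task_head(self.encoder(task_inputs)[:, 0]), task_labels)
        # Auxiliary token-prediction loss on unlabeled text.
        aux_logits = self.aux_head(self.encoder(aux_inputs))
        aux_loss = nn.functional.cross_entropy(
            aux_logits.view(-1, aux_logits.size(-1)), aux_labels.view(-1))
        return task_loss + aux_weight * aux_loss      # one multi-task objective, one backward pass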
arXiv Detail & Related papers (2021-09-15T17:13:18Z)
- Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification [5.420446976940825]
We propose a back-translated task-adaptive pretraining (BT-TAPT) method that increases the amount of task-specific data for LM re-pretraining.
The experimental results show that the proposed BT-TAPT yields improved classification accuracy on both low- and high-resource data and better robustness to noise than the conventional adaptive pretraining method.
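A rough illustration of the back-translation step (the translation models and the augmentation loop are assumptions, not the BT-TAPT recipe itself):

from transformers import pipeline

to_de = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
to_en = pipeline("translation_de_to_en", model="Helsinki-NLP/opus-mt-de-en")

def back_translate(sentences):
    # Round-trip each sentence through a pivot language to obtain paraphrases.
    german = [out["translation_text"] for out in to_de(sentences)]
    return [out["translation_text"] for out in to_en(german)]

# The paraphrases are added to the task corpus used for task-adaptive pretraining:
# tapt_corpus = task_sentences + back_translate(task_sentences)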
arXiv Detail & Related papers (2021-07-22T06:27:35Z)
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains [45.07506437436464]
We present a general approach to developing small, fast and effective pre-trained models for specific domains.
This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains.
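A simplified sketch of the distillation half, assuming a standard soft-label objective on unlabeled in-domain text (the teacher is the domain-adapted model, the student a smaller one; the exact loss is an assumption, and no task labels are involved, which is what makes the step task-agnostic):

import torch
from torch import nn

def distill_step(teacher, student, batch, temperature=2.0):
    # The student mimics the domain-adapted teacher's token predictions.
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    loss = nn.functional.kl_div(
        nn.functional.log_softmax(student_logits / temperature, dim=-1),
        nn.functional.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return loss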
arXiv Detail & Related papers (2021-06-25T07:37:05Z)
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z)
- AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization [43.024669990477214]
We present a study of domain adaptation for the abstractive summarization task across six diverse target domains in a low-resource setting.
Experiments show that the effectiveness of pre-training is correlated with the similarity between the pre-training data and the target domain task.
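One simple way to quantify the similarity between pre-training data and the target domain that this finding refers to (my choice of metric, not necessarily the one used in AdaptSum) is the Jensen-Shannon divergence between unigram word distributions:

import math
from collections import Counter

def js_divergence(tokens_a, tokens_b):
    # Lower divergence means the two corpora use words more similarly.
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    vocab = set(ca) | set(cb)
    pa = {w: ca[w] / len(tokens_a) for w in vocab}
    pb = {w: cb[w] / len(tokens_b) for w in vocab}
    m = {w: 0.5 * (pa[w] + pb[w]) for w in vocab}
    def kl(p, q):
        return sum(p[w] * math.log(p[w] / q[w]) for w in vocab if p[w] > 0)
    return 0.5 * kl(pa, m) + 0.5 * kl(pb, m)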
arXiv Detail & Related papers (2021-03-21T08:12:19Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance at less than 50% of the cost.
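An illustrative version of selective masking, with the token-importance scores left as an input (the paper derives them from a downstream task model; everything here is a stand-in):

def selective_mask(tokens, importance, mask_token="[MASK]", ratio=0.15):
    # Mask the tokens that matter most for the task, rather than random positions.
    n_mask = max(1, int(len(tokens) * ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    chosen = set(ranked[:n_mask])
    return [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]

# Example with made-up importance scores:
# selective_mask(["the", "battery", "drains", "fast"], [0.1, 0.9, 0.7, 0.4])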
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.