Train No Evil: Selective Masking for Task-Guided Pre-Training
- URL: http://arxiv.org/abs/2004.09733v2
- Date: Wed, 7 Oct 2020 09:47:41 GMT
- Title: Train No Evil: Selective Masking for Task-Guided Pre-Training
- Authors: Yuxian Gu, Zhengyan Zhang, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun
- Abstract summary: We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the computation cost.
- Score: 97.03615486457065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, pre-trained language models mostly follow the
pre-train-then-fine-tuning paradigm and have achieved great performance on
various downstream tasks. However, since the pre-training stage is typically
task-agnostic and the fine-tuning stage usually suffers from insufficient
supervised data, the models cannot always capture the domain-specific and
task-specific patterns well. In this paper, we propose a three-stage framework by
adding a task-guided pre-training stage with selective masking between general
pre-training and fine-tuning. In this stage, the model is trained by masked
language modeling on in-domain unsupervised data to learn domain-specific
patterns and we propose a novel selective masking strategy to learn
task-specific patterns. Specifically, we design a method to measure the
importance of each token in sequences and selectively mask the important
tokens. Experimental results on two sentiment analysis tasks show that our
method can achieve comparable or even better performance with less than 50% of
the computation cost, which indicates that our method is both effective and efficient.
The source code of this paper can be obtained from
https://github.com/thunlp/SelectiveMasking.
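The abstract describes the core selection step: score how important each token is for the downstream task, then preferentially mask the important tokens during task-guided masked language modeling on in-domain data. The snippet below is a minimal sketch of that selection step only, not the authors' implementation (see the repository above for that). The `importance_fn` callback, `mask_ratio`, and the small amount of extra random masking are illustrative assumptions; in the paper, token importance is derived from a downstream task model rather than a hand-written scorer.

```python
# Minimal sketch of task-guided selective masking (illustrative, not the
# authors' exact implementation from thunlp/SelectiveMasking).
import random
from typing import Callable, List

MASK_TOKEN = "[MASK]"

def selective_mask(
    tokens: List[str],
    importance_fn: Callable[[List[str], int], float],  # (tokens, position) -> task importance
    mask_ratio: float = 0.15,
    rand_prob: float = 0.02,
) -> List[str]:
    """Mask the most task-important tokens, plus a few random ones for coverage."""
    scores = [importance_fn(tokens, i) for i in range(len(tokens))]
    n_mask = max(1, int(len(tokens) * mask_ratio))
    # Rank positions by importance and mask the top-scoring ones.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    to_mask = set(ranked[:n_mask])
    # A little random masking keeps generic MLM supervision (an assumption,
    # not taken from the paper).
    for i in range(len(tokens)):
        if random.random() < rand_prob:
            to_mask.add(i)
    return [MASK_TOKEN if i in to_mask else t for i, t in enumerate(tokens)]

if __name__ == "__main__":
    # Toy importance scorer: sentiment-bearing words score high (illustration only).
    lexicon = {"great": 1.0, "terrible": 1.0, "boring": 0.8, "loved": 0.9}
    toy_importance = lambda toks, i: lexicon.get(toks[i].lower(), 0.0)
    print(selective_mask("I loved the plot but the ending was terrible".split(),
                         toy_importance, mask_ratio=0.25))
```

In the full pipeline this stage sits between general pre-training and fine-tuning: the selectively masked in-domain sequences are used with an ordinary MLM objective before the model is fine-tuned on the supervised task data.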
Related papers
- Hybrid diffusion models: combining supervised and generative pretraining for label-efficient fine-tuning of segmentation models [55.2480439325792]
We propose a new pretext task, which is to simultaneously perform image denoising and mask prediction on the first domain.
We show that fine-tuning a model pretrained using this approach leads to better results than fine-tuning a similar model trained using either supervised or unsupervised pretraining.
arXiv Detail & Related papers (2024-08-06T20:19:06Z)
- Exploring Transferability for Randomized Smoothing [37.60675615521106]
We propose a method for pretraining certifiably robust models.
We find that surprisingly strong certified accuracy can be achieved even when finetuning on only clean images.
arXiv Detail & Related papers (2023-12-14T15:08:27Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [49.64561153284428]
We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning (a minimal sketch of this idea appears after this list).
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
arXiv Detail & Related papers (2020-04-26T15:03:47Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
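The "Masking as an Efficient Alternative to Finetuning" entry above keeps the pretrained weights frozen and learns a binary mask over them instead of updating them. The sketch below illustrates that general idea, assuming a simple score-thresholding scheme with a straight-through gradient; the `MaskedLinear` wrapper, the threshold at zero, and the score initialization are assumptions for illustration, not that paper's exact recipe.

```python
# Minimal sketch of "masking instead of finetuning": freeze pretrained weights
# and train only a per-weight binary mask (illustrative assumptions throughout).
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear, init_score: float = 0.01):
        super().__init__()
        # Frozen pretrained parameters.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        # Trainable real-valued scores; binarized on the forward pass.
        self.scores = nn.Parameter(torch.full_like(self.weight, init_score))

    def forward(self, x):
        hard = (self.scores > 0).float()                  # binary mask
        mask = hard + self.scores - self.scores.detach()  # straight-through gradient to scores
        return nn.functional.linear(x, self.weight * mask, self.bias)

# Usage: wrap a pretrained layer and optimize only the mask scores.
layer = MaskedLinear(nn.Linear(768, 768))
opt = torch.optim.Adam([p for p in layer.parameters() if p.requires_grad], lr=1e-3)
out = layer(torch.randn(4, 768))
```

Because only the mask scores receive gradients, the storage overhead per task is one bit per weight once the mask is binarized, which is what makes this an efficient alternative to full finetuning.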