Train No Evil: Selective Masking for Task-Guided Pre-Training
- URL: http://arxiv.org/abs/2004.09733v2
- Date: Wed, 7 Oct 2020 09:47:41 GMT
- Title: Train No Evil: Selective Masking for Task-Guided Pre-Training
- Authors: Yuxian Gu, Zhengyan Zhang, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun
- Abstract summary: We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the computation cost.
- Score: 97.03615486457065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, pre-trained language models mostly follow the
pre-train-then-fine-tuning paradigm and have achieved great performance on
various downstream tasks. However, since the pre-training stage is typically
task-agnostic and the fine-tuning stage usually suffers from insufficient
supervised data, the models cannot always capture the domain-specific and
task-specific patterns well. In this paper, we propose a three-stage framework by
adding a task-guided pre-training stage with selective masking between general
pre-training and fine-tuning. In this stage, the model is trained by masked
language modeling on in-domain unsupervised data to learn domain-specific
patterns and we propose a novel selective masking strategy to learn
task-specific patterns. Specifically, we design a method to measure the
importance of each token in sequences and selectively mask the important
tokens. Experimental results on two sentiment analysis tasks show that our
method can achieve comparable or even better performance with less than 50% of
the computation cost, which indicates that our method is both effective and efficient.
The source code of this paper can be obtained from
https://github.com/thunlp/SelectiveMasking.
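The abstract describes the core selection step: score how important each token is for the downstream task, then preferentially mask the important tokens during task-guided masked language modeling on in-domain data. The snippet below is a minimal sketch of that selection step only, not the authors' implementation (see the repository above for that). The `importance_fn` callback, `mask_ratio`, and the small amount of extra random masking are illustrative assumptions; in the paper, token importance is derived from a downstream task model rather than a hand-written scorer.

```python
# Minimal sketch of task-guided selective masking (illustrative, not the
# authors' exact implementation from thunlp/SelectiveMasking).
import random
from typing import Callable, List

MASK_TOKEN = "[MASK]"

def selective_mask(
    tokens: List[str],
    importance_fn: Callable[[List[str], int], float],  # (tokens, position) -> task importance
    mask_ratio: float = 0.15,
    rand_prob: float = 0.02,
) -> List[str]:
    """Mask the most task-important tokens, plus a few random ones for coverage."""
    scores = [importance_fn(tokens, i) for i in range(len(tokens))]
    n_mask = max(1, int(len(tokens) * mask_ratio))
    # Rank positions by importance and mask the top-scoring ones.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    to_mask = set(ranked[:n_mask])
    # A little random masking keeps generic MLM supervision (an assumption,
    # not taken from the paper).
    for i in range(len(tokens)):
        if random.random() < rand_prob:
            to_mask.add(i)
    return [MASK_TOKEN if i in to_mask else t for i, t in enumerate(tokens)]

if __name__ == "__main__":
    # Toy importance scorer: sentiment-bearing words score high (illustration only).
    lexicon = {"great": 1.0, "terrible": 1.0, "boring": 0.8, "loved": 0.9}
    toy_importance = lambda toks, i: lexicon.get(toks[i].lower(), 0.0)
    print(selective_mask("I loved the plot but the ending was terrible".split(),
                         toy_importance, mask_ratio=0.25))
```

In the full pipeline this stage sits between general pre-training and fine-tuning: the selectively masked in-domain sequences are used with an ordinary MLM objective before the model is fine-tuned on the supervised task data.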
Related papers
- Hybrid diffusion models: combining supervised and generative pretraining for label-efficient fine-tuning of segmentation models [55.2480439325792]
We propose a new pretext task, which is to simultaneously perform image denoising and mask prediction on the first domain.
We show that fine-tuning a model pretrained using this approach leads to better results than fine-tuning a similar model trained using either supervised or unsupervised pretraining.
arXiv Detail & Related papers (2024-08-06T20:19:06Z)
- Exploring Transferability for Randomized Smoothing [37.60675615521106]
We propose a method for pretraining certifiably robust models.
We find that surprisingly strong certified accuracy can be achieved even when finetuning on only clean images.
arXiv Detail & Related papers (2023-12-14T15:08:27Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [49.64561153284428]
We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning (a minimal sketch of this idea appears after this list).
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
arXiv Detail & Related papers (2020-04-26T15:03:47Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
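The "Masking as an Efficient Alternative to Finetuning" entry above keeps the pretrained weights frozen and learns a binary mask over them instead of updating them. The sketch below illustrates that general idea, assuming a simple score-thresholding scheme with a straight-through gradient; the `MaskedLinear` wrapper, the threshold at zero, and the score initialization are assumptions for illustration, not that paper's exact recipe.

```python
# Minimal sketch of "masking instead of finetuning": freeze pretrained weights
# and train only a per-weight binary mask (illustrative assumptions throughout).
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear, init_score: float = 0.01):
        super().__init__()
        # Frozen pretrained parameters.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        # Trainable real-valued scores; binarized on the forward pass.
        self.scores = nn.Parameter(torch.full_like(self.weight, init_score))

    def forward(self, x):
        hard = (self.scores > 0).float()                  # binary mask
        mask = hard + self.scores - self.scores.detach()  # straight-through gradient to scores
        return nn.functional.linear(x, self.weight * mask, self.bias)

# Usage: wrap a pretrained layer and optimize only the mask scores.
layer = MaskedLinear(nn.Linear(768, 768))
opt = torch.optim.Adam([p for p in layer.parameters() if p.requires_grad], lr=1e-3)
out = layer(torch.randn(4, 768))
```

Because only the mask scores receive gradients, the storage overhead per task is one bit per weight once the mask is binarized, which is what makes this an efficient alternative to full finetuning.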