Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
- URL: http://arxiv.org/abs/2004.12406v2
- Date: Sun, 11 Oct 2020 11:52:08 GMT
- Title: Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
- Authors: Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze
- Abstract summary: We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
- Score: 49.64561153284428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an efficient method of utilizing pretrained language models, where
we learn selective binary masks for pretrained weights in lieu of modifying
them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a
series of NLP tasks show that our masking scheme yields performance comparable
to finetuning, yet has a much smaller memory footprint when several tasks need
to be inferred simultaneously. Through intrinsic evaluations, we show that
representations computed by masked language models encode information necessary
for solving downstream tasks. Analyzing the loss landscape, we show that
masking and finetuning produce models that reside in minima that can be
connected by a line segment with nearly constant test accuracy. This confirms
that masking can be utilized as an efficient alternative to finetuning.
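To make the core idea concrete, the sketch below shows one way a binary mask over a frozen pretrained weight matrix could be learned with a straight-through estimator; the MaskedLinear wrapper, the 0.5 threshold, and the score initialization are illustrative assumptions, not the paper's exact implementation.
```python
# Minimal sketch (assumptions: the MaskedLinear wrapper, the 0.5 threshold,
# and the score initialization are illustrative): keep the pretrained weights
# frozen and learn a per-weight binary mask via a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """A linear layer whose frozen pretrained weight is gated by a learned binary mask."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Frozen pretrained parameters (never updated).
        self.register_buffer("weight", pretrained.weight.detach().clone())
        self.register_buffer("bias", pretrained.bias.detach().clone())
        # Trainable real-valued scores; thresholding them yields the binary mask.
        self.scores = nn.Parameter(torch.full_like(self.weight, 0.51))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard_mask = (self.scores >= 0.5).float()
        # Straight-through estimator: the forward pass uses the hard 0/1 mask,
        # the backward pass sends gradients to the underlying scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask, self.bias)


# Usage: swap a pretrained layer for its masked version and train only the scores.
layer = MaskedLinear(nn.Linear(768, 768))
optimizer = torch.optim.Adam([layer.scores], lr=1e-4)
output = layer(torch.randn(2, 768))
```
Because each task only needs to store a 0/1 mask on top of the shared pretrained weights, serving several tasks simultaneously requires far less memory than keeping a fully finetuned copy of the model per task, which is the efficiency argument of the abstract.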
Related papers
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT)
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision.
We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency.
EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Meta Mask Correction for Nuclei Segmentation in Histopathological Image [5.36728433027615]
We propose a novel meta-learning-based nuclei segmentation method to leverage data with noisy masks.
Specifically, we design a fully convolutional meta-model that can correct noisy masks using a small amount of clean meta-data.
We show that our method achieves state-of-the-art results.
arXiv Detail & Related papers (2021-11-24T13:53:35Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance at less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
- Semi-Autoregressive Training Improves Mask-Predict Decoding [119.8412758943192]
We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict.
Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
arXiv Detail & Related papers (2020-01-23T19:56:35Z)
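For readers unfamiliar with the mask-predict decoding that the SMART summary above refers to, the sketch below outlines the iterative re-masking loop; the model call signature, mask_id, and the fixed target length are hypothetical placeholders, and only the general procedure (predict every position in parallel, then re-mask the lowest-confidence tokens on a linearly decaying schedule) follows the published method.
```python
# Rough sketch of mask-predict decoding (assumptions: `model`, `mask_id`, and
# the predicted target length `tgt_len` are hypothetical placeholders; only
# the iterative re-masking loop reflects the published procedure).
import torch


def mask_predict(model, src_tokens, tgt_len, mask_id, iterations=10):
    # Start from a fully masked target sequence.
    tgt = torch.full((1, tgt_len), mask_id, dtype=torch.long)
    probs = torch.zeros(1, tgt_len)

    for t in range(iterations):
        # The conditional masked LM predicts every target position in parallel.
        logits = model(src_tokens, tgt)              # (1, tgt_len, vocab)
        new_probs, new_tokens = logits.softmax(-1).max(-1)

        # Only update positions that are currently masked.
        masked = tgt.eq(mask_id)
        tgt = torch.where(masked, new_tokens, tgt)
        probs = torch.where(masked, new_probs, probs)

        # Linearly decay how many tokens are re-masked for the next iteration.
        n_mask = int(tgt_len * (1 - (t + 1) / iterations))
        if n_mask == 0:
            break
        lowest = probs.topk(n_mask, largest=False).indices
        tgt[0, lowest[0]] = mask_id

    return tgt
```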
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.