Masking as an Efficient Alternative to Finetuning for Pretrained
Language Models
- URL: http://arxiv.org/abs/2004.12406v2
- Date: Sun, 11 Oct 2020 11:52:08 GMT
- Title: Masking as an Efficient Alternative to Finetuning for Pretrained
Language Models
- Authors: Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze
- Abstract summary: We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
- Score: 49.64561153284428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an efficient method of utilizing pretrained language models, where
we learn selective binary masks for pretrained weights in lieu of modifying
them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a
series of NLP tasks show that our masking scheme yields performance comparable
to finetuning, yet has a much smaller memory footprint when several tasks need
to be inferred simultaneously. Through intrinsic evaluations, we show that
representations computed by masked language models encode information necessary
for solving downstream tasks. Analyzing the loss landscape, we show that
masking and finetuning produce models that reside in minima that can be
connected by a line segment with nearly constant test accuracy. This confirms
that masking can be utilized as an efficient alternative to finetuning.
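The core idea in the abstract can be illustrated with a short sketch (not the authors' code): each pretrained weight matrix is frozen, a real-valued score is trained per weight, and the scores are thresholded into a binary mask using a straight-through estimator so gradients still reach them. The layer name, threshold, and initialization below are illustrative assumptions; in the paper's setting the mask would be applied to BERT/RoBERTa weight matrices rather than a toy linear layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Linear layer with frozen pretrained weights; only a binary mask is trained."""

    def __init__(self, pretrained: nn.Linear, threshold: float = 0.5):
        super().__init__()
        # Copy and freeze the pretrained parameters.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (
            nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
            if pretrained.bias is not None
            else None
        )
        # Real-valued scores; the binary mask is obtained by thresholding them.
        # Initializing slightly above the threshold makes the initial mask all-ones,
        # i.e. training starts from the unmodified pretrained layer (an assumption).
        self.scores = nn.Parameter(torch.full_like(self.weight, threshold + 0.01))
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard_mask = (self.scores > self.threshold).float()
        # Straight-through estimator: the forward pass uses the hard binary mask,
        # while the backward pass routes gradients to the underlying scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask, self.bias)


# Hypothetical usage: mask one pretrained matrix and train only the mask scores.
pretrained = nn.Linear(768, 768)            # stand-in for a single BERT weight matrix
layer = MaskedLinear(pretrained)
optimizer = torch.optim.Adam([layer.scores], lr=1e-3)

x = torch.randn(4, 768)
loss = layer(x).pow(2).mean()               # placeholder for a downstream task loss
loss.backward()
optimizer.step()
```

Because only one bit per masked weight has to be stored per task, many tasks can be served from a single shared copy of the pretrained weights, which is the memory-footprint argument made in the abstract.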
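The loss-landscape claim, that the masked and finetuned solutions lie in minima connected by a line segment of nearly constant test accuracy, can be probed with a simple interpolation sweep. The helper below is a hypothetical sketch under PyTorch-style assumptions: `eval_fn` is an assumed callable returning test accuracy, and the masked model is assumed to have its mask already folded into the weights so both models share one parameterization.

```python
import copy
import torch


@torch.no_grad()
def linear_path_accuracy(model_a, model_b, eval_fn, steps=11):
    """Evaluate eval_fn along the straight line between two sets of weights.

    model_a, model_b: modules with identical architectures, e.g. the masked model
    (mask multiplied into its weights) and the finetuned model.
    eval_fn: assumed callable that takes a model and returns test accuracy.
    """
    state_a = {k: v.clone() for k, v in model_a.state_dict().items()}
    state_b = model_b.state_dict()
    probe = copy.deepcopy(model_a)
    results = []
    for step in range(steps):
        alpha = step / (steps - 1)
        # Interpolate floating-point tensors only; leave integer buffers untouched.
        blended = {
            k: ((1 - alpha) * state_a[k] + alpha * state_b[k])
            if torch.is_floating_point(state_a[k]) else state_a[k]
            for k in state_a
        }
        probe.load_state_dict(blended)
        results.append((alpha, eval_fn(probe)))
    return results
```

Plotting accuracy against the interpolation coefficient and seeing no pronounced dip along the path is the kind of evidence the abstract refers to.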
Related papers
- Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text [27.320746607958142]
Masked language modeling has become a widely adopted unsupervised technique to pre-train language models.
We propose to adjust the masking ratio and to decide which tokens to mask based on a novel task-informed anti-curriculum learning scheme.
arXiv Detail & Related papers (2025-02-18T15:36:16Z)
- Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model.
In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction.
Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [55.12082817901671]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT)
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Meta Mask Correction for Nuclei Segmentation in Histopathological Image [5.36728433027615]
We propose a novel meta-learning-based nuclei segmentation method to leverage data with noisy masks.
Specifically, we design a fully convolutional meta-model that can correct noisy masks using a small amount of clean meta-data.
We show that our method achieves state-of-the-art results.
arXiv Detail & Related papers (2021-11-24T13:53:35Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
- Semi-Autoregressive Training Improves Mask-Predict Decoding [119.8412758943192]
We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict.
Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
arXiv Detail & Related papers (2020-01-23T19:56:35Z)