InforMask: Unsupervised Informative Masking for Language Model Pretraining
- URL: http://arxiv.org/abs/2210.11771v1
- Date: Fri, 21 Oct 2022 07:10:56 GMT
- Title: InforMask: Unsupervised Informative Masking for Language Model Pretraining
- Authors: Nafis Sadeq, Canwen Xu, Julian McAuley
- Abstract summary: We propose a new unsupervised masking strategy for training masked language models.
InforMask exploits Pointwise Mutual Information (PMI) to select the most informative tokens to mask.
- Score: 13.177839395411858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked language modeling is widely used for pretraining large language models
for natural language understanding (NLU). However, random masking is
suboptimal, allocating an equal masking rate for all tokens. In this paper, we
propose InforMask, a new unsupervised masking strategy for training masked
language models. InforMask exploits Pointwise Mutual Information (PMI) to
select the most informative tokens to mask. We further propose two
optimizations for InforMask to improve its efficiency. With a one-off
preprocessing step, InforMask outperforms random masking and previously
proposed masking strategies on the factual recall benchmark LAMA and the
question answering benchmarks SQuAD v1 and v2.
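As a rough illustration of the core idea, the sketch below scores each token by its average PMI with neighbouring tokens and masks the highest-scoring ones. This is a minimal approximation under stated assumptions: the window size, the averaging heuristic, the toy corpus, and the helper names (`build_pmi_table`, `informative_mask`) are illustrative choices, not the paper's exact formulation or its efficiency optimizations.

```python
import math
from collections import Counter

def build_pmi_table(corpus, window=5):
    """Estimate PMI for token pairs co-occurring within a fixed window.

    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ), with probabilities
    estimated from unigram and pair counts over the corpus.
    """
    unigram, pair = Counter(), Counter()
    total_tokens, total_pairs = 0, 0
    for sent in corpus:
        unigram.update(sent)
        total_tokens += len(sent)
        for i, x in enumerate(sent):
            for y in sent[i + 1:i + 1 + window]:
                pair[frozenset((x, y))] += 1
                total_pairs += 1
    pmi = {}
    for key, c_xy in pair.items():
        toks = tuple(key)
        x, y = toks if len(toks) == 2 else (toks[0], toks[0])
        p_xy = c_xy / total_pairs
        p_x = unigram[x] / total_tokens
        p_y = unigram[y] / total_tokens
        pmi[key] = math.log(p_xy / (p_x * p_y))
    return pmi

def informative_mask(sentence, pmi, mask_rate=0.15, window=5, mask_token="[MASK]"):
    """Mask the tokens with the highest average PMI to their neighbours."""
    scores = []
    for i, tok in enumerate(sentence):
        neighbours = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
        vals = [pmi.get(frozenset((tok, n)), 0.0) for n in neighbours]
        scores.append(sum(vals) / max(len(vals), 1))
    n_mask = max(1, round(mask_rate * len(sentence)))
    top = sorted(range(len(sentence)), key=scores.__getitem__, reverse=True)[:n_mask]
    return [mask_token if i in top else tok for i, tok in enumerate(sentence)]

# Toy usage: frequent function words tend to get low scores, so the more
# informative content tokens are the ones selected for masking.
corpus = [
    ["the", "eiffel", "tower", "is", "located", "in", "paris"],
    ["the", "brandenburg", "gate", "is", "located", "in", "berlin"],
    ["paris", "is", "the", "capital", "of", "france"],
]
pmi_table = build_pmi_table(corpus)
print(informative_mask(corpus[0], pmi_table, mask_rate=0.3))
```

In practice the PMI statistics would be computed once over the pretraining corpus (the paper's one-off preprocessing step) and reused when building masked batches.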
Related papers
- Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
- Self-Evolution Learning for Discriminative Language Model Pretraining [103.57103957631067]
Self-Evolution learning (SE) is a simple and effective token masking and learning method.
SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach.
arXiv Detail & Related papers (2023-05-24T16:00:54Z)
- Difference-Masking: Choosing What to Mask in Continued Pretraining [56.76782116221438]
We introduce Difference-Masking, a masking strategy that automatically chooses what to mask during continued pretraining.
We find that Difference-Masking outperforms baselines on continued pretraining settings across four diverse language-only and multimodal video tasks.
arXiv Detail & Related papers (2023-05-23T23:31:02Z)
- DynaMask: Dynamic Mask Selection for Instance Segmentation [21.50329070835023]
We develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance.
The proposed method, namely DynaMask, brings consistent and noticeable performance improvements over other state-of-the-art methods at a moderate computation overhead.
arXiv Detail & Related papers (2023-03-14T13:01:25Z)
- Uniform Masking Prevails in Vision-Language Pretraining [26.513450527203453]
Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining.
This paper shows that increasing the masking rate leads to gains in Image-Text Matching (ITM) tasks.
arXiv Detail & Related papers (2022-12-10T04:02:19Z)
- Should You Mask 15% in Masked Language Modeling? [86.91486000124156]
Masked language models conventionally use a masking rate of 15%.
We find that masking up to 40% of input tokens can outperform the 15% baseline (a sketch of rate-configurable random masking appears after this list).
arXiv Detail & Related papers (2022-02-16T11:42:34Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive masking of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
- Semi-Autoregressive Training Improves Mask-Predict Decoding [119.8412758943192]
We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict.
Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
arXiv Detail & Related papers (2020-01-23T19:56:35Z)
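For contrast with the learned and PMI-guided strategies above, the 15%-vs-40% comparison in "Should You Mask 15% in Masked Language Modeling?" concerns plain random masking with the rate as a tunable knob. Below is a minimal sketch of BERT-style random masking with a configurable rate; the 80/10/10 replace/random/keep split and the -100 ignore label follow common BERT/MLM conventions and are assumptions here, not details taken from that paper, and handling of special tokens is omitted for brevity.

```python
import random

def random_mlm_mask(token_ids, vocab_size, mask_id, mask_rate=0.15, rng=None):
    """BERT-style random masking: sample `mask_rate` of positions as targets,
    then replace 80% with the mask token, 10% with a random token, and keep 10%.
    Returns (corrupted_ids, labels), where labels are -100 at non-target positions
    (the value typically ignored by MLM loss functions).
    """
    rng = rng or random.Random()
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)
    n_targets = max(1, round(mask_rate * len(token_ids)))
    for i in rng.sample(range(len(token_ids)), n_targets):
        labels[i] = token_ids[i]
        r = rng.random()
        if r < 0.8:
            corrupted[i] = mask_id          # replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.randrange(vocab_size)  # replace with a random token
        # else: keep the original token unchanged
    return corrupted, labels

# Raising mask_rate from 0.15 to 0.40 only changes how many positions are targeted.
ids = [5, 42, 7, 99, 13, 8, 21]
print(random_mlm_mask(ids, vocab_size=30522, mask_id=103, mask_rate=0.40, rng=random.Random(0)))
```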
This list is automatically generated from the titles and abstracts of the papers on this site.