Uniform Masking Prevails in Vision-Language Pretraining
- URL: http://arxiv.org/abs/2212.05195v1
- Date: Sat, 10 Dec 2022 04:02:19 GMT
- Title: Uniform Masking Prevails in Vision-Language Pretraining
- Authors: Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas,
Madian Khabsa, Amjad Almahairi
- Abstract summary: Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining.
This paper shows that increasing the masking rate leads to gains in Image-Text Matching (ITM) tasks.
- Score: 26.513450527203453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked Language Modeling (MLM) has proven to be an essential component of
Vision-Language (VL) pretraining. To implement MLM, the researcher must make
two design choices: the masking strategy, which determines which tokens to
mask, and the masking rate, which determines how many tokens to mask. Previous
work has focused primarily on the masking strategy while setting the masking
rate at a default of 15%. In this paper, we show that increasing this masking
rate improves downstream performance while simultaneously reducing the
performance gap among different masking strategies, rendering the uniform
masking strategy competitive with other, more complex ones. Surprisingly, we
also discover that
increasing the masking rate leads to gains in Image-Text Matching (ITM) tasks,
suggesting that the role of MLM goes beyond language modeling in VL
pretraining.
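To make the two design choices concrete: under uniform masking, the strategy reduces to sampling positions uniformly at random, and the masking rate becomes a single scalar hyperparameter of the data collator. The sketch below is a minimal, framework-free illustration of that setup; the function name, the 80/10/10 corruption split (borrowed from the standard BERT recipe), and the default vocabulary size are illustrative assumptions rather than details of the paper's implementation.

```python
import random

def uniform_mask(token_ids, mask_token_id, masking_rate=0.40, vocab_size=30522, seed=None):
    """Corrupt a token sequence for MLM using uniform masking at a configurable rate."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)                    # -100: position ignored by the MLM loss
    n_mask = min(len(inputs), max(1, round(masking_rate * len(inputs))))
    for pos in rng.sample(range(len(inputs)), n_mask):   # uniform strategy: every position equally likely
        labels[pos] = inputs[pos]                    # the model must reconstruct the original token here
        roll = rng.random()
        if roll < 0.8:                               # 80%: replace with the [MASK] token
            inputs[pos] = mask_token_id
        elif roll < 0.9:                             # 10%: replace with a random vocabulary token
            inputs[pos] = rng.randrange(vocab_size)
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Raising masking_rate above the conventional 0.15 is the change the abstract reports as both improving downstream performance and shrinking the gap between this uniform strategy and more elaborate ones.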
Related papers
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation [42.020470627552136]
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
Mask classification is the main performance bottleneck for open-vocabulary panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocabulary panoptic segmentation.
arXiv Detail & Related papers (2024-09-24T17:50:28Z) - ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z) - Difference-Masking: Choosing What to Mask in Continued Pretraining [56.76782116221438]
We introduce Difference-Masking, a masking strategy that automatically chooses what to mask during continued pretraining.
We find that Difference-Masking outperforms baselines on continued pretraining settings across four diverse language-only and multimodal video tasks.
arXiv Detail & Related papers (2023-05-23T23:31:02Z) - Learning Better Masking for Better Language Model Pre-training [80.31112722910787]
Masked Language Modeling has been widely used as a denoising objective in pre-training language models (PrLMs).
PrLMs commonly adopt a Random-Token Masking strategy in which a fixed masking ratio is applied and different contents are masked with equal probability throughout training.
We propose two scheduled masking approaches that adaptively tune the masking ratio and masked content in different training stages.
arXiv Detail & Related papers (2022-08-23T08:27:52Z) - Should You Mask 15% in Masked Language Modeling? [86.91486000124156]
Masked language models conventionally use a masking rate of 15%.
We find that masking up to 40% of input tokens can outperform the 15% baseline.
arXiv Detail & Related papers (2022-02-16T11:42:34Z) - Data Efficient Masked Language Modeling for Vision and Language [16.95631509102115]
Masked language modeling (MLM) is one of the key sub-tasks in vision-language training.
In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text.
We investigate a range of alternative masking strategies specific to the cross-modal setting that address the shortcomings of random masking.
arXiv Detail & Related papers (2021-09-05T11:27:53Z) - Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model [57.77981008219654]
The Masked Language Model (MLM) framework has been widely adopted for self-supervised language pre-training.
We propose a fully-explored masking strategy, where a text sequence is divided into a certain number of non-overlapping segments.
arXiv Detail & Related papers (2020-10-12T21:28:14Z) - PMI-Masking: Principled masking of correlated spans [46.36098771676867]
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs).
We propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI); a simplified sketch of the PMI idea follows this list.
We show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.
arXiv Detail & Related papers (2020-10-05T07:19:52Z)
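As a rough illustration of the PMI idea behind the last entry: the snippet below scores adjacent bigrams by pointwise mutual information over a plain token list. It is a deliberately simplified sketch under stated assumptions (bigrams only, maximum-likelihood counts); the paper itself extends the measure to longer spans and uses it to build a masking vocabulary of correlated n-grams.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Score adjacent bigrams by PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ).

    High-PMI bigrams are strongly correlated spans; PMI-Masking masks such
    spans jointly instead of masking their tokens independently.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (x, y): math.log((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }
```

On a natural-language corpus, a collocation such as ("new", "york") scores far higher than a chance pairing such as ("the", "of"), so that span would be selected for joint masking.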