Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
- URL: http://arxiv.org/abs/2404.08327v1
- Date: Fri, 12 Apr 2024 08:38:51 GMT
- Title: Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
- Authors: Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min
- Abstract summary: Saliency-Based Adaptive Masking improves pre-training performance of MIM approaches by prioritizing token salience.
We show that our method significantly improves over the state-of-the-art in mask-based pre-training on the ImageNet-1K dataset.
- Score: 33.39585710223628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Saliency-Based Adaptive Masking (SBAM), a novel and cost-effective approach that significantly enhances the pre-training performance of Masked Image Modeling (MIM) approaches by prioritizing token salience. Our method provides robustness against variations in masking ratios, effectively mitigating the performance instability issues common in existing methods. This relaxes the sensitivity of MIM-based pre-training to masking ratios, which in turn allows us to propose an adaptive strategy for 'tailored' masking ratios for each data sample, which no existing method can provide. Toward this goal, we propose an Adaptive Masking Ratio (AMR) strategy that dynamically adjusts the proportion of masking for the unique content of each image based on token salience. We show that our method significantly improves over the state-of-the-art in mask-based pre-training on the ImageNet-1K dataset.
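A minimal sketch of the two ideas above, with the salience measure and all function names assumed for illustration (the abstract does not specify how salience is computed): score each patch token, derive a per-sample masking ratio from the salience distribution (AMR), and preferentially mask the most salient tokens (SBAM).

```python
import numpy as np

def salience_scores(tokens: np.ndarray) -> np.ndarray:
    # Hypothetical salience measure (the abstract does not specify one):
    # here, the L2 norm of each patch-token embedding.
    return np.linalg.norm(tokens, axis=-1)

def adaptive_masking_ratio(scores: np.ndarray,
                           base_ratio: float = 0.75,
                           spread: float = 0.10) -> float:
    # Hypothetical AMR: nudge a base ratio up or down depending on how
    # concentrated the salience is within this particular image.
    concentration = scores.std() / (scores.mean() + 1e-8)
    return float(np.clip(base_ratio + spread * (concentration - 1.0), 0.5, 0.9))

def salience_based_mask(tokens: np.ndarray) -> np.ndarray:
    # Boolean mask (True = masked) that prefers the most salient tokens.
    scores = salience_scores(tokens)
    ratio = adaptive_masking_ratio(scores)
    n_mask = max(1, int(round(ratio * scores.size)))
    mask = np.zeros(scores.size, dtype=bool)
    mask[np.argsort(scores)[-n_mask:]] = True
    return mask

# Example: 196 patch tokens (a 14x14 grid) with 768-dim embeddings.
rng = np.random.default_rng(0)
mask = salience_based_mask(rng.standard_normal((196, 768)))
print(mask.sum(), "of", mask.size, "tokens masked")
```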
Related papers
- Symmetric masking strategy enhances the performance of Masked Image Modeling [0.0]
Masked Image Modeling (MIM) is a technique in self-supervised learning that focuses on acquiring detailed visual representations from unlabeled images.
We propose a new masking strategy that effectively helps the model capture global and local features.
Based on this masking strategy, we introduce SymMIM, our proposed training pipeline for MIM.
arXiv Detail & Related papers (2024-08-23T00:15:43Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
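A minimal sketch of the filtered-noise idea, assuming a Gaussian low-pass filter purely for illustration (ColorMAE's actual filters define several noise "colors"): filter random noise, then keep the top fraction of patches as the mask.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def filtered_noise_mask(grid: int = 14, ratio: float = 0.75,
                        sigma: float = 1.5, seed: int = 0) -> np.ndarray:
    # Data-independent binary mask from filtered random noise. Low-pass
    # filtering (an assumption here) yields spatially clustered masked
    # regions rather than i.i.d. random patches.
    rng = np.random.default_rng(seed)
    noise = gaussian_filter(rng.standard_normal((grid, grid)), sigma)
    k = int(round(ratio * grid * grid))
    threshold = np.sort(noise, axis=None)[-k]  # k-th largest value
    return noise >= threshold                  # exactly k patches masked

mask = filtered_noise_mask()
print(mask.sum(), "masked patches out of", mask.size)
```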
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Emerging Property of Masked Token for Effective Pre-training [15.846621577804791]
Masked Image Modeling (MIM) has been instrumental in driving recent breakthroughs in computer vision.
MIM's overall efficiency is occasionally hampered by the lengthy duration of the pre-training phase.
We propose a novel approach termed masked token optimization (MTO), specifically designed to improve model efficiency through weight recalibration and the enhancement of the key property of masked tokens.
arXiv Detail & Related papers (2024-04-12T08:46:53Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement, to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- Boosting Adversarial Transferability with Learnable Patch-wise Masks [16.46210182214551]
Adversarial examples have attracted widespread attention in security-critical applications because of their transferability across different models.
In this paper, we argue that the model-specific discriminative regions are a key factor causing overfitting to the source model, and thus reducing the transferability to the target model.
To accurately localize these regions, we present a learnable approach to automatically optimize the mask.
arXiv Detail & Related papers (2023-06-28T05:32:22Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
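The permuted-prediction objective can be illustrated with an XLNet-style permutation attention mask; this is an analogy under assumed conventions, not MaPeT's exact formulation: each token may attend only to tokens that precede it in a sampled factorization order.

```python
import numpy as np

def permutation_attention_mask(n_tokens: int, seed: int = 0) -> np.ndarray:
    # mask[i, j] is True iff position i may attend to position j, i.e.
    # j is predicted strictly before i in the sampled factorization order.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_tokens)      # order[t] = position predicted at step t
    rank = np.empty(n_tokens, dtype=int)
    rank[order] = np.arange(n_tokens)      # rank[pos] = prediction step of pos
    return rank[:, None] > rank[None, :]

print(permutation_attention_mask(4).astype(int))
```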
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
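A minimal sketch of the Gumbel-Softmax link between a mask generator and the masking step, with shapes and the straight-through trick assumed from standard practice rather than taken from AutoMAE's code: the generator's per-patch logits are sampled into a hard 0/1 mask that stays differentiable.

```python
import torch
import torch.nn.functional as F

def sample_differentiable_mask(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # logits: (num_patches, 2) per-patch scores for [keep, mask].
    # Straight-through Gumbel-Softmax returns a hard 0/1 sample whose
    # gradient uses the soft relaxation, so the generator stays trainable.
    samples = F.gumbel_softmax(logits, tau=tau, hard=True)
    return samples[:, 1]  # column 1 = "mask this patch"

logits = torch.randn(196, 2, requires_grad=True)  # a toy mask-generator output
mask = sample_differentiable_mask(logits)
mask.sum().backward()  # gradients reach the generator through the sampling
print(int(mask.sum().item()), "patches masked")
```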
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- Learning Better Masking for Better Language Model Pre-training [80.31112722910787]
Masked Language Modeling has been widely used as a denoising objective in pre-training language models (PrLMs).
PrLMs commonly adopt a Random-Token Masking strategy, where a fixed masking ratio is applied and different contents are masked with equal probability throughout the entire training.
We propose two scheduled masking approaches that adaptively tune the masking ratio and masked content in different training stages.
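A minimal sketch of a scheduled masking ratio; the linear shape and the endpoint values here are illustrative assumptions, not the paper's reported schedule:

```python
def scheduled_masking_ratio(step: int, total_steps: int,
                            start: float = 0.30, end: float = 0.15) -> float:
    # Linearly anneal the masking ratio over training (illustrative values):
    # early stages mask more aggressively than late ones.
    t = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * t

for step in (0, 5_000, 10_000):
    print(step, round(scheduled_masking_ratio(step, 10_000), 3))
```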
arXiv Detail & Related papers (2022-08-23T08:27:52Z)
- PMI-Masking: Principled masking of correlated spans [46.36098771676867]
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs).
We propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI).
We show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.
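The PMI criterion itself can be sketched directly; the toy corpus and the restriction to bigrams are assumptions (the paper extends PMI to longer spans): token pairs that co-occur far more often than chance score high and are masked jointly.

```python
from collections import Counter
from math import log

def bigram_pmi(tokens):
    # PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ) over adjacent token pairs.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, m = len(tokens), len(tokens) - 1
    return {(x, y): log((c / m) / ((unigrams[x] / n) * (unigrams[y] / n)))
            for (x, y), c in bigrams.items()}

toks = "new york is a city and new york is big".split()
pmi = bigram_pmi(toks)
print(max(pmi, key=pmi.get))  # ('new', 'york'): a high-PMI span to mask jointly
```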
arXiv Detail & Related papers (2020-10-05T07:19:52Z)
- Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models [54.569004548170824]
We show that careful masking strategies can bridge the knowledge gap of masked language models.
We propose an effective training strategy that adversarially masks out those tokens which are harder to reconstruct by the underlying masked language model.
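A minimal sketch of the "mask what is hard to reconstruct" idea, using a one-shot greedy heuristic as a stand-in for the paper's adversarially learned masking: given per-token reconstruction losses from the masked language model, mask the hardest tokens.

```python
import numpy as np

def hardest_token_mask(per_token_loss: np.ndarray, ratio: float = 0.15) -> np.ndarray:
    # Mask the tokens the MLM currently reconstructs worst (True = masked).
    n_mask = max(1, int(round(ratio * per_token_loss.size)))
    mask = np.zeros(per_token_loss.size, dtype=bool)
    mask[np.argsort(per_token_loss)[-n_mask:]] = True
    return mask

losses = np.array([0.2, 2.1, 0.4, 1.7, 0.3, 0.9, 0.1, 1.2])
print(hardest_token_mask(losses, ratio=0.25))  # masks the two hardest tokens
```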
arXiv Detail & Related papers (2020-10-05T01:49:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this automatically generated content (including all information) and is not responsible for any consequences of its use.