Related papers: ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

URL: http://arxiv.org/abs/2407.13036v1
Date: Wed, 17 Jul 2024 22:04:00 GMT
Title: ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
Authors: Carlos Hinojosa, Shuming Liu, Bernard Ghanem,
Abstract summary: Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework. We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise. We demonstrate our strategy's superiority in downstream tasks compared to random masking.
Score: 53.3185750528969
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework, offering remarkable performance across a wide range of downstream tasks. To increase the difficulty of the pretext task and learn richer visual representations, existing works have focused on replacing standard random masking with more sophisticated strategies, such as adversarial-guided and teacher-guided masking. However, these strategies depend on the input data thus commonly increasing the model complexity and requiring additional calculations to generate the mask patterns. This raises the question: Can we enhance MAE performance beyond random masking without relying on input data or incurring additional computational costs? In this work, we introduce a simple yet effective data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise. Drawing inspiration from color noise in image processing, we explore four types of filters to yield mask patterns with different spatial and semantic priors. ColorMAE requires no additional learnable parameters or computational overhead in the network, yet it significantly enhances the learned representations. We provide a comprehensive empirical evaluation, demonstrating our strategy's superiority in downstream tasks compared to random masking. Notably, we report an improvement of 2.72 in mIoU in semantic segmentation tasks relative to baseline MAE implementations.

Related papers

Structured-Noise Masked Modeling for Video, Audio and Beyond [74.6545360752605]
We introduce structured noise-based masking, a simple yet effective approach that naturally aligns with the spatial, temporal, and spectral characteristics of video and audio data. Our approach improves the performance of masked video and audio modeling frameworks without any computational overhead.
arXiv Detail & Related papers (2025-03-20T16:34:14Z)
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
DynaMask: Dynamic Mask Selection for Instance Segmentation [21.50329070835023]
We develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance. The proposed method, namely DynaMask, brings consistent and noticeable performance improvements over other state-of-the-arts at a moderate computation overhead.
arXiv Detail & Related papers (2023-03-14T13:01:25Z)
DPPMask: Masked Image Modeling with Determinantal Point Processes [49.65141962357528]
Masked Image Modeling (MIM) has achieved impressive representative performance with the aim of reconstructing randomly masked images. We show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information. To address this issue, we augment MIM with a new masking strategy namely the DPPMask. Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks.
arXiv Detail & Related papers (2023-03-13T13:40:39Z)
Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data. We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process. In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision. We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency. EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z)
MixMask: Revisiting Masking Strategy for Siamese ConvNets [23.946791390657875]
This work introduces a novel filling-based masking approach, termed textbfMixMask. The proposed method replaces erased areas with content from a different image, effectively countering the information depletion seen in traditional masking methods. We empirically validate our framework's enhanced performance in areas such as linear probing, semi-supervised and supervised finetuning, object detection and segmentation.
arXiv Detail & Related papers (2022-10-20T17:54:03Z)
What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum. Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks. We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
SODAR: Segmenting Objects by DynamicallyAggregating Neighboring Mask Representations [90.8752454643737]
Recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per grid cell object masks with fully-convolutional networks. We observe SOLO generates similar masks for an object at nearby grid cells, and these neighboring predictions can complement each other as some may better segment certain object part. Motivated by the observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information.
arXiv Detail & Related papers (2022-02-15T13:53:03Z)
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction [129.25459808288025]
We propose a novel contrastive mask prediction (CMP) task for visual representation learning. MaskCo contrasts region-level features instead of view-level features, which makes it possible to identify the positive sample without any assumptions. We evaluate MaskCo on training datasets beyond ImageNet and compare its performance with MoCo V2.
arXiv Detail & Related papers (2021-08-18T02:50:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.