MixMask: Revisiting Masking Strategy for Siamese ConvNets
- URL: http://arxiv.org/abs/2210.11456v3
- Date: Tue, 21 Mar 2023 16:57:57 GMT
- Title: MixMask: Revisiting Masking Strategy for Siamese ConvNets
- Authors: Kirill Vishniakov and Eric Xing and Zhiqiang Shen
- Abstract summary: We propose a filling-based masking strategy called MixMask to prevent information incompleteness caused by the randomly erased regions in an image.
Our proposed framework achieves superior accuracy on linear probing, semi-supervised, and supervised finetuning, outperforming the state-of-the-art MSCN by a significant margin.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in self-supervised learning have integrated Masked Image
Modeling (MIM) and Siamese Networks into a unified framework that leverages the
benefits of both techniques. However, several issues remain unaddressed when
applying conventional erase-based masking with Siamese ConvNets. These include
(I) the inability to drop uninformative masked regions in ConvNets as they
process data continuously, resulting in low training efficiency compared to ViT
models; and (II) the mismatch between erase-based masking and the
contrastive-based objective in Siamese ConvNets, which differs from the MIM
approach. In this paper, we propose a filling-based masking strategy called
MixMask to prevent the information incompleteness caused by the randomly
erased image regions of the vanilla masking method. Furthermore, we introduce a
flexible loss function design that considers the semantic distance change
between two different mixed views to adapt the integrated architecture and
prevent mismatches between the transformed input and objective in Masked
Siamese ConvNets (MSCN). We conducted extensive experiments on various
datasets, including CIFAR-100, Tiny-ImageNet, and ImageNet-1K. The results
demonstrate that our proposed framework achieves superior accuracy on linear
probing, semi-supervised, and supervised finetuning, outperforming the
state-of-the-art MSCN by a significant margin. Additionally, we demonstrate the
superiority of our approach in object detection and segmentation tasks. Our
source code is available at https://github.com/LightnessOfBeing/MixMask.
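As a rough sketch of the filling-based idea, the snippet below replaces the regions an erase-based mask would zero out with content from another image in the batch, so the ConvNet always receives a dense input. The grid size, mask ratio, and returned mix ratio are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def mixmask_fill(x, mask_ratio=0.5, grid=4):
    """Filling-based masking sketch: erased regions are filled with
    patches from a permuted partner image instead of being zeroed."""
    b, c, h, w = x.shape
    # Coarse binary mask on a grid, upsampled to image resolution.
    m = (torch.rand(b, 1, grid, grid) > mask_ratio).float()
    m = F.interpolate(m, size=(h, w), mode="nearest")
    perm = torch.randperm(b)            # partner image for each sample
    mixed = x * m + x[perm] * (1 - m)   # dense input, no erased holes
    mix_ratio = m.mean(dim=(1, 2, 3))   # fraction kept from the original
    return mixed, perm, mix_ratio
```

A mix-aware loss could then soften the contrastive target toward the partner image in proportion to `1 - mix_ratio`, which is one way to read the "semantic distance change" the abstract refers to.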
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
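A minimal sketch of such data-independent mask generation, assuming a simple box low-pass filter and a top-k threshold (the paper's actual noise filters may differ):

```python
import torch
import torch.nn.functional as F

def noise_filtered_mask(num_patches=196, mask_ratio=0.75, kernel=5):
    """Filter random noise, then threshold it into a binary patch mask;
    smoothing makes the masked patches spatially clustered."""
    side = int(num_patches ** 0.5)
    noise = torch.randn(1, 1, side, side)
    weight = torch.ones(1, 1, kernel, kernel) / kernel ** 2
    smooth = F.conv2d(noise, weight, padding=kernel // 2).flatten()
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[smooth.topk(int(num_patches * mask_ratio)).indices] = True
    return mask   # True = masked patch
```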
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take a saliency constraint into consideration so that the masked regions are more evenly distributed between the foreground and background.
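One hedged reading of that constraint is to draw masked patches evenly from salient and non-salient regions; the threshold and the 50/50 split below are assumptions for illustration:

```python
import torch

def saliency_balanced_mask(saliency, mask_ratio=0.5, thresh=0.5):
    """Mask patches evenly from foreground and background, given a flat
    (N,) per-patch saliency score in [0, 1]."""
    n = saliency.numel()
    k = int(n * mask_ratio)
    fg = (saliency >= thresh).nonzero().flatten()
    bg = (saliency < thresh).nonzero().flatten()
    # Split the masking budget between the two regions.
    pick_fg = fg[torch.randperm(fg.numel())[: k // 2]]
    pick_bg = bg[torch.randperm(bg.numel())[: k - k // 2]]
    mask = torch.zeros(n, dtype=torch.bool)
    mask[torch.cat([pick_fg, pick_bg])] = True   # True = masked
    return mask
```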
arXiv Detail & Related papers (2023-09-22T09:58:38Z)
- Toward a Deeper Understanding: RetNet Viewed through Convolution [25.8904146140577]
The Vision Transformer (ViT) can learn global dependencies better than a CNN, while the CNN's inherent locality can offset the need for expensive training resources.
This paper investigates the effectiveness of RetNet from a CNN perspective and presents a variant of RetNet tailored to the visual domain.
We propose a novel Gaussian mixture mask (GMM) in which each mask has only two learnable parameters and can be conveniently used in any ViT variant whose attention mechanism allows the use of masks.
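A sketch of how a two-parameter Gaussian mask might bias attention logits; the amplitude/bandwidth parameterization is an assumption, not necessarily the paper's:

```python
import torch
import torch.nn as nn

class GaussianMask(nn.Module):
    """Adds a locality prior to attention logits using two learnable
    scalars, for a square grid of side x side patches."""
    def __init__(self, side):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # amplitude
        self.sigma = nn.Parameter(torch.tensor(2.0))  # bandwidth
        ys, xs = torch.meshgrid(torch.arange(side), torch.arange(side),
                                indexing="ij")
        coords = torch.stack([ys.flatten(), xs.flatten()], 1).float()
        self.register_buffer("d2", torch.cdist(coords, coords) ** 2)

    def forward(self, attn_logits):   # (..., N, N) pre-softmax scores
        return attn_logits + self.alpha * torch.exp(
            -self.d2 / (2 * self.sigma ** 2))
```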
arXiv Detail & Related papers (2023-09-11T10:54:22Z)
- Mask-Free Video Instance Segmentation [102.50936366583106]
Video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets.
We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state.
Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection.
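A simplified reading of that matching step (cosine similarity and the value of K are assumptions):

```python
import torch
import torch.nn.functional as F

def one_to_k_matches(feat_t, feat_t1, k=5):
    """One-to-K patch matching between consecutive frames.
    feat_t, feat_t1: (N, D) patch embeddings; returns (N, K) indices."""
    sim = F.normalize(feat_t, dim=1) @ F.normalize(feat_t1, dim=1).T
    return sim.topk(k, dim=1).indices   # K nearest neighbors per patch
```

Mask consistency would then be enforced between each patch and its K matches in place of ground-truth mask supervision.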
arXiv Detail & Related papers (2023-03-28T11:48:07Z)
- Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision.
We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency.
EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z)
- Masked Siamese ConvNets [17.337143119620755]
Self-supervised learning has shown superior performance over supervised methods on various vision benchmarks.
Masked Siamese networks require particular inductive biases and, in practice, work well only with Vision Transformers.
This work empirically studies the problems behind masked siamese networks with ConvNets.
arXiv Detail & Related papers (2022-06-15T17:52:23Z)
- Adversarial Masking for Self-Supervised Learning [81.25999058340997]
ADIOS, a masked image modeling (MIM) framework for self-supervised learning, is proposed.
It simultaneously learns a masking function and an image encoder using an adversarial objective.
It consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets.
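A minimal two-player training step in this spirit, where the optimizer wiring and the soft mask produced by `masker` are illustrative, not the authors' code:

```python
def adversarial_masking_step(x, encoder, masker, ssl_loss,
                             opt_enc, opt_mask):
    """Encoder minimizes the SSL objective on masked views; the masking
    network maximizes the same objective (adversarial occlusion).
    masker(x) is assumed to return a (B, 1, H, W) soft mask in [0, 1]."""
    # Encoder update, mask held fixed.
    loss = ssl_loss(encoder(x * masker(x).detach()), encoder(x))
    opt_enc.zero_grad()
    loss.backward()
    opt_enc.step()
    # Masker update: ascend the objective by minimizing its negation.
    loss_m = -ssl_loss(encoder(x * masker(x)), encoder(x))
    opt_mask.zero_grad()
    loss_m.backward()
    opt_mask.step()
    return loss.item()
```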
arXiv Detail & Related papers (2022-01-31T10:23:23Z)
- Self-Supervised Visual Representations Learning by Contrastive Mask Prediction [129.25459808288025]
We propose a novel contrastive mask prediction (CMP) task for visual representation learning.
MaskCo contrasts region-level features instead of view-level features, which makes it possible to identify the positive sample without any assumptions.
We evaluate MaskCo on training datasets beyond ImageNet and compare its performance with MoCo V2.
arXiv Detail & Related papers (2021-08-18T02:50:33Z)
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.