MixMask: Revisiting Masking Strategy for Siamese ConvNets
- URL: http://arxiv.org/abs/2210.11456v3
- Date: Tue, 21 Mar 2023 16:57:57 GMT
- Title: MixMask: Revisiting Masking Strategy for Siamese ConvNets
- Authors: Kirill Vishniakov and Eric Xing and Zhiqiang Shen
- Abstract summary: We propose a filling-based masking strategy called MixMask to prevent information incompleteness caused by the randomly erased regions in an image.
Our proposed framework achieves superior accuracy on linear probing, semi-supervised, and supervised finetuning, outperforming the state-of-the-art MSCN by a significant margin.
- Score: 24.20212182301359
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in self-supervised learning have integrated Masked Image
Modeling (MIM) and Siamese Networks into a unified framework that leverages the
benefits of both techniques. However, several issues remain unaddressed when
applying conventional erase-based masking with Siamese ConvNets. These include
(I) the inability to drop uninformative masked regions in ConvNets as they
process data continuously, resulting in low training efficiency compared to ViT
models; and (II) the mismatch between erase-based masking and the
contrastive-based objective in Siamese ConvNets, which differs from the MIM
approach. In this paper, we propose a filling-based masking strategy called
MixMask to prevent information incompleteness caused by the randomly erased
regions in an image in the vanilla masking method. Furthermore, we introduce a
flexible loss function design that considers the semantic distance change
between two different mixed views to adapt the integrated architecture and
prevent mismatches between the transformed input and objective in Masked
Siamese ConvNets (MSCN). We conducted extensive experiments on various
datasets, including CIFAR-100, Tiny-ImageNet, and ImageNet-1K. The results
demonstrate that our proposed framework achieves superior accuracy on linear
probing, semi-supervised, and supervised finetuning, outperforming the
state-of-the-art MSCN by a significant margin. Additionally, we demonstrate the
superiority of our approach in object detection and segmentation tasks. Our
source code is available at https://github.com/LightnessOfBeing/MixMask.
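To make the abstract's two key ideas concrete, the filling-based mask and the mix-aware loss can be sketched in a few lines of PyTorch. This is a minimal illustration written for this summary, not the authors' implementation (see the repository above for that); the grid size, mask ratio, and the cosine-similarity objective are assumptions.

```python
# Minimal sketch of filling-based masking: masked cells are filled with
# patches from a second image instead of being erased to zero, so the
# ConvNet never sees empty regions. Illustrative only; grid size, mask
# ratio, and the cosine objective are assumptions, not the paper's setup.
import torch
import torch.nn.functional as F

def mixmask(x_a, x_b, grid=4, mask_ratio=0.5):
    """Mix two (B, C, H, W) batches with a random patch-level binary mask.

    Returns the mixed batch and lam, the per-sample fraction kept from x_a.
    """
    b, _, h, w = x_a.shape
    # Coarse binary mask: 1 keeps x_a, 0 fills the cell from x_b.
    m = (torch.rand(b, 1, grid, grid, device=x_a.device) > mask_ratio).float()
    m = F.interpolate(m, size=(h, w), mode="nearest")
    mixed = m * x_a + (1.0 - m) * x_b      # fill, don't erase
    lam = m.mean(dim=(1, 2, 3))            # per-sample mix ratio
    return mixed, lam

def mix_aware_loss(z_mix, z_a, z_b, lam):
    """Sketch of a flexible objective: the mixed view's embedding is pulled
    toward both source embeddings in proportion to how much of each image
    it actually contains."""
    sim = F.cosine_similarity
    return -(lam * sim(z_mix, z_a) + (1.0 - lam) * sim(z_mix, z_b)).mean()
```

Because every pixel of the mixed view comes from a real image, the ConvNet's sliding-window computation never spends capacity on erased regions, and weighting the loss by lam keeps the objective consistent with how far the mixed view has semantically drifted from each source.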
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking (an illustrative sketch of the noise-filtering idea appears after this list).
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly incorporate a saliency constraint so that the masked regions are more evenly distributed between the foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z)
- Unmasking Anomalies in Road-Scene Segmentation [18.253109627901566]
Anomaly segmentation is a critical task for driving applications.
We propose a paradigm change by shifting from a per-pixel classification to a mask classification.
Mask2Anomaly demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture.
arXiv Detail & Related papers (2023-07-25T08:23:10Z)
- Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval [19.61947785487129]
We propose Mask for Semantics Completion (MASCOT), based on semantic-based masked modeling.
Our MASCOT achieves state-of-the-art performance on four major text-video retrieval benchmarks.
arXiv Detail & Related papers (2023-05-13T12:31:37Z)
- GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to apply to large-scale 3D point clouds.
We propose a Generative Decoder for MAE (GD-MAE) that automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks, including KITTI and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased property of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
arXiv Detail & Related papers (2022-08-08T17:25:31Z)
- Self-Supervised Visual Representations Learning by Contrastive Mask Prediction [129.25459808288025]
We propose a novel contrastive mask prediction (CMP) task for visual representation learning.
MaskCo contrasts region-level features instead of view-level features, which makes it possible to identify the positive sample without any assumptions.
We evaluate MaskCo on training datasets beyond ImageNet and compare its performance with MoCo V2.
arXiv Detail & Related papers (2021-08-18T02:50:33Z)
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
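As referenced in the ColorMAE entry above, binary mask patterns can be generated by filtering random noise rather than sampling masked positions independently. The sketch below is an illustration under assumptions made for this summary: a simple box blur stands in for the paper's actual noise filters, and the quantile threshold is a convenient way to hit a target masking ratio.

```python
# Hedged sketch of data-independent mask generation by filtering random
# noise. The box-blur kernel and quantile threshold are illustrative
# assumptions, not the filters used in the ColorMAE paper.
import torch
import torch.nn.functional as F

def noise_filtered_mask(h: int, w: int, mask_ratio: float = 0.75, k: int = 7):
    """Return an (h, w) binary mask with ~mask_ratio ones, derived from
    low-pass-filtered white noise instead of i.i.d. random sampling."""
    noise = torch.rand(1, 1, h, w)
    # Low-pass filter the noise so nearby values become correlated.
    kernel = torch.ones(1, 1, k, k) / (k * k)
    smooth = F.conv2d(noise, kernel, padding=k // 2)
    # Mask the top `mask_ratio` fraction of the filtered noise.
    thresh = torch.quantile(smooth, 1.0 - mask_ratio)
    return (smooth > thresh).float().squeeze()
```

Thresholding low-pass-filtered noise yields spatially clustered masks rather than salt-and-pepper patterns, which is the kind of data-independent structure the entry's summary describes.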