Mask Hierarchical Features For Self-Supervised Learning
- URL: http://arxiv.org/abs/2304.00218v1
- Date: Sat, 1 Apr 2023 04:14:57 GMT
- Title: Mask Hierarchical Features For Self-Supervised Learning
- Authors: Fenggang Liu, Yangguang Li, Feng Liang, Jilan Xu, Bin Huang, Jing Shao
- Abstract summary: This paper shows that masking deep hierarchical features is an efficient self-supervised method, denoted as MaskDeep.
We mask part of the patches in the representation space and then utilize the sparse visible patches to reconstruct a high-level semantic image representation.
Trained on ResNet50 for 200 epochs, MaskDeep achieves state-of-the-art results of 71.2% Top-1 accuracy under linear classification on ImageNet.
- Score: 23.140060988999352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper shows that masking deep hierarchical features is an
efficient self-supervised method, denoted as MaskDeep. MaskDeep treats each
patch in the representation space as an independent instance. We mask part of
the patches in the representation space and then utilize the sparse visible
patches to reconstruct a high-level semantic image representation. The
intuition behind MaskDeep is that models can reason from the semantics of
sparse visible patches to the global semantics of the image. We further
propose three designs in our framework: 1) a Hierarchical Deep-Masking module
that accounts for the hierarchical property of patch representations, 2) a
multi-group strategy that improves efficiency without any extra computational
cost for the encoder, and 3) a multi-target strategy that provides richer
descriptions of the global semantics. MaskDeep brings decent improvements.
Trained on ResNet50 for 200 epochs, MaskDeep achieves state-of-the-art results
of 71.2% Top-1 accuracy under linear classification on ImageNet. On COCO object
detection tasks, MaskDeep outperforms the self-supervised method SoCo, which is
specifically designed for object detection. When trained for 100 epochs,
MaskDeep achieves 69.6% Top-1 accuracy, surpassing current methods trained for
200 epochs, such as HCSC, by 0.4%.
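As a rough illustration of the approach described in the abstract, the sketch below masks patches in the representation space of an encoder feature map and predicts a global image representation from the sparse visible patches. The module names, shapes, mask ratio, and cosine objective are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of masking in the representation space (MaskDeep-style).
# Shapes, module names, and the cosine objective are assumptions for clarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationMasking(nn.Module):
    def __init__(self, dim=2048, proj_dim=256, mask_ratio=0.65):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Small head that reconstructs a global image representation
        # from the sparse visible patch representations.
        self.predictor = nn.Sequential(
            nn.Linear(dim, proj_dim), nn.ReLU(inplace=True),
            nn.Linear(proj_dim, proj_dim),
        )

    def forward(self, feat_map, target_global):
        # feat_map: (B, C, H, W) encoder feature map;
        # target_global: (B, proj_dim) global target, e.g. from a momentum
        # encoder on a full view (an assumed training setup).
        B, C, H, W = feat_map.shape
        patches = feat_map.flatten(2).transpose(1, 2)          # (B, H*W, C)
        n_keep = max(1, int(H * W * (1 - self.mask_ratio)))
        # Randomly keep a sparse subset of patch representations per image.
        idx = torch.rand(B, H * W, device=feat_map.device).argsort(dim=1)[:, :n_keep]
        visible = torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, C))
        # Pool the visible patches and predict the global representation.
        pred_global = self.predictor(visible.mean(dim=1))      # (B, proj_dim)
        # Cosine-similarity loss against the global target.
        return 2 - 2 * F.cosine_similarity(pred_global, target_global, dim=-1).mean()
```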
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
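A minimal sketch of the ColorMAE idea above: a binary mask is obtained by filtering random noise rather than sampling patches independently. The Gaussian low-pass filter and top-k thresholding here are assumed stand-ins for the paper's specific noise filters.

```python
# Hedged sketch: generate a data-independent binary mask by low-pass
# filtering random noise, which yields spatially clustered mask patterns.
import torch
import torch.nn.functional as F

def filtered_noise_mask(grid=14, mask_ratio=0.75, kernel_size=5, sigma=1.5):
    noise = torch.randn(1, 1, grid, grid)
    # Build a normalized 2D Gaussian kernel and smooth the noise.
    ax = torch.arange(kernel_size) - kernel_size // 2
    g = torch.exp(-ax.float() ** 2 / (2 * sigma ** 2))
    kernel = (g[:, None] * g[None, :]) / (g.sum() ** 2)
    smooth = F.conv2d(noise, kernel.view(1, 1, kernel_size, kernel_size),
                      padding=kernel_size // 2).flatten()
    # Hide the patches with the highest filtered-noise values so that the
    # requested ratio of patches is masked.
    n_mask = int(grid * grid * mask_ratio)
    mask = torch.zeros(grid * grid, dtype=torch.bool)
    mask[smooth.topk(n_mask).indices] = True        # True = masked patch
    return mask.view(grid, grid)
```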
- Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
- Unmasking Anomalies in Road-Scene Segmentation [18.253109627901566]
Anomaly segmentation is a critical task for driving applications.
We propose a paradigm change by shifting from per-pixel classification to mask classification.
Mask2Anomaly demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture.
arXiv Detail & Related papers (2023-07-25T08:23:10Z)
- MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks into the masked attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
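A hedged sketch of the mask-piloted signal described above: ground-truth masks are corrupted with point noise, fed to the decoder as attention masks, and the model is supervised to reconstruct the clean masks. The `decoder` interface, flip ratio, and BCE loss are hypothetical placeholders, not the paper's exact setup.

```python
# Rough sketch of mask-piloted training: corrupt ground-truth masks, let the
# decoder attend through them, and supervise the reconstruction.
import torch
import torch.nn.functional as F

def noise_gt_masks(gt_masks, flip_ratio=0.2):
    # gt_masks: (N, H, W) binary ground-truth masks; flip a random subset
    # of pixels to simulate inaccurate mask predictions.
    flip = torch.rand_like(gt_masks.float()) < flip_ratio
    return torch.where(flip, 1.0 - gt_masks.float(), gt_masks.float())

def mask_piloted_loss(decoder, queries, features, gt_masks):
    noised = noise_gt_masks(gt_masks)
    # `decoder` is a hypothetical stand-in for a Mask2Former-style decoder
    # layer that uses `noised` as its attention mask and predicts refined masks.
    pred_masks = decoder(queries, features, attn_mask=noised)
    return F.binary_cross_entropy_with_logits(pred_masks, gt_masks.float())
```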
- Layered Depth Refinement with Mask Guidance [61.10654666344419]
We formulate a novel problem of mask-guided depth refinement that utilizes a generic mask to refine the depth prediction of single image depth estimation (SIDE) models.
Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask.
We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in inner and outer mask boundary regions.
arXiv Detail & Related papers (2022-06-07T06:42:44Z)
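The layered decomposition described above fits in a few lines; `refine_fg` and `refine_bg` below are hypothetical placeholders for the paper's inpainting/outpainting modules.

```python
# Minimal sketch of the layered decomposition: the depth map is split into a
# mask layer and an inverse-mask layer, each refined separately, then recombined.
import torch

def layered_depth_refinement(depth, mask, refine_fg, refine_bg):
    # depth: (B, 1, H, W) initial depth; mask: (B, 1, H, W) in {0, 1}.
    fg = refine_fg(depth * mask, mask)            # refine inside the mask
    bg = refine_bg(depth * (1 - mask), 1 - mask)  # refine outside the mask
    # Recombine the two refined layers along the mask boundary.
    return fg * mask + bg * (1 - mask)
```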
- What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
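For concreteness, a simplified version of the mask-classification inference MaskFormer performs: per-query class probabilities are combined with per-query binary mask predictions into a per-pixel semantic map. The shapes are assumptions for clarity.

```python
# Illustrative mask-classification inference (MaskFormer-style).
import torch

def mask_classification_inference(class_logits, mask_logits):
    # class_logits: (Q, K+1) per-query class scores incl. a "no object" slot;
    # mask_logits: (Q, H, W) per-query binary mask logits.
    cls_prob = class_logits.softmax(dim=-1)[:, :-1]   # drop "no object": (Q, K)
    mask_prob = mask_logits.sigmoid()                 # (Q, H, W)
    # Marginalize over queries: per-class, per-pixel semantic probability.
    sem = torch.einsum("qk,qhw->khw", cls_prob, mask_prob)
    return sem.argmax(dim=0)                          # (H, W) class indices
```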
- MST: Masked Self-Supervised Transformer for Visual Representation [52.099722121603506]
Transformers have been widely used for self-supervised pre-training in Natural Language Processing (NLP).
We present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an image.
MST achieves a Top-1 accuracy of 76.9% with DeiT-S using only 300 epochs of pre-training, under linear evaluation.
arXiv Detail & Related papers (2021-06-10T11:05:18Z)
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
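One common mask-aware building block, shown here purely as an illustration of learning valid features under masks, is a partial convolution that aggregates only valid pixels and renormalizes by window coverage. This generic sketch is a swapped-in technique, not the paper's cascaded multi-scale architecture.

```python
# Generic mask-aware layer: convolve only over valid pixels and renormalize
# by how much of each window is actually valid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.window = kernel_size * kernel_size

    def forward(self, x, mask):
        # x: (B, C, H, W); mask: (B, 1, H, W) float with 1 = valid pixel.
        out = self.conv(x * mask)
        with torch.no_grad():
            # Count valid pixels in each sliding window.
            coverage = F.conv2d(mask, self.ones, padding=self.conv.padding[0])
        # Renormalize so the response magnitude does not depend on how much
        # of the window is masked out.
        out = out * (self.window / coverage.clamp(min=1.0))
        return out, (coverage > 0).float()   # updated validity mask
```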
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.