DPPMask: Masked Image Modeling with Determinantal Point Processes
- URL: http://arxiv.org/abs/2303.12736v2
- Date: Sat, 25 Mar 2023 08:00:58 GMT
- Title: DPPMask: Masked Image Modeling with Determinantal Point Processes
- Authors: Junde Xu, Zikai Lin, Donghao Zhou, Yaodong Yang, Xiangyun Liao, Bian
Wu, Guangyong Chen, Pheng-Ann Heng
- Abstract summary: Masked Image Modeling (MIM) has achieved impressive representation learning performance by reconstructing randomly masked images.
We show that the uniformly random masking widely used in previous works unavoidably loses key objects and alters the original semantic information.
To address this issue, we augment MIM with a new masking strategy, DPPMask.
Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks.
- Score: 49.65141962357528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked Image Modeling (MIM) has achieved impressive representation
learning performance by reconstructing randomly masked images. Despite this
empirical success, most previous works have neglected the important fact that
it is unreasonable to force the model to reconstruct something beyond recovery,
such as objects that have been masked out entirely. In this work, we show that
the uniformly random masking widely used in previous works unavoidably loses
key objects and alters the original semantic information, resulting in a
misalignment problem that ultimately hurts representation learning. To address
this issue, we augment MIM with a new masking strategy, DPPMask, which
substitutes the uniform random process with Determinantal Point Processes
(DPPs) to reduce the semantic change of the image after masking. Our method is
simple yet effective and requires no extra learnable parameters when
implemented within various frameworks. In particular, we evaluate our method on
two representative MIM frameworks, MAE and iBOT. We show that DPPMask surpasses
random sampling at both lower and higher masking ratios, indicating that
DPPMask makes the reconstruction task more reasonable. We further test our
method on the backgrounds challenge and multi-class classification tasks,
showing that it is more robust across a variety of tasks.
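To make the core idea concrete, below is a minimal sketch of diversity-driven patch selection with a DPP, using the standard fast greedy MAP approximation (Chen et al., 2018) over a similarity kernel built from stand-in patch features. This is not the paper's released implementation: the kernel construction, the feature choice, and the decision to select the kept (rather than masked) patches are illustrative assumptions.

```python
import numpy as np

def greedy_dpp_map(L, k, eps=1e-10):
    """Fast greedy MAP approximation for a DPP with PSD kernel L
    (Chen et al., 2018). Repeatedly adds the item that most increases
    log det(L_S), yielding a diverse subset of up to k indices."""
    n = L.shape[0]
    cis = np.zeros((k, n))              # incremental Cholesky-style factors
    di2s = np.diag(L).astype(float)     # conditional variances (marginal gains)
    selected = [int(np.argmax(di2s))]
    while len(selected) < k:
        m = len(selected) - 1
        j = selected[-1]
        # Update the factors to condition on the newly selected item j.
        eis = (L[j, :] - cis[:m, j] @ cis[:m, :]) / np.sqrt(di2s[j])
        cis[m, :] = eis
        di2s = di2s - eis ** 2
        di2s[j] = -np.inf               # never reselect j
        nxt = int(np.argmax(di2s))
        if di2s[nxt] < eps:             # kernel is numerically rank-deficient
            break
        selected.append(nxt)
    return selected

# Toy usage: on a 14x14 patch grid (196 patches), keep 49 visible patches
# (a 75% masking ratio, as in MAE) chosen for diversity rather than uniformly.
rng = np.random.default_rng(0)
feats = rng.standard_normal((196, 32))                 # stand-in patch features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-norm rows
L = feats @ feats.T + 1e-6 * np.eye(196)               # PSD similarity kernel
visible = greedy_dpp_map(L, k=49)
masked = np.setdiff1d(np.arange(196), visible)         # indices to mask out
```

Because the greedy step always picks the patch with the largest conditional gain, similar patches suppress each other's scores, so the visible set spreads over distinct image content instead of clustering on one object.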
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise; a toy sketch of this idea appears after this list.
We demonstrate that this strategy outperforms random masking on downstream tasks.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Bootstrap Masked Visual Modeling via Hard Patches Mining [68.74750345823674]
Masked visual modeling has attracted much attention due to its promising potential in learning generalizable representations.
We argue that it is equally important for the model to stand in the shoes of a teacher to produce challenging problems by itself.
To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.
arXiv Detail & Related papers (2023-12-21T10:27:52Z)
- Hard Patches Mining for Masked Image Modeling [52.46714618641274]
Masked image modeling (MIM) has attracted much research attention due to its promising potential for learning scalable visual representations.
We propose Hard Patches Mining (HPM), a brand-new framework for MIM pre-training.
arXiv Detail & Related papers (2023-04-12T15:38:23Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
- Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision.
We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency.
EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z)
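As referenced in the ColorMAE entry above, here is a toy sketch of data-independent mask generation by filtering random noise: low-pass filtering white noise produces spatially correlated values, and thresholding at the desired masking ratio yields contiguous masked blobs instead of independent per-patch coin flips. The Gaussian filter and quantile threshold are illustrative assumptions; ColorMAE's actual noise filters (its noise "colors") differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_filtered_mask(grid=14, mask_ratio=0.75, sigma=1.5, rng=None):
    """Data-independent binary patch mask from filtered random noise.

    Returns a (grid, grid) boolean array where True marks a masked patch;
    exactly ~mask_ratio of the patches are masked via a quantile threshold.
    """
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal((grid, grid))
    smooth = gaussian_filter(noise, sigma=sigma)  # low-pass "color" the noise
    cutoff = np.quantile(smooth, mask_ratio)      # hit the target masking ratio
    return smooth <= cutoff                       # True = masked patch

mask = noise_filtered_mask()
print(mask.mean())  # ~0.75 of the 14x14 patch grid is masked
```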