Hard Patches Mining for Masked Image Modeling
- URL: http://arxiv.org/abs/2304.05919v1
- Date: Wed, 12 Apr 2023 15:38:23 GMT
- Title: Hard Patches Mining for Masked Image Modeling
- Authors: Haochen Wang, Kaiyou Song, Junsong Fan, Yuxi Wang, Jin Xie, Zhaoxiang Zhang
- Abstract summary: Masked image modeling (MIM) has attracted much research attention due to its promising potential for learning scalable visual representations.
We propose Hard Patches Mining (HPM), a brand-new framework for MIM pre-training.
- Score: 52.46714618641274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked image modeling (MIM) has attracted much research attention due to its
promising potential for learning scalable visual representations. In typical
approaches, models usually focus on predicting specific contents of masked
patches, and their performances are highly related to pre-defined mask
strategies. Intuitively, this procedure can be considered as training a student
(the model) on solving given problems (predict masked patches). However, we
argue that the model should not only focus on solving given problems, but also
stand in the shoes of a teacher to produce a more challenging problem by
itself. To this end, we propose Hard Patches Mining (HPM), a brand-new
framework for MIM pre-training. We observe that the reconstruction loss can
naturally be the metric of the difficulty of the pre-training task. Therefore,
we introduce an auxiliary loss predictor, predicting patch-wise losses first
and deciding where to mask next. It adopts a relative relationship learning
strategy to prevent overfitting to exact reconstruction loss values.
Experiments under various settings demonstrate the effectiveness of HPM in
constructing masked images. Furthermore, we empirically find that solely
introducing the loss prediction objective leads to powerful representations,
verifying the efficacy of being aware of which patches are hard to reconstruct.
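As a rough, hypothetical illustration of the mechanism described above (PyTorch, not the authors' released code), the sketch below assumes a ViT-style encoder exposing per-patch features: an auxiliary head predicts a difficulty score for each patch, it is trained with a pairwise ranking loss against the observed reconstruction losses (one common way to realise the relative relationship learning mentioned in the abstract; the paper's exact formulation may differ), and the predicted scores decide which patches to mask next.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchLossPredictor(nn.Module):
    """Auxiliary head that scores how hard each patch is to reconstruct.

    A simplified stand-in for HPM's loss predictor: it maps per-patch
    encoder features to one difficulty score per patch.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N, D) -> difficulty scores: (B, N)
        return self.head(patch_feats).squeeze(-1)


def relative_ranking_loss(pred_scores: torch.Tensor,
                          recon_losses: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: supervise only the *order* of patch losses,
    not their exact values, so the predictor does not overfit to raw loss
    magnitudes (one way to realise "relative relationship learning")."""
    recon_losses = recon_losses.detach()  # targets only, no gradient through them
    # Compare all patch pairs within each image; ties (incl. the diagonal)
    # are treated as "not harder", which is acceptable for a sketch.
    diff_pred = pred_scores.unsqueeze(2) - pred_scores.unsqueeze(1)    # (B, N, N)
    diff_true = recon_losses.unsqueeze(2) - recon_losses.unsqueeze(1)  # (B, N, N)
    target = (diff_true > 0).float()  # 1 if patch i is harder than patch j
    return F.binary_cross_entropy_with_logits(diff_pred, target)


def hard_patch_mask(pred_scores: torch.Tensor, mask_ratio: float = 0.75) -> torch.Tensor:
    """Build the next mask by hiding the patches predicted to be hardest."""
    B, N = pred_scores.shape
    num_mask = int(N * mask_ratio)
    idx = pred_scores.topk(num_mask, dim=1).indices  # (B, num_mask)
    mask = torch.zeros(B, N, dtype=torch.bool, device=pred_scores.device)
    mask[torch.arange(B, device=pred_scores.device).unsqueeze(1), idx] = True
    return mask  # True = masked
```

In a training loop one would compute per-patch reconstruction losses from the MIM decoder, update the predictor with relative_ranking_loss, and call hard_patch_mask to build the mask for subsequent iterations.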
Related papers
- AEMIM: Adversarial Examples Meet Masked Image Modeling [12.072673694665934]
We propose to incorporate adversarial examples into masked image modeling, as the new reconstruction targets.
In particular, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images.
We also devise an innovative adversarial attack to craft more suitable adversarial examples for MIM pre-training.
arXiv Detail & Related papers (2024-07-16T09:39:13Z)
- Bootstrap Masked Visual Modeling via Hard Patches Mining [68.74750345823674]
Masked visual modeling has attracted much attention due to its promising potential in learning generalizable representations.
We argue that it is equally important for the model to stand in the shoes of a teacher to produce challenging problems by itself.
To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.
arXiv Detail & Related papers (2023-12-21T10:27:52Z)
- Meta-Prior: Meta learning for Adaptive Inverse Problem Solvers [9.364509804053275]
Real-world imaging challenges often lack ground truth data, rendering traditional supervised approaches ineffective.
Our method trains a meta-model on a diverse set of imaging tasks that allows the model to be efficiently fine-tuned for specific tasks.
In simple settings, this approach recovers the Bayes optimal estimator, illustrating the soundness of our approach.
arXiv Detail & Related papers (2023-11-30T17:02:27Z)
- SMOOT: Saliency Guided Mask Optimized Online Training [3.024318849346373]
Saliency-Guided Training (SGT) methods try to highlight the prominent features in the model's training based on the output.
SGT makes the model's final result more interpretable by masking input partially.
We propose a novel method to determine the optimal number of masked images based on input, accuracy, and model loss during the training.
arXiv Detail & Related papers (2023-10-01T19:41:49Z)
- AMLP: Adaptive Masking Lesion Patches for Self-supervised Medical Image Segmentation [67.97926983664676]
Self-supervised masked image modeling has shown promising results on natural images.
However, directly applying such methods to medical images remains challenging.
We propose a novel self-supervised medical image segmentation framework, Adaptive Masking Lesion Patches (AMLP).
arXiv Detail & Related papers (2023-09-08T13:18:10Z)
- Understanding Masked Autoencoders via Hierarchical Latent Variable Models [109.35382136147349]
Masked autoencoder (MAE) has recently achieved prominent success in a variety of vision tasks.
Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking.
arXiv Detail & Related papers (2023-06-08T03:00:10Z)
- DPPMask: Masked Image Modeling with Determinantal Point Processes [49.65141962357528]
Masked Image Modeling (MIM) has achieved impressive representative performance with the aim of reconstructing randomly masked images.
We show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information.
To address this issue, we augment MIM with a new masking strategy namely the DPPMask.
Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks.
arXiv Detail & Related papers (2023-03-13T13:40:39Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process (a rough sketch of this masking mechanism follows the list).
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
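To make the Gumbel-Softmax masking idea from the AutoMAE entry above concrete, here is a hedged sketch (not the paper's implementation; the module name, interface, and two-class keep/mask parameterisation are assumptions): a small generator produces per-patch logits, and straight-through Gumbel-Softmax sampling keeps the discrete mask differentiable so the generator can be trained jointly, e.g. adversarially, with the masked image modeling objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GumbelMaskGenerator(nn.Module):
    """Hypothetical mask generator in the spirit of AutoMAE: per-patch
    keep/mask logits, sampled with straight-through Gumbel-Softmax so the
    discrete mask remains differentiable end-to-end."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_logits = nn.Linear(dim, 2)  # two classes per patch: [keep, mask]

    def forward(self, patch_feats: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # patch_feats: (B, N, D)
        logits = self.to_logits(patch_feats)                   # (B, N, 2)
        sample = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot forward, soft backward
        return sample[..., 1]  # (B, N), 1.0 where the patch is masked


# Note: this sketch does not constrain the masking ratio; in practice one
# would add a ratio constraint or regulariser so roughly the desired
# fraction of patches ends up masked.
```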