SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations
- URL: http://arxiv.org/abs/2308.08884v1
- Date: Thu, 17 Aug 2023 09:43:14 GMT
- Title: SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations
- Authors: Zhiming Wang, Lin Gu, Feng Lu
- Abstract summary: We propose to use image scale as a self-supervised signal for Masked Image Modeling (MIM)
Our framework utilizes the latest advances in super-resolution (SR) to design the prediction head.
Our method also achieves an accuracy of 74.84% on the task of recognizing low-resolution facial expressions, surpassing the current state-of-the-art FMD by 9.48%.
- Score: 17.902523856490227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the prevalence of scale variance in natural images, we propose
to use image scale as a self-supervised signal for Masked Image Modeling (MIM).
Our method involves selecting random patches from the input image and
downsampling them to a low-resolution format. Our framework utilizes the latest
advances in super-resolution (SR) to design the prediction head, which
reconstructs the input from low-resolution clues and the other patches. After
400 epochs of pre-training, our Super Resolution Masked Autoencoders (SRMAE)
achieve an accuracy of 82.1% on the ImageNet-1K task. The image scale signal
also allows our SRMAE to capture scale-invariant representations. For the very
low resolution (VLR) recognition task, our model achieves the best performance,
surpassing DeriveNet by 1.3%. Our method also achieves an accuracy of 74.84% on
the task of recognizing low-resolution facial expressions, surpassing the
current state-of-the-art FMD by 9.48%.
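The following is a minimal PyTorch sketch of the pipeline the abstract describes: random patches are replaced by their downsampled (low-resolution) versions, and an SR-style head is trained to reconstruct the original full-resolution pixels. The names (SRMAESketch, degrade_patches, sr_head) and the toy encoder are illustrative assumptions, not the authors' implementation, which uses a ViT encoder and a dedicated super-resolution prediction head.

```python
# Hypothetical sketch of the "scale as self-supervised signal" idea; not the official SRMAE code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def patchify(imgs, p=16):
    """Split (B, C, H, W) images into flattened non-overlapping p x p patches."""
    b, c, h, w = imgs.shape
    x = imgs.reshape(b, c, h // p, p, w // p, p)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(b, (h // p) * (w // p), p * p * c)


def degrade_patches(patches, p=16, c=3, scale=4):
    """Downsample patches to low resolution and resize back, keeping only coarse clues."""
    b, n, d = patches.shape
    x = patches.reshape(b * n, p, p, c).permute(0, 3, 1, 2)
    x = F.interpolate(x, scale_factor=1.0 / scale, mode="bilinear", align_corners=False)
    x = F.interpolate(x, size=(p, p), mode="bilinear", align_corners=False)
    return x.permute(0, 2, 3, 1).reshape(b, n, d)


class SRMAESketch(nn.Module):
    """Toy encoder plus an SR-style prediction head; a stand-in for the paper's
    ViT backbone and super-resolution head."""

    def __init__(self, patch_dim=16 * 16 * 3, embed_dim=256):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.sr_head = nn.Linear(embed_dim, patch_dim)  # predicts full-resolution pixels

    def forward(self, imgs, ratio=0.5):
        patches = patchify(imgs)                                   # targets: original pixels
        b, n, _ = patches.shape
        pick = torch.rand(b, n, device=imgs.device) < ratio        # random patch selection
        inputs = patches.clone()
        inputs[pick] = degrade_patches(patches)[pick]              # replace picked patches with LR clues
        pred = self.sr_head(self.encoder(self.embed(inputs)))
        return F.mse_loss(pred[pick], patches[pick])               # reconstruct only degraded patches


# Usage: loss = SRMAESketch()(torch.randn(2, 3, 224, 224)); loss.backward()
```

This sketch only illustrates the training signal; per the abstract, the scale-invariant representations come from pre-training this objective for 400 epochs on ImageNet-1K.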
Related papers
- Deep Optimal Transport: A Practical Algorithm for Photo-realistic Image Restoration [31.58365182858562]
We propose an image restoration algorithm that can control the perceptual quality and/or the mean square error (MSE) of any pre-trained model.
Given about a dozen images restored by the model, it can significantly improve the perceptual quality and/or the MSE of the model for newly restored images without further training.
arXiv Detail & Related papers (2023-06-04T12:21:53Z) - Patch-wise Features for Blur Image Classification [3.762360672951513]
Using our method, we can discriminate between blurred and sharp images.
Experiments conducted on an open dataset show that the proposed low-compute method achieves 90.1% mean accuracy on the validation set.
The proposed method is 10x faster than the VGG16-based model on CPU and scales linearly with the input image size, making it suitable for low-compute edge devices.
arXiv Detail & Related papers (2023-04-06T15:39:11Z) - DPPMask: Masked Image Modeling with Determinantal Point Processes [49.65141962357528]
Masked Image Modeling (MIM) has achieved impressive representation-learning performance with the aim of reconstructing randomly masked images.
We show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information.
To address this issue, we augment MIM with a new masking strategy, namely DPPMask.
Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks.
arXiv Detail & Related papers (2023-03-13T13:40:39Z) - CDPMSR: Conditional Diffusion Probabilistic Models for Single Image Super-Resolution [91.56337748920662]
Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation.
We propose a simple but non-trivial DPM-based super-resolution post-process framework, i.e., cDPMSR.
Our method surpasses prior attempts in both qualitative and quantitative results.
arXiv Detail & Related papers (2023-02-14T15:13:33Z) - BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers [117.79456335844439]
We propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction.
We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches.
Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods.
arXiv Detail & Related papers (2022-08-12T16:48:10Z) - Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution [29.19388490351459]
We propose a novel super-resolution model with a low-frequency constraint (LFc-SR).
We introduce an ADMM-based alternating optimization method for the non-trivial learning of the constrained model.
Experiments showed that our method, without cumbersome post-processing procedures, achieved state-of-the-art performance.
arXiv Detail & Related papers (2022-08-05T05:37:55Z) - Patch-based image Super Resolution using generalized Gaussian mixture model [0.0]
Single Image Super Resolution (SISR) methods aim to recover clean high-resolution images from low-resolution observations.
A family of patch-based approaches has received considerable attention and development.
This paper proposes an algorithm to learn a joint generalized Gaussian mixture model (GGMM) from pairs of low-resolution patches and the corresponding high-resolution patches from the reference data.
arXiv Detail & Related papers (2022-06-07T07:40:05Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs, an asymmetric encoder-decoder architecture and a high masking ratio, enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z) - Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually pleasing low-resolution images.
We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z) - RAIN: A Simple Approach for Robust and Accurate Image Classification Networks [156.09526491791772]
It has been shown that the majority of existing adversarial defense methods achieve robustness at the cost of sacrificing prediction accuracy.
This paper proposes a novel preprocessing framework, which we term Robust and Accurate Image classificatioN (RAIN).
RAIN applies randomization over inputs to break the ties between the model forward prediction path and the backward gradient path, thus improving the model robustness.
We conduct extensive experiments on the STL10 and ImageNet datasets to verify the effectiveness of RAIN against various types of adversarial attacks.
arXiv Detail & Related papers (2020-04-24T02:03:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.