Learning Prior Feature and Attention Enhanced Image Inpainting
- URL: http://arxiv.org/abs/2208.01837v2
- Date: Sat, 26 Aug 2023 03:30:36 GMT
- Title: Learning Prior Feature and Attention Enhanced Image Inpainting
- Authors: Chenjie Cao, Qiaole Dong, Yanwei Fu
- Abstract summary: This paper incorporates the pre-training-based Masked AutoEncoder (MAE) into the inpainting model.
We propose to use attention priors from MAE to make the inpainting model learn more long-distance dependencies between masked and unmasked regions.
- Score: 63.21231753407192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many recent inpainting works have achieved impressive results by
leveraging Deep Neural Networks (DNNs) to model various prior information for
image restoration. Unfortunately, the performance of these methods is largely
limited by the representation ability of vanilla Convolutional Neural Network
(CNN) backbones. On the other hand, Vision Transformers (ViTs) with
self-supervised pre-training have shown great potential for many visual
recognition and object detection tasks. A natural question is whether the
inpainting task can benefit substantially from a ViT backbone. However, it is
nontrivial to directly swap in the new backbone, as inpainting is an inverse
problem fundamentally different from recognition tasks. To this end, this
paper incorporates the pre-training-based Masked AutoEncoder (MAE) into the
inpainting model, which thus enjoys richer informative priors that enhance
the inpainting process. Moreover, we propose to use attention priors from the
MAE so that the inpainting model learns more long-distance dependencies
between masked and unmasked regions. This paper also presents extensive
ablations of both the inpainting model and the self-supervised pre-training
models. Besides, experiments on both Places2 and FFHQ demonstrate the
effectiveness of the proposed model. Code and pre-trained models are released
at https://github.com/ewrfcas/MAE-FAR.
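To make the described mechanism concrete, below is a minimal PyTorch sketch of fusing priors from a frozen MAE encoder into a CNN inpainting backbone: MAE patch tokens are folded back onto their grid, mixed through the MAE attention map (the attention prior, which lets masked cells gather long-distance context from unmasked ones), and fused with the CNN features. The module name `MAEPriorFusion`, the shapes, and the fusion layout are illustrative assumptions, not the released MAE-FAR code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAEPriorFusion(nn.Module):
    """Sketch: fuse frozen-MAE priors into a CNN inpainting feature map."""
    def __init__(self, cnn_dim=64, mae_dim=768, grid=14):
        super().__init__()
        self.grid = grid
        self.proj = nn.Conv2d(mae_dim, cnn_dim, kernel_size=1)  # align channels
        self.fuse = nn.Conv2d(2 * cnn_dim, cnn_dim, kernel_size=3, padding=1)

    def forward(self, cnn_feat, mae_tokens, mae_attn):
        # cnn_feat:   (B, C, H, W) features from the inpainting CNN
        # mae_tokens: (B, N, D)    patch tokens from a frozen MAE encoder
        # mae_attn:   (B, N, N)    MAE self-attention, used as a prior
        B, C, H, W = cnn_feat.shape
        g = self.grid
        # fold patch tokens back onto their spatial grid and align channels
        mae_map = self.proj(mae_tokens.transpose(1, 2).reshape(B, -1, g, g))
        # attention-prior aggregation: each cell gathers long-distance
        # context between masked and unmasked regions via the MAE attention
        flat = mae_map.flatten(2).transpose(1, 2)       # (B, N, C)
        flat = flat + torch.bmm(mae_attn, flat)         # prior-guided mixing
        mae_map = flat.transpose(1, 2).reshape(B, C, g, g)
        # upsample to the CNN resolution and fuse the two feature streams
        mae_map = F.interpolate(mae_map, size=(H, W), mode="bilinear",
                                align_corners=False)
        return self.fuse(torch.cat([cnn_feat, mae_map], dim=1))

# shape check with dummy tensors (14x14 = 196 ViT patches)
fusion = MAEPriorFusion()
out = fusion(torch.randn(2, 64, 56, 56),
             torch.randn(2, 196, 768),
             torch.softmax(torch.randn(2, 196, 196), dim=-1))
assert out.shape == (2, 64, 56, 56)
```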
Related papers
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained diffusion model (DM).
BrushNet achieves superior performance over existing models across seven key metrics, including image quality, masked-region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
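As a rough sketch of the dual-branch idea in the BrushNet entry above (the class name `ControlBranch`, channel widths, and additive injection are illustrative guesses, not BrushNet's actual design): a small trainable branch encodes the masked image plus mask and produces per-scale features for a frozen denoising UNet to consume.

```python
import torch
import torch.nn as nn

class ControlBranch(nn.Module):
    """Sketch: encode (masked image, mask) into per-scale residual features."""
    def __init__(self, dims=(64, 128, 256)):
        super().__init__()
        chans = (4,) + tuple(dims)   # 3 masked-RGB channels + 1 binary mask
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3,
                                    stride=2, padding=1),
                          nn.SiLU())
            for i in range(len(dims)))

    def forward(self, masked_image, mask):
        x = torch.cat([masked_image, mask], dim=1)
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)  # to be added to the matching frozen-UNet block
        return feats

branch = ControlBranch()
feats = branch(torch.randn(1, 3, 256, 256), torch.ones(1, 1, 256, 256))
# three maps at strides 2/4/8, e.g. feats[0].shape == (1, 64, 128, 128)
```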
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
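The interlinking trick named in the AutoMAE entry above hinges on differentiable mask sampling. A toy sketch using PyTorch's built-in `gumbel_softmax` (the two-class construction here is an assumption for illustration, not AutoMAE's code):

```python
import torch
import torch.nn.functional as F

def sample_patch_mask(logits, tau=1.0):
    # logits: (B, N) per-patch masking scores from a learned mask generator.
    # A Gumbel-Softmax over the {mask, keep} pair gives a stochastic binary
    # choice per patch; hard=True yields 0/1 forward values with a
    # straight-through gradient, so the adversarial generator and the
    # mask-guided image-modeling objective can be trained jointly.
    pair = torch.stack([logits, -logits], dim=-1)       # (B, N, 2)
    hard = F.gumbel_softmax(pair, tau=tau, hard=True)   # (B, N, 2)
    return hard[..., 0]                                 # 1.0 = masked patch

mask = sample_patch_mask(torch.randn(2, 196))           # (2, 196) of {0., 1.}
```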
- Stare at What You See: Masked Image Modeling without Reconstruction [154.74533119863864]
Masked Autoencoders (MAE) have been a prevailing paradigm for large-scale vision representation pre-training.
Recent approaches apply semantic-rich teacher models to extract image features as the reconstruction target, leading to better performance.
We argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image.
arXiv Detail & Related papers (2022-11-16T12:48:52Z)
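The entry above replaces pixel reconstruction with teacher features as the target. A generic sketch of such a feature-alignment objective (the cosine form is assumed; the paper's exact loss may differ):

```python
import torch
import torch.nn.functional as F

def feature_target_loss(student_feats, teacher_feats, masked_idx):
    # student_feats, teacher_feats: (B, N, D) token features; masked_idx is
    # a LongTensor of masked token positions. Supervise masked tokens only,
    # aligning them with the frozen teacher's semantic-rich features.
    s = F.normalize(student_feats[:, masked_idx], dim=-1)
    t = F.normalize(teacher_feats[:, masked_idx], dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()   # mean cosine distance

loss = feature_target_loss(torch.randn(2, 196, 768),
                           torch.randn(2, 196, 768),
                           torch.arange(0, 147))  # 75% of 196 patches masked
```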
- Wavelet Prior Attention Learning in Axial Inpainting Network [35.06912946192495]
We propose a novel model, Wavelet prior attention learning in Axial Inpainting Network (WAIN).
The wavelet prior attention (WPA) guides high-level feature aggregation in the multi-scale frequency domain, alleviating textural artifacts.
Stacked axial transformers (ATs) employ unmasked clues to help reason about plausible features along the horizontal and vertical axes, together with low-level features.
arXiv Detail & Related papers (2022-06-07T08:45:27Z)
- The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training [13.087987450384036]
We present a new Masked Image Modeling (MIM) method, termed Geminated Gestalt Autoencoder (Ge$2$-AE), for visual pre-training.
Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space.
arXiv Detail & Related papers (2022-04-18T09:22:55Z)
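The frequency-space half of the geminated objective above can be sketched in a few lines with `torch.fft` (a generic L1 on complex spectra, assumed for illustration rather than taken from the paper):

```python
import torch

def frequency_loss(pred, target):
    # pred, target: (B, C, H, W). An L1 between complex 2-D spectra also
    # supervises the decoder in frequency space, complementing the usual
    # pixel-space reconstruction term.
    Fp = torch.fft.fft2(pred, norm="ortho")
    Ft = torch.fft.fft2(target, norm="ortho")
    return (Fp - Ft).abs().mean()

x, y = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
total = (x - y).abs().mean() + 0.1 * frequency_loss(x, y)  # pixel + frequency
```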
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Restore from Restored: Single-image Inpainting [9.699531255678856]
We present a novel and efficient self-supervised fine-tuning algorithm for inpainting networks.
We update the parameters of the pre-trained inpainting networks by utilizing existing self-similar patches.
We achieve state-of-the-art inpainting results on publicly available benchmark datasets.
arXiv Detail & Related papers (2021-10-25T11:38:51Z)
- Restore from Restored: Single-image Inpainting [9.699531255678856]
We present a novel and efficient self-supervised fine-tuning algorithm for inpainting networks.
We update the parameters of the pre-trained networks by utilizing existing self-similar patches within the given input image.
We achieve state-of-the-art inpainting results on publicly available benchmark datasets.
arXiv Detail & Related papers (2021-02-16T10:59:28Z)
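The two Restore-from-Restored entries above describe versions of the same test-time scheme. A rough sketch of the gist, with the network interface `net(image, mask)`, the re-masking rate, and the supervision region all being assumptions:

```python
import torch

def finetune_on_single_image(net, image, mask, steps=100, lr=1e-4):
    # image: (1, 3, H, W); mask: (1, 1, H, W) with 1 marking missing pixels.
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    with torch.no_grad():
        restored = net(image, mask)              # initial restoration
    for _ in range(steps):
        # punch new random holes into the restored image and train the
        # network to refill them, exploiting self-similar patches within
        # this single image; supervise on originally-known pixels only
        hole = (torch.rand_like(mask) < 0.1).float()
        pred = net(restored * (1.0 - hole), hole)
        loss = (((pred - restored) * (1.0 - mask)).abs()).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(image, mask)                      # final, fine-tuned output
```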
- Deep Generative Model for Image Inpainting with Local Binary Pattern Learning and Spatial Attention [28.807711307545112]
We propose a new end-to-end, two-stage (coarse-to-fine) generative model that combines a local binary pattern (LBP) learning network with an actual inpainting network.
Experiments on public datasets including CelebA-HQ, Places and Paris StreetView demonstrate that our model generates better inpainting results than the state-of-the-art competing algorithms.
arXiv Detail & Related papers (2020-09-02T12:59:28Z)
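For reference on the entry above, the classic 8-neighbour local binary pattern that the first stage learns to predict can be computed directly (the textbook operator, not the paper's learning network):

```python
import numpy as np

def lbp8(gray):
    # gray: (H, W) float array. Returns (H-2, W-2) uint8 LBP codes: each
    # pixel's 8 neighbours are thresholded against the centre and packed
    # into one byte, encoding local structure that guides inpainting.
    H, W = gray.shape
    centre = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((H - 2, W - 2), dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code += (neighbour >= centre).astype(np.int32) << bit
    return code.astype(np.uint8)

codes = lbp8(np.random.rand(64, 64))   # (62, 62) array of values 0..255
```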
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)