Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
- URL: http://arxiv.org/abs/2203.00867v1
- Date: Wed, 2 Mar 2022 04:27:27 GMT
- Title: Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
- Authors: Qiaole Dong, Chenjie Cao, Yanwei Fu
- Abstract summary: The proposed model restores holistic image structures with a powerful attention-based transformer model in a fixed low-resolution sketch space.
Our structure restorer can be integrated with other pretrained inpainting models efficiently via zero-initialized residual addition.
- Score: 38.014569953980754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image inpainting has made significant advances in recent years. However, it
is still challenging to recover corrupted images with both vivid textures and
reasonable structures. Many methods recover only regular textures while losing
holistic structures, owing to the limited receptive fields of convolutional
neural networks (CNNs). Attention-based models, on the other hand, can learn
better long-range dependencies for structure recovery, but they are limited by
heavy inference-time computation at large image sizes. To address these
issues, we propose to leverage an additional structure restorer to facilitate
image inpainting incrementally. The proposed model restores holistic image
structures with a powerful attention-based transformer model in a fixed
low-resolution sketch space. Such a grayscale space is easy to upsample to
larger scales to convey correct structural information. Our structure restorer
can be integrated with other pretrained inpainting models efficiently with the
zero-initialized residual addition. Furthermore, a masking positional encoding
strategy is utilized to improve performance on large irregular masks (both mechanisms are sketched after this abstract).
Extensive experiments on various datasets validate the efficacy of our model
compared with other competitors. Our code is released at
https://github.com/DQiaole/ZITS_inpainting.
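To make the mechanism concrete, here is a minimal PyTorch sketch of zero-initialized residual addition. The class name `ZeroResidual` and the surrounding structure are illustrative assumptions, not the authors' implementation; the essential idea is that the added branch starts as an exact identity, so a pretrained inpainting model is undisturbed when the structure restorer is first attached.

```python
import torch
import torch.nn as nn

class ZeroResidual(nn.Module):
    """Illustrative sketch (hypothetical name): y = x + alpha * branch(x)
    with alpha initialized to zero, so the wrapped branch is a no-op at
    the start of finetuning and its influence is learned gradually."""

    def __init__(self, branch: nn.Module):
        super().__init__()
        self.branch = branch
        self.alpha = nn.Parameter(torch.zeros(1))  # zero-initialized gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.branch(x)
```

And one plausible reading of a masking positional encoding, again as a hedged sketch rather than the paper's exact formulation: each missing pixel is tagged with a sinusoidal encoding of its distance to the nearest known pixel, so the network can distinguish positions deep inside a large hole from those near valid content. The function name and the dyadic frequency schedule below are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def masking_positional_encoding(mask: np.ndarray, num_freqs: int = 4) -> np.ndarray:
    """mask: (H, W) array with 1 = missing, 0 = known.
    Returns a (2 * num_freqs, H, W) encoding that is zero on known pixels."""
    # Euclidean distance from each missing pixel to the nearest known pixel.
    dist = distance_transform_edt(mask)
    feats = []
    for i in range(num_freqs):
        freq = 1.0 / (2.0 ** i)  # assumed frequency schedule
        feats.append(np.sin(freq * dist))
        feats.append(np.cos(freq * dist))
    # Zero out known pixels (cos(0) = 1 would otherwise leak a constant).
    return np.stack(feats, axis=0) * mask[None]
```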
Related papers
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally efficient, WaveMix-based fully convolutional architecture, WavePaint.
It uses a 2D discrete wavelet transform (DWT) for spatial, multi-resolution token mixing alongside convolutional layers (see the sketch after this entry).
Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator.
arXiv Detail & Related papers (2023-07-01T18:41:34Z)
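As a rough illustration of wavelet token mixing (a sketch under assumptions, not WavePaint's actual code), the following uses PyWavelets: a one-level 2D DWT splits a single-channel feature map into half-resolution sub-bands, a stand-in per-band rescaling plays the role of the learned mixing layers, and the inverse transform restores full resolution.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_token_mix(feature_map: np.ndarray) -> np.ndarray:
    """Single-channel sketch of DWT-based token mixing.
    feature_map: (H, W) array with even H and W."""
    # One-level Haar decomposition: approximation + (horizontal,
    # vertical, diagonal) detail coefficients at half resolution.
    cA, (cH, cV, cD) = pywt.dwt2(feature_map, 'haar')

    # Stand-in for learned mixing; a real model would apply small
    # convolutions / MLPs to these sub-bands instead.
    cH, cV, cD = 0.5 * cH, 0.5 * cV, 0.5 * cD

    # Inverse DWT returns the mixed map at the original resolution.
    return pywt.idwt2((cA, (cH, cV, cD)), 'haar')
```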
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention whose cost is linearly related to the resolution, derived via a Taylor expansion; based on it, a network called $T$-former is designed for image inpainting (a generic linear-attention sketch follows this entry).
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low parameter count and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
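For intuition about attention that scales linearly with resolution, here is a generic linear-attention sketch motivated by the first-order Taylor expansion exp(q·k) ≈ 1 + q·k; it illustrates the idea, not necessarily T-former's exact formulation. Reordering the matrix products makes the cost linear in the number of tokens N rather than quadratic.

```python
import torch
import torch.nn.functional as F

def taylor_linear_attention(q: torch.Tensor, k: torch.Tensor,
                            v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (B, N, D) tensors. Uses the similarity 1 + q.k in place
    of exp(q.k), so attention can be computed in O(N * D^2)."""
    # Unit-normalize so 1 + q.k stays nonnegative (|q.k| <= 1).
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)

    b, n, d = q.shape
    kv = torch.einsum('bnd,bne->bde', k, v)   # sum_j k_j v_j^T, (B, D, D)
    k_sum = k.sum(dim=1)                      # sum_j k_j, (B, D)

    # numerator_i = sum_j v_j + q_i (K^T V); denominator_i = N + q_i sum_j k_j
    num = v.sum(dim=1, keepdim=True) + torch.einsum('bnd,bde->bne', q, kv)
    den = n + torch.einsum('bnd,bd->bn', q, k_sum)
    return num / den.unsqueeze(-1)
```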
- ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors [38.014569953980754]
We study learning a zero-initialized residual addition based Incremental Transformer on Structural priors (ZITS++).
Specifically, given one corrupt image, we present the Transformer Structure Restorer (TSR) module to restore holistic structural priors at low image resolution.
We also explore the effects of various image priors for inpainting and investigate how to utilize them to address high-resolution image inpainting.
arXiv Detail & Related papers (2022-10-12T06:33:47Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Pyramid Attention Networks for Image Restoration [124.34970277136061]
Self-similarity is an image prior widely used in image restoration algorithms.
Recent advanced deep convolutional neural network based methods for image restoration do not take full advantage of self-similarity.
We present a novel Pyramid Attention module for image restoration, which captures long-range feature correspondences from a multi-scale feature pyramid (see the sketch after this entry).
arXiv Detail & Related papers (2020-04-28T21:12:36Z)
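To illustrate attention over a multi-scale pyramid (a minimal sketch of the idea with assumed scales and no learned projections, not the paper's exact module): every position attends to tokens gathered from several downscaled copies of the same feature map, so a clean patch at a coarser scale can serve as a long-range match for a degraded region.

```python
import torch
import torch.nn.functional as F

def pyramid_attention(x: torch.Tensor,
                      scales=(1.0, 0.5, 0.25)) -> torch.Tensor:
    """x: (B, C, H, W) feature map; scales are assumed pyramid levels."""
    b, c, h, w = x.shape
    q = x.flatten(2).transpose(1, 2)               # queries, (B, H*W, C)

    # Keys/values: concatenate tokens from each pyramid level.
    kv = []
    for s in scales:
        xs = x if s == 1.0 else F.interpolate(
            x, scale_factor=s, mode='bilinear', align_corners=False)
        kv.append(xs.flatten(2).transpose(1, 2))   # (B, Hs*Ws, C)
    kv = torch.cat(kv, dim=1)

    # Standard scaled dot-product attention over all pyramid tokens.
    attn = torch.softmax(q @ kv.transpose(1, 2) / c ** 0.5, dim=-1)
    out = attn @ kv                                # (B, H*W, C)
    return out.transpose(1, 2).reshape(b, c, h, w)
```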
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content and is not responsible for any consequences arising from its use.