ZITS++: Image Inpainting by Improving the Incremental Transformer on
Structural Priors
- URL: http://arxiv.org/abs/2210.05950v3
- Date: Wed, 24 May 2023 16:19:01 GMT
- Title: ZITS++: Image Inpainting by Improving the Incremental Transformer on
Structural Priors
- Authors: Chenjie Cao, Qiaole Dong, Yanwei Fu
- Abstract summary: We study learning a Zero-initialized residual addition based Incremental Transformer on Structural priors (ZITS++)
Specifically, given one corrupt image, we present the Transformer Structure Restorer (TSR) module to restore holistic structural priors at low image resolution.
We also explore the effects of various image priors for inpainting and investigate how to utilize them to address high-resolution image inpainting.
- Score: 38.014569953980754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image inpainting involves filling missing areas of a corrupted image. Although
impressive results have been achieved recently, restoring images with both
vivid textures and reasonable structures remains a significant challenge.
Previous methods have primarily addressed regular textures while disregarding
holistic structures due to the limited receptive fields of Convolutional Neural
Networks (CNNs). To this end, we study learning a Zero-initialized residual
addition based Incremental Transformer on Structural priors (ZITS++), an
improved model upon our conference work, ZITS. Specifically, given one corrupt
image, we present the Transformer Structure Restorer (TSR) module to restore
holistic structural priors at low image resolution, which are further upsampled
by Simple Structure Upsampler (SSU) module to higher image resolution. To
recover image texture details, we use the Fourier CNN Texture Restoration (FTR)
module, which is strengthened by Fourier and large-kernel attention
convolutions. Furthermore, to enhance the FTR, the upsampled structural priors
from TSR are further processed by Structure Feature Encoder (SFE) and optimized
with the Zero-initialized Residual Addition (ZeroRA) incrementally. Besides, a
new masking positional encoding is proposed to encode the large irregular
masks. Compared with ZITS, ZITS++ improves the FTR's stability and inpainting
ability with several techniques. More importantly, we comprehensively explore
the effects of various image priors for inpainting and investigate how to
utilize them to address high-resolution image inpainting with extensive
experiments. This investigation is orthogonal to most inpainting approaches and
can thus significantly benefit the community. Codes and models will be released
at https://github.com/ewrfcas/ZITS-PlusPlus.
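The core idea of Zero-initialized Residual Addition (ZeroRA) described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes a learnable scalar `alpha` gating each injected structural branch, initialized to zero so that a pretrained texture network is left unchanged at the start of fine-tuning; the class and variable names are hypothetical.

```python
import numpy as np

class ZeroRA:
    """Zero-initialized residual addition (illustrative sketch).

    The structural branch is mixed into a backbone feature map via a
    scalar gate alpha that starts at zero, so the pretrained backbone's
    behavior is initially preserved and the prior is learned in
    incrementally during training.
    """

    def __init__(self):
        self.alpha = 0.0  # zero init: the added branch is a no-op at first

    def __call__(self, feature, structure_branch):
        # feature: backbone activation (e.g., from the texture restorer)
        # structure_branch: encoded structural prior of the same shape
        return feature + self.alpha * structure_branch

# Usage: with alpha = 0 the output equals the original backbone feature,
# so integration with a pretrained model does not disturb its outputs.
zerora = ZeroRA()
feature = np.ones((2, 2))
prior = np.full((2, 2), 5.0)
out = zerora(feature, prior)            # identical to `feature` at init
zerora.alpha = 0.5                      # during training, alpha moves off zero
mixed = zerora(feature, prior)          # prior now contributes to the feature
```

Because the gate starts at zero, gradients still flow into `alpha` while the pretrained model's initial behavior is preserved, which is what makes the incremental integration with other inpainting models stable.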
Related papers
- How Powerful Potential of Attention on Image Restoration? [97.9777639562205]
We conduct an empirical study to explore the potential of attention mechanisms without using FFN.
We propose Continuous Scaling Attention (CSAttn), a method that computes attention continuously in three stages without using FFN.
Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect the model performance.
arXiv Detail & Related papers (2024-03-15T14:23:12Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Image Reconstruction using Enhanced Vision Transformer [0.08594140167290097]
We propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting.
The model proposed in this project is based on Vision Transformer (ViT) that takes 2D images as input and outputs embeddings.
We incorporate four additional optimization techniques in the framework to improve the model reconstruction capability.
arXiv Detail & Related papers (2023-07-11T02:14:18Z) - High-Fidelity Image Inpainting with GAN Inversion [23.49170140410603]
In this paper, we propose a novel GAN inversion model for image inpainting, dubbed InvertFill.
Within the encoder, the pre-modulation network leverages multi-scale structures to encode more discriminative semantics into style vectors.
To reconstruct faithful and photorealistic images, a simple yet effective Soft-update Mean Latent module is designed to capture more diverse in-domain patterns that synthesize high-fidelity textures for large corruptions.
arXiv Detail & Related papers (2022-08-25T03:39:24Z) - HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z) - Incremental Transformer Structure Enhanced Image Inpainting with Masking
Positional Encoding [38.014569953980754]
The proposed model restores holistic image structures with a powerful attention-based transformer model in a fixed low-resolution sketch space.
Our model can be integrated with other pretrained inpainting models efficiently with the zero-initialized residual addition.
arXiv Detail & Related papers (2022-03-02T04:27:27Z) - Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z) - Exploiting Deep Generative Prior for Versatile Image Restoration and
Manipulation [181.08127307338654]
This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images.
The deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images.
arXiv Detail & Related papers (2020-03-30T17:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.