High-Fidelity Pluralistic Image Completion with Transformers
- URL: http://arxiv.org/abs/2103.14031v1
- Date: Thu, 25 Mar 2021 17:59:46 GMT
- Title: High-Fidelity Pluralistic Image Completion with Transformers
- Authors: Ziyu Wan and Jingbo Zhang and Dongdong Chen and Jing Liao
- Abstract summary: This paper brings the best of both worlds to pluralistic image completion: appearance prior reconstruction with transformer and texture replenishment with CNN.
The proposed method vastly outperforms state-of-the-art methods in terms of three aspects.
- Score: 23.563949855476608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image completion has made tremendous progress with convolutional neural
networks (CNNs), because of their powerful texture modeling capacity. However,
due to some inherent properties (e.g., local inductive prior, spatial-invariant
kernels), CNNs do not perform well in understanding global structures or
naturally support pluralistic completion. Recently, transformers have
demonstrated their power in modeling long-range relationships and generating
diverse results, but their computational complexity is quadratic in input
length, hampering their application to high-resolution images. This paper
brings the best of both worlds to pluralistic image completion: appearance
prior reconstruction with transformer and texture replenishment with CNN. The
former transformer recovers pluralistic coherent structures together with some
coarse textures, while the latter CNN enhances the local texture details of
coarse priors guided by the high-resolution masked images. The proposed method
vastly outperforms state-of-the-art methods in terms of three aspects: 1) large
performance boost on image fidelity even compared to deterministic completion
methods; 2) better diversity and higher fidelity for pluralistic completion; 3)
exceptional generalization ability on large masks and generic dataset, like
ImageNet.
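The two-stage pipeline described in the abstract can be illustrated with a toy sketch (hypothetical code, not the authors' implementation; a 1-D "image" stands in for a 2-D one): stage 1 samples coarse values for masked positions, standing in for the transformer's pluralistic appearance-prior sampling, and stage 2 upsamples the prior while keeping the known high-resolution pixels, standing in for the guided CNN refinement.

```python
import random

def stage1_sample_prior(coarse, mask, seed):
    """Fill masked low-resolution pixels by sampling from a small palette.

    Toy stand-in for autoregressive transformer sampling over a token
    vocabulary; different seeds yield different (pluralistic) structures.
    """
    rng = random.Random(seed)
    out = list(coarse)
    for i, masked in enumerate(mask):
        if masked:
            out[i] = rng.choice([0, 64, 128, 192, 255])
    return out

def stage2_refine(prior, hires, hires_mask, scale=2):
    """Nearest-neighbour upsample the coarse prior; keep known hi-res pixels.

    Toy stand-in for the CNN that enhances texture details of the coarse
    prior under guidance from the high-resolution masked image.
    """
    up = [v for v in prior for _ in range(scale)]
    return [u if masked else h
            for u, h, masked in zip(up, hires, hires_mask)]

coarse      = [10, 20, 0, 0]                # zeros: unknown at low resolution
coarse_mask = [False, False, True, True]
hires       = [10, 12, 20, 22, 0, 0, 0, 0]  # last four pixels are the hole
hires_mask  = [False, False, False, False, True, True, True, True]

sample_a = stage2_refine(stage1_sample_prior(coarse, coarse_mask, seed=1),
                         hires, hires_mask)
sample_b = stage2_refine(stage1_sample_prior(coarse, coarse_mask, seed=2),
                         hires, hires_mask)
```

Re-running stage 1 with different seeds produces distinct completions of the hole while the known pixels stay fixed, which is the essence of pluralistic completion.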
Related papers
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linearly related to the resolution, derived via Taylor expansion, and based on this attention we build a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
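The generic trick behind such linearized attention can be sketched as follows (an illustrative kernel-feature-map version, not T-former's exact Taylor-expansion attention; the feature map `phi` here is an assumption): rewriting the softmax-free form `phi(Q)(phi(K)^T V)` instead of `(phi(Q)phi(K)^T)V` reduces the cost from O(n²·d) to O(n·d²), linear in sequence length n.

```python
def phi(x):
    # Simple elementwise positive feature map (illustrative assumption).
    return [1.0 + max(v, 0.0) for v in x]

def attn_quadratic(Q, K, V):
    """Reference form: n weights per query -> O(n^2 d) overall."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        z = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) / z
                    for c in range(len(V[0]))])
    return out

def attn_linear(Q, K, V):
    """Reordered form: build S and z once, then O(d^2) per query."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]   # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d                        # z = sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(a * b for a, b in zip(fq, z))
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out
```

Both forms compute the same attention output up to floating-point error; only the order of multiplication (and hence the complexity in n) differs.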
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- CompletionFormer: Depth Completion with Convolutions and Vision Transformers [0.0]
This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure.
Our CompletionFormer outperforms state-of-the-art CNN-based methods on the outdoor KITTI Depth Completion benchmark and the indoor NYUv2 dataset, while being significantly more efficient (nearly 1/3 the FLOPs) than pure Transformer-based methods.
arXiv Detail & Related papers (2023-04-25T17:59:47Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Diverse Image Inpainting with Bidirectional and Autoregressive Transformers [55.21000775547243]
We propose BAT-Fill, an image inpainting framework with a novel bidirectional autoregressive transformer (BAT).
BAT-Fill inherits the merits of transformers and CNNs in a two-stage manner, which allows it to generate high-resolution content without being constrained by the quadratic complexity of attention in transformers.
arXiv Detail & Related papers (2021-04-26T03:52:27Z)
- Spatially-Adaptive Pixelwise Networks for Fast Image Translation [57.359250882770525]
We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation.
We use pixel-wise networks; that is, each pixel is processed independently of others.
Our model is up to 18x faster than state-of-the-art baselines.
arXiv Detail & Related papers (2020-12-05T10:02:03Z)
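The pixel-wise idea from the last entry can be sketched minimally (an illustrative stand-in, not the authors' architecture; the weights and the single "neuron" are assumptions): the same small function maps each pixel from its value and normalized coordinates, independently of all other pixels, so the work is O(pixels) and trivially parallel.

```python
def pixel_net(value, x, y, w=(0.5, 0.3, 0.2), b=0.1):
    # One toy "neuron": weighted sum of pixel value and normalized
    # (x, y) coordinates plus a bias, followed by ReLU.
    s = w[0] * value + w[1] * x + w[2] * y + b
    return max(s, 0.0)

def translate(image):
    """Apply the same pixel-wise function to every pixel independently."""
    h, wd = len(image), len(image[0])
    return [[pixel_net(image[r][c],
                       c / max(wd - 1, 1),   # normalized x in [0, 1]
                       r / max(h - 1, 1))    # normalized y in [0, 1]
             for c in range(wd)] for r in range(h)]

out = translate([[0.0, 1.0],
                 [1.0, 0.0]])
```

Because no pixel depends on any other, the inner loops map directly onto GPU-parallel evaluation, which is the source of the reported speedups.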
This list is automatically generated from the titles and abstracts of the papers in this site.