Outpainting by Queries
- URL: http://arxiv.org/abs/2207.05312v1
- Date: Tue, 12 Jul 2022 04:48:41 GMT
- Title: Outpainting by Queries
- Authors: Kai Yao, Penglei Gao, Xi Yang, Kaizhu Huang, Jie Sun, and Rui Zhang
- Abstract summary: We propose a novel hybrid vision-transformer-based encoder-decoder framework, named Query Outpainting TRansformer (QueryOTR)
We experimentally show that QueryOTR generates smooth, realistic, and visually appealing results that compare favourably against state-of-the-art image outpainting approaches.
- Score: 23.626012684754965
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image outpainting, which is well studied with Convolutional Neural
Network (CNN) based frameworks, has recently drawn more attention in computer vision.
However, CNNs rely on inherent inductive biases to achieve effective sample
learning, which may lower the performance ceiling. In this paper, motivated
by the flexible self-attention mechanism with minimal inductive biases in
transformer architecture, we reframe the generalised image outpainting problem
as a patch-wise sequence-to-sequence autoregression problem, enabling
query-based image outpainting. Specifically, we propose a novel hybrid
vision-transformer-based encoder-decoder framework, named Query Outpainting
TRansformer (QueryOTR), for extrapolating visual context on all sides of a
given image. The global modeling capacity of the patch-wise mode allows us to
extrapolate images from the query standpoint of the attention mechanism. A
novel Query Expansion Module (QEM) is designed
to integrate information from the predicted queries based on the encoder's
output, hence accelerating the convergence of the pure transformer even with a
relatively small dataset. To further enhance connectivity between each patch,
the proposed Patch Smoothing Module (PSM) re-allocates and averages the
overlapped regions, thus providing seamless predicted images. We experimentally
show that QueryOTR generates smooth, realistic, and visually appealing results
that compare favourably against state-of-the-art image outpainting approaches.
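The overlap-averaging idea behind the Patch Smoothing Module can be sketched as follows. This is a minimal illustration of averaging overlapped patch regions, not the authors' implementation; the function name and signature are invented for clarity:

```python
import numpy as np

def smooth_patches(patches, positions, out_shape):
    """Blend overlapping predicted patches into one image by averaging
    every pixel over all patches that cover it (PSM-style overlap averaging)."""
    out = np.zeros(out_shape, dtype=np.float64)
    weight = np.zeros(out_shape, dtype=np.float64)
    for patch, (y, x) in zip(patches, positions):
        h, w = patch.shape
        out[y:y + h, x:x + w] += patch        # accumulate predictions
        weight[y:y + h, x:x + w] += 1.0       # count covering patches
    return out / np.maximum(weight, 1.0)      # average; avoid divide-by-zero
```

Averaging the overlapped regions is what removes visible seams between adjacent predicted patches.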
Related papers
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trained discriminator.
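The wavelet token-mixing step can be illustrated with a single level of a 2D Haar DWT, which splits a feature map into one low-frequency and three high-frequency subbands. This is a generic sketch of the transform, not WavePaint's actual WaveMix block:

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on a 2D array with even sides.
    Returns the (LL, LH, HL, HH) subbands, each half the input resolution."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-low: local average
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh
```

Mixing tokens in this multi-resolution domain is cheaper than self-attention because the transform itself is a fixed linear operation.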
arXiv Detail & Related papers (2023-07-01T18:41:34Z) - T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention linearly related to the resolution according to Taylor expansion, and based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
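The general idea of attention whose cost grows linearly with resolution can be sketched with a kernelized linear-attention variant. Note this uses a standard elu(x)+1 feature map rather than T-former's Taylor-expansion formulation, so it illustrates the complexity class, not the paper's exact method:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention: O(N) in sequence length N instead of
    the O(N^2) of softmax attention. q, k: (N, d); v: (N, d_v)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    q, k = phi(q), phi(k)
    kv = k.T @ v                      # (d, d_v): aggregated once over tokens
    z = q @ k.sum(axis=0)             # (N,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)
```

Because `k.T @ v` is computed once and reused for every query, the cost scales with the number of tokens rather than with all token pairs.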
arXiv Detail & Related papers (2023-05-12T04:10:42Z) - Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
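The contrast between dense and sparse attention modules comes down to which tokens attend to each other. A strided grouping like the following is one common way to form sparse attention groups that span the whole image; this is illustrative, not ART's exact scheme:

```python
import numpy as np

def sparse_groups(n_tokens, stride):
    """Partition token indices into `stride` interleaved groups; attention
    is then computed within each group, so every group spans the full
    token sequence at a coarse sampling rate."""
    idx = np.arange(n_tokens)
    return [idx[s::stride] for s in range(stride)]
```

Dense attention corresponds to `stride=1` (one group containing every token), while larger strides trade local density for long-range reach at lower cost.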
arXiv Detail & Related papers (2022-10-04T07:35:01Z) - High-Fidelity Image Inpainting with GAN Inversion [23.49170140410603]
In this paper, we propose a novel GAN inversion model for image inpainting, dubbed InvertFill.
Within the encoder, the pre-modulation network leverages multi-scale structures to encode more discriminative semantics into style vectors.
To reconstruct faithful and photorealistic images, a simple yet effective Soft-update Mean Latent module is designed to capture more diverse in-domain patterns that synthesize high-fidelity textures for large corruptions.
arXiv Detail & Related papers (2022-08-25T03:39:24Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) to explore object-to-object, object-to-patch, and patch-to-patch dependencies.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z) - The Devil Is in the Details: Window-based Attention for Image Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
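Window-based local attention begins by partitioning the feature map into non-overlapping windows, within which attention is then computed. A minimal partition helper (with illustrative names, not any particular library's API) might look like:

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.
    Returns an array of shape (num_windows, ws, ws, C); H and W must be
    divisible by ws."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # Bring the two window-grid axes together, then flatten them.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)
```

Restricting attention to each window keeps the quadratic cost bounded by the window size rather than the full image resolution.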
arXiv Detail & Related papers (2022-03-16T07:55:49Z) - Generalised Image Outpainting with U-Transformer [19.894445491176878]
We develop a novel transformer-based generative adversarial network called U-Transformer.
Specifically, we design a generator as an encoder-to-decoder structure embedded with the popular Swin Transformer blocks.
We experimentally demonstrate that our proposed method could produce visually appealing results for generalised image outpainting.
arXiv Detail & Related papers (2022-01-27T09:41:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.