Generating Diverse Structure for Image Inpainting With Hierarchical
VQ-VAE
- URL: http://arxiv.org/abs/2103.10022v1
- Date: Thu, 18 Mar 2021 05:10:49 GMT
- Title: Generating Diverse Structure for Image Inpainting With Hierarchical
VQ-VAE
- Authors: Jialun Peng, Dong Liu, Songcen Xu, Houqiang Li
- Abstract summary: We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.
Experimental results on CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images.
- Score: 74.29384873537587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given an incomplete image without additional constraint, image inpainting
natively allows for multiple solutions as long as they appear plausible.
Recently, multiple-solution inpainting methods have been proposed and shown the
potential of generating diverse results. However, these methods have difficulty
in ensuring the quality of each solution, e.g. they produce distorted structure
and/or blurry texture. We propose a two-stage model for diverse inpainting,
where the first stage generates multiple coarse results each of which has a
different structure, and the second stage refines each coarse result separately
by augmenting texture. The proposed model is inspired by the hierarchical
vector quantized variational auto-encoder (VQ-VAE), whose hierarchical
architecture disentangles structural and textural information. In addition, the
vector quantization in VQ-VAE enables autoregressive modeling of the discrete
distribution over the structural information. Sampling from the distribution
can easily generate diverse and high-quality structures, making up the first
stage of our model. In the second stage, we propose a structural attention
module inside the texture generation network, where the module utilizes the
structural information to capture distant correlations. We further reuse the
VQ-VAE to calculate two feature losses, which help improve structure coherence
and texture realism, respectively. Experimental results on CelebA-HQ, Places2,
and ImageNet datasets show that our method not only enhances the diversity of
the inpainting solutions but also improves the visual quality of the generated
multiple images. Code and models are available at:
https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.
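The two mechanisms the abstract leans on, the vector-quantization bottleneck that discretizes structural features and the structural attention that lets texture features query structure over long distances, can be illustrated with a minimal PyTorch-style sketch. This is an assumption-laden illustration, not the authors' released implementation; the class names, codebook size, and feature dimensions below are made up for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Snap continuous encoder features to their nearest codebook entries.

    The resulting integer code map is what makes autoregressive modeling of
    the structural distribution possible (illustrative, not the paper's code).
    """
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # weight of the commitment loss term

    def forward(self, z):  # z: (B, C, H, W) continuous structural features
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)              # (B*H*W, C)
        # Squared L2 distance from each feature vector to every code.
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2.0 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        idx = dist.argmin(1)                                     # discrete codes
        z_q = self.codebook(idx).view(B, H, W, C).permute(0, 3, 1, 2)
        # Codebook + commitment losses; straight-through estimator for grads.
        vq_loss = (F.mse_loss(z_q, z.detach())
                   + self.beta * F.mse_loss(z, z_q.detach()))
        z_q = z + (z_q - z).detach()
        return z_q, idx.view(B, H, W), vq_loss

class StructuralAttention(nn.Module):
    """Texture features (queries) attend over structure features (keys/values),
    capturing the distant correlations mentioned in the abstract."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)

    def forward(self, texture, structure):  # both (B, C, H, W)
        B, C, H, W = texture.shape
        q = self.q(texture).flatten(2).transpose(1, 2)       # (B, HW, C)
        k = self.k(structure).flatten(2)                     # (B, C, HW)
        v = self.v(structure).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)       # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(B, C, H, W)
        return texture + out                                 # residual fusion
```

Given the integer code map from the quantizer, any autoregressive prior over discrete sequences (a PixelCNN-style network, for instance) could be trained on the codes and sampled to produce the diverse structures of the first stage.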
Related papers
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
An image prior model is trained separately to map text embeddings to CLIP image embeddings.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z)
- A Structure-Guided Diffusion Model for Large-Hole Image Completion [85.61681358977266]
We develop a structure-guided diffusion model to fill large holes in images.
Our method achieves a superior or comparable visual quality compared to state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-18T18:59:01Z)
- Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand [28.32208483559088]
We claim that the performance of inpainting algorithms can be better judged by the generated structures and textures.
In this paper, we propose a novel inpainting network combining the advantages of the two designs.
Our model achieves remarkable visual quality, matching state-of-the-art performance in both structure generation and repeating texture synthesis.
arXiv Detail & Related papers (2022-08-05T20:42:13Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)
- Image Inpainting via Conditional Texture and Structure Dual Generation [26.97159780261334]
We propose a novel two-stream network for image inpainting, which models the structure-constrained texture synthesis and texture-guided structure reconstruction.
To enhance global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine the structure and texture information (a sketch of this kind of gated fusion appears after this list).
Experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method.
arXiv Detail & Related papers (2021-08-22T15:44:37Z)
- InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z)
- Efficient and Model-Based Infrared and Visible Image Fusion Via Algorithm Unrolling [24.83209572888164]
Infrared and visible image fusion (IVIF) aims to obtain images that retain thermal radiation information from infrared images and texture details from visible images.
A model-based convolutional neural network (CNN) is proposed to overcome the shortcomings of traditional CNN-based IVIF models.
arXiv Detail & Related papers (2020-05-12T16:15:56Z)
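As a rough illustration of the bi-directional gated fusion idea referenced in the Bi-GFF entry above, the sketch below lets each stream absorb a gated portion of the other before the two are recombined. The layer shapes and names are assumptions for illustration, not that paper's implementation.

```python
import torch
import torch.nn as nn

class BiGatedFusion(nn.Module):
    """Each stream gates the other's contribution before they are recombined
    (illustrative sketch of bi-directional gated feature fusion)."""
    def __init__(self, dim=64):
        super().__init__()
        # Gates computed from the concatenated structure and texture streams.
        self.gate_s = nn.Sequential(nn.Conv2d(2 * dim, dim, 3, padding=1),
                                    nn.Sigmoid())
        self.gate_t = nn.Sequential(nn.Conv2d(2 * dim, dim, 3, padding=1),
                                    nn.Sigmoid())

    def forward(self, structure, texture):  # both (B, C, H, W)
        both = torch.cat([structure, texture], dim=1)
        s_out = structure + self.gate_s(both) * texture  # texture -> structure
        t_out = texture + self.gate_t(both) * structure  # structure -> texture
        return s_out, t_out
```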
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.