High-Quality Pluralistic Image Completion via Code Shared VQGAN
- URL: http://arxiv.org/abs/2204.01931v1
- Date: Tue, 5 Apr 2022 01:47:35 GMT
- Title: High-Quality Pluralistic Image Completion via Code Shared VQGAN
- Authors: Chuanxia Zheng and Guoxian Song and Tat-Jen Cham and Jianfei Cai and
Dinh Phung and Linjie Luo
- Abstract summary: We present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.
Our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality.
- Score: 51.7805154545948
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: PICNet pioneered the generation of multiple and diverse results for the
image completion task, but it required a careful balance between the $\mathcal{KL}$
loss (diversity) and the reconstruction loss (quality), resulting in limited
diversity and quality. Separately, an iGPT-based architecture has been employed to
infer distributions in a discrete space derived from a pixel-level pre-clustered
palette, which, however, cannot generate high-quality results directly. In this
work, we present a novel framework for pluralistic image completion that can
achieve both high quality and diversity at much faster inference speed. The
core of our design lies in a simple yet effective code sharing mechanism that
leads to a very compact yet expressive image representation in a discrete
latent domain. The compactness and the richness of the representation further
facilitate the subsequent deployment of a transformer to effectively learn how
to composite and complete a masked image in the discrete code domain. Based on
the global context well-captured by the transformer and the available visual
regions, we are able to sample all tokens simultaneously, which is completely
different from the prevailing autoregressive approach of iGPT-based works, and
leads to more than 100$\times$ faster inference speed. Experiments show that
our framework is able to learn semantically-rich discrete codes efficiently and
robustly, resulting in much better image reconstruction quality. Our diverse
image completion framework significantly outperforms the state-of-the-art both
quantitatively and qualitatively on multiple benchmark datasets.
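To make the abstract's core mechanism concrete, here is a minimal sketch of non-autoregressive completion over a discrete VQGAN token grid: a bidirectional transformer predicts a categorical distribution over codebook entries at every masked position, and all missing codes are sampled in a single forward pass rather than one at a time. This is an illustration assembled from the abstract alone, not the authors' implementation; the names (ParallelTokenCompleter, complete, mask_id), the [MASK]-token convention, and all sizes are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code) of parallel token
# sampling for pluralistic image completion in a discrete VQ latent space.
import torch
import torch.nn as nn

class ParallelTokenCompleter(nn.Module):
    def __init__(self, codebook_size=1024, dim=512, depth=12, heads=8, seq_len=256):
        super().__init__()
        # Index `codebook_size` is reserved for the [MASK] token (our convention).
        self.tok_emb = nn.Embedding(codebook_size + 1, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        # No causal mask: every position attends to the full (bidirectional) context.
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, codebook_size)
        self.mask_id = codebook_size

    def forward(self, tokens):
        x = self.tok_emb(tokens) + self.pos_emb[:, : tokens.size(1)]
        return self.head(self.encoder(x))  # (B, L, codebook_size) logits

@torch.no_grad()
def complete(model, tokens, mask, temperature=1.0):
    """Sample every masked code in ONE forward pass (vs. L passes for iGPT-style AR)."""
    inp = tokens.masked_fill(mask, model.mask_id)
    probs = (model(inp) / temperature).softmax(-1)
    sampled = torch.multinomial(probs.flatten(0, 1), 1).view_as(tokens)
    return torch.where(mask, sampled, tokens)  # visible codes stay unchanged

# Illustrative usage on a hypothetical 16x16 code grid:
B, H, W, K = 2, 16, 16, 1024
codes = torch.randint(0, K, (B, H * W))          # stand-in for VQGAN code indices
mask = torch.zeros(B, H * W, dtype=torch.bool)
mask[:, : H * W // 2] = True                     # pretend the top half is missing
model = ParallelTokenCompleter(codebook_size=K, seq_len=H * W)
completed = complete(model, codes, mask)         # one forward pass
```

In a full pipeline, the input codes would come from a VQGAN encoder's nearest-codebook indices, and the completed grid would be fed to the VQGAN decoder to produce pixels. Drawing several samples from the same masked input yields the pluralistic completions described above, and the single forward pass (versus one pass per token in iGPT-style autoregression) is what underlies the claimed >100$\times$ speedup.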
Related papers
- TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual
Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose TcGAN, a novel structure-preserved method with an individual vision transformer, to overcome the shortcomings of existing one-shot image generation methods.
arXiv Detail & Related papers (2023-02-16T03:05:59Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs with the powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents an approach with the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Large Scale Image Completion via Co-Modulated Generative Adversarial Networks [18.312552957727828]
We propose a generic new approach that bridges the gap between image-conditional and recent unconditional generative architectures.
Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS).
Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation.
arXiv Detail & Related papers (2021-03-18T17:59:11Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields (a minimal sketch of such a block follows this list).
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
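The last entry above mentions dense combinations of dilated convolutions for larger, more effective receptive fields. The block below is a hedged sketch of that general idea, not the paper's exact design; the class name DenseDilatedBlock, the channel count, and the dilation rates (1, 2, 4, 8) are illustrative assumptions.

```python
# Hedged sketch of a dense dilated-convolution block: illustrative of the
# technique named in the "Image Fine-grained Inpainting" summary above.
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Stacks dilated 3x3 convolutions and densely fuses their outputs,
    enlarging the receptive field without reducing spatial resolution."""
    def __init__(self, channels=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats, h = [], x
        for conv in self.branches:
            h = self.act(conv(h))   # each branch refines the previous one's output
            feats.append(h)
        # dense fusion of all dilation scales, plus a residual connection
        return x + self.fuse(torch.cat(feats, dim=1))
```

With 3x3 kernels and dilations 1, 2, 4, and 8 applied in sequence, the receptive field grows to 1 + 2·(1+2+4+8) = 31 pixels while the feature map keeps its full resolution, which is the property the summary highlights.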