Aggregated Contextual Transformations for High-Resolution Image
Inpainting
- URL: http://arxiv.org/abs/2104.01431v1
- Date: Sat, 3 Apr 2021 15:50:17 GMT
- Title: Aggregated Contextual Transformations for High-Resolution Image
Inpainting
- Authors: Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo
- Abstract summary: We propose an enhanced GAN-based model, named Aggregated COntextual-Transformation GAN (AOT-GAN) for high-resolution image inpainting.
To enhance context reasoning, we construct the generator of AOT-GAN by stacking multiple layers of a proposed AOT block.
To improve texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.
- Score: 57.241749273816374
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: State-of-the-art image inpainting approaches can suffer from generating
distorted structures and blurry textures in high-resolution images (e.g.,
512x512). The challenges mainly arise from (1) image content reasoning from
distant contexts, and (2) fine-grained texture synthesis for a large missing
region. To overcome these two challenges, we propose an enhanced GAN-based
model, named Aggregated COntextual-Transformation GAN (AOT-GAN), for
high-resolution image inpainting. Specifically, to enhance context reasoning,
we construct the generator of AOT-GAN by stacking multiple layers of a proposed
AOT block. The AOT blocks aggregate contextual transformations from various
receptive fields, allowing the model to capture both informative distant image contexts
and rich patterns of interest for context reasoning. To improve texture
synthesis, we enhance the discriminator of AOT-GAN by training it with a
tailored mask-prediction task. Such a training objective forces the
discriminator to distinguish the detailed appearances of real and synthesized
patches, which in turn helps the generator synthesize clear textures.
Extensive comparisons on Places2, the most challenging benchmark with 1.8
million high-resolution images of 365 complex scenes, show that our model
outperforms the state-of-the-art by a significant margin in terms of FID, with a
38.60% relative improvement. A user study involving more than 30 subjects
further validates the superiority of AOT-GAN. We further evaluate the proposed
AOT-GAN in practical applications, e.g., logo removal, face editing, and object
removal. Results show that our model achieves promising completions in the real
world. We release code and models at
https://github.com/researchmm/AOT-GAN-for-Inpainting.
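
The abstract outlines two mechanisms: an AOT block that aggregates contextual transformations from several receptive fields, and a mask-prediction task for the discriminator. The following PyTorch sketch illustrates both ideas under stated assumptions; the names `AOTBlockSketch` and `mask_prediction_loss`, the dilation rates, the gating form, and the loss details are illustrative choices, not the authors' released implementation (see the repository above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AOTBlockSketch(nn.Module):
    """Illustrative AOT-style block (an assumption, not the released code):
    parallel dilated 3x3 convolutions transform the input over different
    receptive fields; their outputs are concatenated, fused, and merged
    with the input through a learned spatial gate."""

    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert channels % len(dilations) == 0
        branch_ch = channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.ReflectionPad2d(d),  # pad so every branch keeps spatial size
                nn.Conv2d(channels, branch_ch, kernel_size=3, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Aggregate contextual transformations from all receptive fields.
        y = self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
        g = torch.sigmoid(self.gate(x))   # spatially-variant gating map
        return x * (1.0 - g) + y * g      # gated residual merge


def mask_prediction_loss(patch_logits, mask):
    """Sketch of a mask-prediction objective for a patch discriminator:
    each patch logit is trained to predict whether its region was
    synthesized, using the resized inpainting mask as the target
    (an all-zero target would be used for real images)."""
    target = F.interpolate(mask, size=patch_logits.shape[-2:], mode="nearest")
    return F.binary_cross_entropy_with_logits(patch_logits, target)
```

Stacking several such blocks in the generator widens the context available for filling large holes, and the dense patch-level mask target gives the discriminator a stronger training signal than a single real/fake score per image.
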
Related papers
- DivCon: Divide and Conquer for Progressive Text-to-Image Generation [0.0]
Diffusion-driven text-to-image (T2I) generation has achieved remarkable advancements.
Layout is employed as an intermediary to bridge large language models and layout-based diffusion models.
We introduce a divide-and-conquer approach which decouples the T2I generation task into simple subtasks.
arXiv Detail & Related papers (2024-03-11T03:24:44Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [95.02406834386814]
Parti treats text-to-image generation as a sequence-to-sequence modeling problem.
Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens.
PartiPrompts (P2) is a new holistic benchmark of over 1600 English prompts.
arXiv Detail & Related papers (2022-06-22T01:11:29Z)
- RSINet: Inpainting Remotely Sensed Images Using Triple GAN Framework [13.613245876782367]
We propose a novel inpainting method that focuses individually on each aspect of an image, such as edges, colour, and texture.
Each individual GAN also incorporates an attention mechanism that explicitly extracts spectral and spatial features.
We evaluate our model, along with previous state-of-the-art models, on two well-known remote sensing datasets, Open Cities AI and Earth on Canvas.
arXiv Detail & Related papers (2022-02-12T05:19:37Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z)
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE [74.29384873537587]
We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.
Experimental results on CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images.
arXiv Detail & Related papers (2021-03-18T05:10:49Z)
- Efficient texture-aware multi-GAN for image inpainting [5.33024001730262]
Recent GAN-based (generative adversarial network) inpainting methods show remarkable improvements.
We propose a multi-GAN architecture that improves both performance and rendering efficiency.
arXiv Detail & Related papers (2020-09-30T14:58:03Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler yet more efficient at synthesizing realistic, text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)