Related papers: Panoptic-based Image Synthesis

Panoptic-based Image Synthesis

URL: http://arxiv.org/abs/2004.10289v1
Date: Tue, 21 Apr 2020 20:40:53 GMT
Title: Panoptic-based Image Synthesis
Authors: Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro
Abstract summary: Conditional image synthesis serves various applications, from content editing to content generation. We propose a panoptic-aware image synthesis network to generate high-fidelity, photorealistic images conditioned on panoptic maps.
Score: 32.82903428124024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conditional image synthesis for generating photorealistic images serves various applications, from content editing to content generation. Previous conditional image synthesis algorithms mostly rely on semantic maps, and often fail in complex environments where multiple instances occlude each other. We propose a panoptic-aware image synthesis network to generate high-fidelity, photorealistic images conditioned on panoptic maps, which unify semantic and instance information. To achieve this, we efficiently use panoptic maps in convolution and upsampling layers. We show that with the proposed changes to the generator, we can improve on the previous state-of-the-art methods by generating images in complex instance interaction environments with higher fidelity and tiny objects in more detail. Furthermore, our proposed method also outperforms the previous state-of-the-art methods on the mean IoU (Intersection over Union) and detAP (Detection Average Precision) metrics.
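The abstract reports mean IoU as one of its metrics. As a rough illustration only, the following is a minimal sketch of how mean IoU is commonly computed between a predicted label map and a reference label map. The function name `mean_iou`, the nested-list map format, and the toy values are assumptions made for this example; this is not the authors' evaluation code.

```python
from typing import List


def mean_iou(pred: List[List[int]], target: List[List[int]], num_classes: int) -> float:
    """Average per-class IoU over the classes that appear in pred or target.

    Both maps are assumed to be equally sized 2D grids of integer class ids
    in the range [0, num_classes).
    """
    inter = [0] * num_classes  # pixels where both maps agree on class c
    union = [0] * num_classes  # pixels labelled c in at least one map
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    # Skip classes absent from both maps so they do not distort the mean.
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 maps (hypothetical values): one pixel of class 1 is predicted as class 0.
pred_map = [[0, 0], [1, 1]]
target_map = [[0, 1], [1, 1]]
score = mean_iou(pred_map, target_map, num_classes=2)
```

For the toy maps, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.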

Related papers

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images. Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image. Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps. Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales. We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL). RESAIL provides pixel-level, fine-grained guidance to the normalization architecture. Experiments on several challenging datasets show that RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z)
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions. The StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN. Visual-linguistic similarity learns the text-image matching by mapping the image and text into a common embedding space. Instance-level optimization is for identity preservation in manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
Controllable Image Synthesis via SegVAE [89.04391680233493]
A semantic map is a commonly used intermediate representation for conditional image generation. In this work, we specifically target generating semantic maps given a label set consisting of desired categories. The proposed framework, SegVAE, synthesizes semantic maps in an iterative manner using a conditional variational autoencoder.
arXiv Detail & Related papers (2020-07-16T15:18:53Z)
Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information. We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.