Interactive Image Synthesis with Panoptic Layout Generation
- URL: http://arxiv.org/abs/2203.02104v1
- Date: Fri, 4 Mar 2022 02:45:27 GMT
- Title: Interactive Image Synthesis with Panoptic Layout Generation
- Authors: Bo Wang, Tao Wu, Minfeng Zhu, Peng Du
- Abstract summary: We propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address the challenge of synthesizing realistic images from imprecise, user-placed layouts.
PLGAN employs the panoptic theory, which distinguishes between "stuff" categories with amorphous boundaries and "thing" categories with well-defined shapes.
We experimentally compare our PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets.
- Score: 14.1026819862002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive image synthesis from user-guided input is a challenging task when
users wish to control the scene structure of a generated image with
ease. Although remarkable progress has been made on layout-based image synthesis
approaches, existing methods require high-precision inputs to produce realistic
images in an interactive setting; such inputs often need several rounds of
adjustment and are unfriendly to novice users. When placement of bounding boxes is
subject to perturbation, layout-based models suffer from "missing regions" in
the constructed semantic layouts and hence undesirable artifacts in the
generated images. In this work, we propose Panoptic Layout Generative
Adversarial Networks (PLGAN) to address this challenge. PLGAN employs the
panoptic theory, which distinguishes between "stuff" categories with amorphous
boundaries and "thing" categories with well-defined shapes, such that stuff and
instance layouts are constructed through separate branches and later fused into
panoptic layouts. In particular, the stuff layouts can take amorphous shapes
and fill up the missing regions left out by the instance layouts. We
experimentally compare our PLGAN with state-of-the-art layout-based models on
the COCO-Stuff, Visual Genome, and Landscape datasets. The advantages of PLGAN
are not only visually demonstrated but quantitatively verified in terms of
inception score, Fréchet inception distance, classification accuracy score,
and coverage.
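The fusion step described in the abstract can be made concrete with a small sketch. The following is a hypothetical illustration of the idea, not the authors' code: the function names, layout shapes, and renormalization step are all assumptions, but it shows how amorphous "stuff" masks can fill the regions that perturbed "thing" boxes leave uncovered.

```python
# Hypothetical sketch of panoptic layout fusion (not PLGAN's actual code).
# Layouts are per-class soft masks of shape (C, H, W); all names and
# shapes here are illustrative assumptions.
import numpy as np

def fuse_panoptic_layout(instance_layout: np.ndarray,
                         stuff_layout: np.ndarray) -> np.ndarray:
    """Fuse "thing" and "stuff" layouts into one panoptic layout."""
    # Pixels already claimed by an instance keep their instance weight.
    instance_coverage = instance_layout.max(axis=0)        # (H, W)
    # Stuff fills whatever the instances left uncovered, so perturbed
    # bounding boxes no longer produce "missing regions".
    stuff_weight = 1.0 - instance_coverage                 # (H, W)
    fused = np.concatenate(
        [instance_layout, stuff_layout * stuff_weight[None]], axis=0)
    # Renormalize so every pixel carries a distribution over all classes.
    return fused / np.clip(fused.sum(axis=0, keepdims=True), 1e-6, None)

# Toy usage: one "thing" box on a 4x4 canvas, one "stuff" class everywhere.
things = np.zeros((1, 4, 4)); things[0, 1:3, 1:3] = 1.0
stuff = np.ones((1, 4, 4))
panoptic = fuse_panoptic_layout(things, stuff)
assert panoptic.shape == (2, 4, 4)
```

For reference, the Fréchet inception distance cited above is the standard metric computed from the means and covariances of Inception features of real (r) and generated (g) images:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\bigl(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\bigr)
```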
Related papers
- LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis [24.925757148750684]
We propose LoCo, a training-free approach for layout-to-image synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions.
LoCo seamlessly integrates into existing text-to-image and layout-to-image models, enhancing their performance in spatial control and addressing semantic failures observed in prior methods.
arXiv Detail & Related papers (2023-11-21T04:28:12Z) - LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z) - PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z) - Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z) - Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z) - Geometry Aligned Variational Transformer for Image-conditioned Layout Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image.
First, a self-attention mechanism is adopted to model the contextual relationships within layout elements, while a cross-attention mechanism is used to fuse the visual information of conditional images (a minimal sketch of this attention pattern appears after this list).
We construct a large-scale advertisement poster layout designing dataset with delicate layout and saliency map annotations.
arXiv Detail & Related papers (2022-09-02T07:19:12Z) - Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs [24.29890251913182]
We study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images.
We propose a deep generative model, dubbed composition-aware graphic layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images.
arXiv Detail & Related papers (2022-04-30T16:42:13Z) - Semantic Palette: Guiding Scene Generation with Class Proportions [34.746963256847145]
We introduce a conditional framework with novel architecture designs and learning objectives, which effectively accommodates class proportions to guide the scene generation process.
Thanks to the semantic control, we can produce layouts close to the real distribution, helping enhance the whole scene generation process.
We demonstrate the merit of our approach for data augmentation: semantic segmenters trained on both real and synthetic layout-image pairs outperform models trained on real pairs alone.
arXiv Detail & Related papers (2021-06-03T07:04:00Z) - Person-in-Context Synthesis with Compositional Structural Space [59.129960774988284]
We propose a new problem, Persons in Context Synthesis, which aims to synthesize diverse person instance(s) in consistent contexts.
The context is specified by a bounding-box object layout, which lacks shape information, while the pose of the person(s) is specified by sparsely annotated keypoints.
To handle the stark difference in input structures, we propose two separate neural branches that attentively composite the respective (context/person) inputs into a shared compositional structural space.
This structural space is then decoded to the image space using a multi-level feature modulation strategy, and is learned in a self-supervised manner.
arXiv Detail & Related papers (2020-08-28T14:33:28Z) - Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis [12.449076001538552]
This paper focuses on the recently emerged layout-to-image task, which learns generative models capable of synthesizing photo-realistic images from spatial layouts.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a proposed novel feature normalization scheme.
In experiments, the proposed method is tested on the COCO-Stuff and Visual Genome datasets, obtaining state-of-the-art performance.
arXiv Detail & Related papers (2020-03-25T18:16:05Z) - Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes [54.836331922449666]
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes a semantic segmentation map as guidance at each scale of inpainting, under which location-dependent inferences are re-evaluated.
Experiments on real-world images of mixed scenes demonstrate the superiority of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-15T17:49:20Z)
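To make the self-/cross-attention pattern mentioned in the ICVT summary above concrete, here is a minimal, framework-agnostic sketch; the shapes and function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal scaled dot-product attention sketch (assumed shapes, not ICVT code).
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

# Self-attention: layout elements model their contextual relationships.
layout_tokens = np.random.randn(8, 64)   # 8 layout elements, dim 64
contextual = attention(layout_tokens, layout_tokens, layout_tokens)

# Cross-attention: layout queries fuse visual features of the conditional
# image (e.g., a 7x7 feature grid flattened to 49 tokens).
image_tokens = np.random.randn(49, 64)
fused = attention(contextual, image_tokens, image_tokens)
assert fused.shape == (8, 64)
```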