Controllable Multi-domain Semantic Artwork Synthesis
- URL: http://arxiv.org/abs/2308.10111v1
- Date: Sat, 19 Aug 2023 21:16:28 GMT
- Title: Controllable Multi-domain Semantic Artwork Synthesis
- Authors: Yuantian Huang, Satoshi Iizuka, Edgar Simo-Serra, and Kazuhiro Fukui
- Abstract summary: We propose a dataset that contains 40,000 images of artwork from 4 different domains with their corresponding semantic label maps.
We generate the dataset by first extracting semantic maps from landscape photography.
We then propose a conditional Generative Adversarial Network (GAN)-based approach to generate high-quality artwork.
- Score: 17.536225601718687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel framework for multi-domain synthesis of artwork from
semantic layouts. One of the main limitations of this challenging task is the
lack of publicly available segmentation datasets for art synthesis. To address
this problem, we propose a dataset, which we call ArtSem, that contains 40,000
images of artwork from 4 different domains with their corresponding semantic
label maps. We generate the dataset by first extracting semantic maps from
landscape photography and then propose a conditional Generative Adversarial
Network (GAN)-based approach to generate high-quality artwork from the semantic
maps without necessitating paired training data. Furthermore, we propose an
artwork synthesis model that uses domain-dependent variational encoders for
high-quality multi-domain synthesis. The model is improved and complemented
with a simple but effective normalization method, based on jointly normalizing
the semantic and style information, which we call Spatially STyle-Adaptive
Normalization (SSTAN). In contrast to previous methods that only take semantic
layout as input, our model is able to learn a joint representation of both
style and semantic information, which leads to better generation quality for
synthesizing artistic images. Results indicate that our model learns to
separate the domains in the latent space, and thus, by identifying the
hyperplanes that separate the different domains, we can also perform
fine-grained control of the synthesized artwork. By combining our proposed
dataset and approach, we are able to generate user-controllable artwork that is
of higher quality than existing approaches.
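For intuition, a minimal PyTorch-style sketch of a normalization layer in the spirit of SSTAN is given below: the per-pixel scale and shift are predicted jointly from the semantic label map and a style code from a domain-dependent encoder. All layer sizes, the additive fusion scheme, and names such as `SSTANBlock` are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSTANBlock(nn.Module):
    """Sketch of a SPADE-like layer whose modulation parameters are
    predicted jointly from a semantic map and a style code (hypothetical
    sizes and fusion scheme; not the paper's exact design)."""

    def __init__(self, num_features, num_classes, style_dim, hidden=128):
        super().__init__()
        # Parameter-free normalization of the incoming activations.
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        # Shared trunk over the semantic label map.
        self.shared = nn.Sequential(
            nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # The style code is projected and broadcast over all locations.
        self.style_proj = nn.Linear(style_dim, hidden)
        self.gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, segmap, style):
        # x: (B, C, H, W); segmap: (B, num_classes, Hs, Ws) one-hot;
        # style: (B, style_dim) from a domain-dependent variational encoder.
        normalized = self.norm(x)
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        feat = self.shared(segmap)
        # Joint semantic + style conditioning via spatial broadcast.
        feat = feat + self.style_proj(style)[:, :, None, None]
        return normalized * (1 + self.gamma(feat)) + self.beta(feat)
```

The abstract also notes that the domains become separable in latent space, so fine-grained control amounts to moving a latent code across the separating hyperplane. A sketch of that idea, assuming latent codes and binary domain labels are available as NumPy arrays:

```python
import numpy as np
from sklearn.svm import LinearSVC

def domain_direction(latents, domain_labels):
    """Fit a linear boundary between two domains in latent space and
    return the unit normal of the separating hyperplane.
    latents: (N, D) array; domain_labels: (N,) array with values {0, 1}."""
    clf = LinearSVC(C=1.0, max_iter=10000).fit(latents, domain_labels)
    normal = clf.coef_.ravel()
    return normal / np.linalg.norm(normal)

# Editing: z_edit = z + alpha * domain_direction(latents, domain_labels)
# shifts a sample toward the other domain; alpha controls the strength.
```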
Related papers
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that yields highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
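As a rough illustration of reusing a pre-trained image backbone inside a GAN discriminator, the sketch below freezes a torchvision ResNet-50 and adds a small trainable head producing patch-wise real/fake logits. The backbone choice and head design are assumptions; the actual DP-SIMS discriminator is more elaborate and also conditions on the label map.

```python
import torch.nn as nn
import torchvision.models as tvm

class BackboneDiscriminator(nn.Module):
    """Sketch: discriminator built on a frozen pre-trained backbone with
    a small trainable head (hypothetical design, not DP-SIMS itself)."""

    def __init__(self):
        super().__init__()
        backbone = tvm.resnet50(weights=tvm.ResNet50_Weights.DEFAULT)
        # Drop the average pool and classifier; keep the conv features.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.features.parameters():
            p.requires_grad_(False)  # keep the pre-trained features fixed
        # 1x1 conv head yields a grid of patch-wise real/fake logits.
        self.head = nn.Conv2d(2048, 1, kernel_size=1)

    def forward(self, images):
        return self.head(self.features(images))
```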
arXiv Detail & Related papers (2023-12-20T09:39:19Z) - Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
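A generic supervised pixel-contrastive loss capturing this idea can be sketched as follows; the temperature, sampling, and masking details are assumptions rather than the exact ECGAN formulation. `embeddings` holds features for a sampled set of pixels and `labels` their semantic classes.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull pixel embeddings of the same semantic class together and push
    other classes apart (generic sketch, not the ECGAN loss).
    embeddings: (N, D) pixel features; labels: (N,) class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    mask = labels[:, None].eq(labels[None, :]).float()  # 1 where same class
    mask.fill_diagonal_(0)                              # a pixel is not its own positive
    diag = torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = F.log_softmax(sim.masked_fill(diag, float("-inf")), dim=1)
    pos_count = mask.sum(dim=1)
    valid = pos_count > 0                               # anchors with at least one positive
    loss = -(mask * log_prob).sum(dim=1)[valid] / pos_count[valid]
    return loss.mean()
```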
arXiv Detail & Related papers (2023-07-22T14:17:19Z) - Few-shot Semantic Image Synthesis with Class Affinity Transfer [23.471210664024067]
We propose a transfer method that leverages a model trained on a large source dataset to improve learning on small target datasets.
The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps.
We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis.
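A minimal sketch of the class-affinity idea, assuming a pre-trained `source_model` and one-hot label maps: the affinity matrix is realized as a 1x1 convolution that linearly maps target classes into the source label space before the source model sees them. The learnable-convolution realization and all names are illustrative assumptions.

```python
import torch.nn as nn

class ClassAffinityInput(nn.Module):
    """Prepend a class-affinity mapping to a pre-trained source model
    (hypothetical realization of the idea, not the paper's exact layer)."""

    def __init__(self, source_model, num_target_classes, num_source_classes):
        super().__init__()
        # A 1x1 conv is a per-pixel linear map from target to source classes.
        self.affinity = nn.Conv2d(num_target_classes, num_source_classes,
                                  kernel_size=1, bias=False)
        self.source_model = source_model

    def forward(self, target_segmap):
        # target_segmap: (B, num_target_classes, H, W) one-hot label maps.
        return self.source_model(self.affinity(target_segmap))
```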
arXiv Detail & Related papers (2023-04-05T09:24:45Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL).
RESAIL provides pixel-level fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that our RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z) - Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision [83.33283892171562]
Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image.
In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map.
We propose an end-to-end network for joint global and local feature alignment and synthesis.
arXiv Detail & Related papers (2020-04-18T18:17:40Z) - Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis [194.1452124186117]
We propose a novel ECGAN for the challenging semantic image synthesis task.
Our ECGAN achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2020-03-31T01:23:21Z) - Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation [19.617821473205694]
It is challenging for a model trained with synthetic data to generalize to real data.
We diversify the texture of synthetic images using a style transfer algorithm.
We fine-tune the model with self-training to get direct supervision of the target texture.
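The self-training step can be sketched as a pseudo-labeling pass over unlabeled target-domain images, keeping only confident pixel predictions as training targets; the confidence threshold and ignore index below are assumptions, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def pseudo_label(model, target_images, threshold=0.9, ignore_index=255):
    """Generate pseudo ground truth on unlabeled target images
    (generic self-training sketch with a hypothetical threshold)."""
    probs = torch.softmax(model(target_images), dim=1)  # (B, C, H, W)
    conf, labels = probs.max(dim=1)                     # per-pixel confidence/class
    labels[conf < threshold] = ignore_index             # mask out uncertain pixels
    return labels
```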
arXiv Detail & Related papers (2020-03-02T13:11:54Z) - Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings [76.85673049332428]
Learned joint representations of images and text form the backbone of several important cross-domain tasks such as image captioning.
We propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately.
We demonstrate the effectiveness of our model on diverse tasks, including image captioning and text-to-image synthesis.
arXiv Detail & Related papers (2020-02-16T19:49:30Z)