Towards Pragmatic Semantic Image Synthesis for Urban Scenes
- URL: http://arxiv.org/abs/2305.09726v1
- Date: Tue, 16 May 2023 18:01:12 GMT
- Title: Towards Pragmatic Semantic Image Synthesis for Urban Scenes
- Authors: George Eskandar, Diandian Guo, Karim Guirguis, Bin Yang
- Abstract summary: We present a new task: given a dataset with synthetic images and labels and a dataset with unlabeled real images, our goal is to learn a model that can generate images with the content of the input mask and the appearance of real images.
We leverage the synthetic image as a guide to the content of the generated image by penalizing the difference between their high-level features on a patch level.
In contrast to previous works which employ one discriminator that overfits the target domain semantic distribution, we employ a discriminator for the whole image and multiscale discriminators on the image patches.
- Score: 4.36080478413575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The need for large amounts of training and validation data is a huge concern
in scaling AI algorithms for autonomous driving. Semantic Image Synthesis
(SIS), or label-to-image translation, promises to address this issue by
translating semantic layouts to images, providing a controllable generation of
photorealistic data. However, these models require a large amount of paired data,
incurring extra annotation costs. In this work, we present a new task: given a dataset
with synthetic images and labels and a dataset with unlabeled real images, our
goal is to learn a model that can generate images with the content of the input
mask and the appearance of real images. This new task reframes the well-known
unsupervised SIS task in a more practical setting, where we leverage cheaply
available synthetic data from a driving simulator to learn how to generate
photorealistic images of urban scenes. This stands in contrast to previous
works, which assume that labels and images come from the same domain but are
unpaired during training. We find that previous unsupervised works underperform
on this task, as they do not handle distribution shifts between two different
domains. To bypass these problems, we propose a novel framework with two main
contributions. First, we leverage the synthetic image as a guide to the content
of the generated image by penalizing the difference between their high-level
features on a patch level. Second, in contrast to previous works which employ
one discriminator that overfits the target domain semantic distribution, we
employ a discriminator for the whole image and multiscale discriminators on the
image patches. Extensive comparisons on the benchmarks GTA-V $\rightarrow$
Cityscapes and GTA-V $\rightarrow$ Mapillary show the superior performance of
the proposed model against state-of-the-art on this task.
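The first contribution above, guiding content by penalizing the difference between high-level features of the synthetic guide and the generated image on a patch level, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature shapes, the patch size, and the use of non-overlapping patches are all assumptions, and the features would in practice come from a frozen encoder.

```python
# Illustrative sketch (not the paper's code): a patch-level content loss
# between high-level feature maps of the synthetic guide image and the
# generated image. Feature maps are split into non-overlapping patches and
# compared patch by patch; shapes and patch size are assumptions.
import numpy as np

def patch_feature_loss(feat_syn: np.ndarray, feat_gen: np.ndarray,
                       patch: int = 4) -> float:
    """Mean squared feature difference, averaged over spatial patches.

    feat_syn, feat_gen: (C, H, W) high-level feature maps, e.g. from a
    frozen encoder; H and W are assumed divisible by `patch`.
    """
    c, h, w = feat_syn.shape
    assert feat_gen.shape == (c, h, w) and h % patch == 0 and w % patch == 0
    loss, n = 0.0, 0
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            d = feat_syn[:, y:y+patch, x:x+patch] - feat_gen[:, y:y+patch, x:x+patch]
            loss += float(np.mean(d ** 2))
            n += 1
    return loss / n

# Identical feature maps give zero loss:
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 16, 16))
print(patch_feature_loss(f, f))  # 0.0
```

Comparing features per patch, rather than globally, keeps the penalty local: each region of the generated image is tied to the corresponding region of the synthetic guide.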
Related papers
- SynCDR : Training Cross Domain Retrieval Models with Synthetic Data [69.26882668598587]
In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains.
We show how to generate synthetic data to fill in these missing category examples across domains.
Our best SynCDR model can outperform prior art by up to 15%.
arXiv Detail & Related papers (2023-12-31T08:06:53Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that yields highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- A Semi-Paired Approach For Label-to-Image Translation [6.888253564585197]
We introduce the first semi-supervised (semi-paired) framework for label-to-image translation.
In the semi-paired setting, the model has access to a small set of paired data and a larger set of unpaired images and labels.
We propose a training algorithm for this shared network, and we present a rare-class sampling algorithm to focus on under-represented classes.
arXiv Detail & Related papers (2023-06-23T16:13:43Z)
- Wavelet-based Unsupervised Label-to-Image Translation [9.339522647331334]
We propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole-image wavelet-based discrimination.
We test our methodology on 3 challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv Detail & Related papers (2023-05-16T17:48:44Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- USIS: Unsupervised Semantic Image Synthesis [9.613134538472801]
We propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS)
USIS learns to output images with visually separable semantic classes using a self-supervised segmentation loss.
In order to match the color and texture distribution of real images without losing high-frequency information, we propose to use whole image wavelet-based discrimination.
arXiv Detail & Related papers (2021-09-29T20:48:41Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- You Only Need Adversarial Supervision for Semantic Image Synthesis [84.83711654797342]
We propose a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results.
We show that images synthesized by our model are more diverse and follow the color and texture of real images more closely.
arXiv Detail & Related papers (2020-12-08T23:00:48Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024$\times$1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- Semi-supervised Learning for Few-shot Image-to-Image Translation [89.48165936436183]
We propose a semi-supervised method for few-shot image translation, called SEMIT.
Our method achieves excellent results on four different datasets using as little as 10% of the source labels.
arXiv Detail & Related papers (2020-03-30T22:46:49Z)
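Several of the papers above (Wavelet-based Unsupervised Label-to-Image Translation, USIS) rely on whole-image wavelet-based discrimination. The core idea can be sketched with a single-level 2D Haar transform, which separates an image into a low-frequency sub-band (LL) and high-frequency sub-bands (LH, HL, HH), so a discriminator fed these bands can match color and texture statistics without discarding high-frequency detail. This sketch is an assumption-laden illustration: it uses averaging normalization, a grayscale image with even dimensions, and does not reproduce any discriminator architecture from those papers.

```python
# Single-level 2D Haar wavelet transform (illustrative sketch; uses
# averaging normalization rather than the orthonormal 1/sqrt(2) variant).
import numpy as np

def haar_dwt2(img: np.ndarray):
    """One level of the 2D Haar transform. img: (H, W) with H, W even.

    Returns the four sub-bands (LL, LH, HL, HH), each of shape (H/2, W/2).
    """
    a = (img[0::2] + img[1::2]) / 2.0   # vertical average
    d = (img[0::2] - img[1::2]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low: coarse content
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2)
```

A flat image lands entirely in the LL band with zero detail coefficients, which is why discriminating on the detail bands specifically targets texture and high-frequency realism.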
This list is automatically generated from the titles and abstracts of the papers in this site.