USIS: Unsupervised Semantic Image Synthesis

URL: http://arxiv.org/abs/2109.14715v1
Date: Wed, 29 Sep 2021 20:48:41 GMT
Title: USIS: Unsupervised Semantic Image Synthesis
Authors: George Eskandar, Mohamed Abdelsamad, Karim Armanious, Bin Yang
Abstract summary: We propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS). USIS learns to output images with visually separable semantic classes using a self-supervised segmentation loss. In order to match the color and texture distribution of real images without losing high-frequency information, we propose to use whole-image wavelet-based discrimination.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic Image Synthesis (SIS) is a subclass of image-to-image translation where a photorealistic image is synthesized from a segmentation mask. SIS has mostly been addressed as a supervised problem. However, state-of-the-art methods depend on a huge amount of labeled data and cannot be applied in an unpaired setting. On the other hand, generic unpaired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and feed them to traditional convolutional networks, which then learn correspondences in appearance instead of semantic content. In this initial work, we propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS) as a first step towards closing the performance gap between paired and unpaired settings. Notably, the framework deploys a SPADE generator that learns to output images with visually separable semantic classes using a self-supervised segmentation loss. Furthermore, in order to match the color and texture distribution of real images without losing high-frequency information, we propose to use whole-image wavelet-based discrimination. We test our methodology on three challenging datasets and demonstrate its ability to generate multimodal photorealistic images with improved quality in the unpaired setting.
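The abstract names two training signals without giving their exact form. As an illustration only, the NumPy sketch below assumes a single-level orthonormal Haar transform as the wavelet step (splitting a whole image into one low-frequency and three high-frequency subbands that a discriminator could score) and a plain pixel-wise cross-entropy as the self-supervised segmentation loss (comparing a segmenter's prediction on a generated image with the input mask). The function names haar_dwt2d and seg_cross_entropy are hypothetical, and neither function is claimed to be the USIS implementation.

```python
import numpy as np


def haar_dwt2d(img: np.ndarray) -> np.ndarray:
    """Single-level orthonormal 2D Haar transform of a (C, H, W) image.

    Assumption (illustrative only): the actual USIS wavelet setup may differ.
    Returns a (4*C, H/2, W/2) array stacking the LL, LH, HL and HH subbands.
    """
    _, h, w = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    a = img[:, 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[:, 0::2, 1::2]  # top-right
    c = img[:, 1::2, 0::2]  # bottom-left
    d = img[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass: scaled local average (colour, coarse layout)
    lh = (a + b - c - d) / 2.0  # top row minus bottom row: horizontal edges
    hl = (a - b + c - d) / 2.0  # left column minus right column: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=0)


def seg_cross_entropy(logits: np.ndarray, mask: np.ndarray) -> float:
    """Mean pixel-wise cross-entropy between segmenter logits of shape
    (K, H, W), predicted on a generated image, and the integer input
    mask of shape (H, W) that the image was generated from.

    Assumption (illustrative only): a plain, unweighted cross-entropy.
    """
    z = logits - logits.max(axis=0, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))  # log-softmax over K
    rows, cols = np.indices(mask.shape)
    # At every pixel, pick the log-probability of the class given by the mask.
    return float(-log_probs[mask, rows, cols].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((3, 8, 8))
    bands = haar_dwt2d(image)
    print(bands.shape)  # (12, 4, 4)
    # Orthonormal transform: total energy is preserved.
    print(np.isclose((image ** 2).sum(), (bands ** 2).sum()))  # True
    mask = rng.integers(0, 5, size=(8, 8))
    # Uniform logits over 5 classes give a loss of log(5).
    print(seg_cross_entropy(np.zeros((5, 8, 8)), mask))
```

Because the Haar transform used here is orthonormal, the subbands keep all of the input's energy, so high-frequency detail is passed to the discriminator explicitly instead of being averaged away by plain downsampling; this is one plausible way to read the abstract's goal of matching texture "without losing high-frequency information".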

Related papers

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
We propose a novel Patch-wise Cross-modal feature Mix-up (PCM) mechanism to adaptively mitigate unfaithful content during training. Our PCM-Net ranks first in both in-domain and cross-domain zero-shot image captioning.
arXiv (2024-12-31T13:39:08Z)
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
Semantic segmentation and semantic image synthesis are representative tasks in visual perception and generation. We propose a unified framework (SemFlow) and model them as a pair of reverse problems. Experiments show that our SemFlow achieves competitive results on semantic segmentation and semantic image synthesis tasks.
arXiv (2024-05-30T17:34:40Z)
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
We propose a new class of GAN discriminators for semantic image synthesis that lead to highly realistic generated images. Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv (2023-12-20T09:39:19Z)
Towards Pragmatic Semantic Image Synthesis for Urban Scenes
We present a new task: given a dataset with synthetic images and labels and a dataset with unlabeled real images, our goal is to learn a model that can generate images with the content of the input mask and the appearance of real images. We leverage the synthetic image as a guide to the content of the generated image by penalizing the difference between their high-level features on a patch level. In contrast to previous works, which employ a single discriminator that overfits the target-domain semantic distribution, we employ a discriminator for the whole image and multiscale discriminators on the image patches.
arXiv (2023-05-16T18:01:12Z)
Wavelet-based Unsupervised Label-to-Image Translation
We propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole-image wavelet-based discrimination. We test our methodology on three challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv (2023-05-16T17:48:44Z)
Cap2Aug: Caption guided Image to Image data Augmentation
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts. We generate captions from the limited training images and use these captions to edit the training images with an image-to-image stable diffusion model. This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv (2022-12-11T04:37:43Z)
Semantic Image Synthesis via Diffusion Models
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv (2022-06-30T18:31:51Z)
A Shared Representation for Photorealistic Driving Simulators
We propose to improve the quality of generated images by rethinking the discriminator architecture. The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses. We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine adversarial reasoning.
arXiv (2021-12-09T18:59:21Z)
Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv (2021-04-12T21:41:25Z)
You Only Need Adversarial Supervision for Semantic Image Synthesis
We propose a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results. We show that images synthesized by our model are more diverse and follow the color and texture of real images more closely.
arXiv (2020-12-08T23:00:48Z)
Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning
We propose a hierarchical semantic alignment strategy via expanding the views generated by a single image to Cross-samples and Multi-level representation. Our method, termed CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv (2020-12-04T17:26:24Z)
