Multimodal Conditional Image Synthesis with Product-of-Experts GANs
- URL: http://arxiv.org/abs/2112.05130v1
- Date: Thu, 9 Dec 2021 18:59:00 GMT
- Title: Multimodal Conditional Image Synthesis with Product-of-Experts GANs
- Authors: Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
- Abstract summary: PoE-GAN combines a product-of-experts generator with a multimodal multiscale projection discriminator.
It learns to synthesize images with high quality and diversity.
It also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting.
- Score: 41.45898101314992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing conditional image synthesis frameworks generate images based on user
inputs in a single modality, such as text, segmentation, sketch, or style
reference. They are often unable to leverage multimodal user inputs when
available, which reduces their practicality. To address this limitation, we
propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN)
framework, which can synthesize images conditioned on multiple input modalities
or any subset of them, even the empty set. PoE-GAN consists of a
product-of-experts generator and a multimodal multiscale projection
discriminator. Through our carefully designed training scheme, PoE-GAN learns
to synthesize images with high quality and diversity. Besides advancing the
state of the art in multimodal conditional image synthesis, PoE-GAN also
outperforms the best existing unimodal conditional image synthesis approaches
when tested in the unimodal setting. The project website is available at
https://deepimagination.github.io/PoE-GAN .
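The abstract does not spell out the fusion rule, but a product of Gaussian experts has a well-known closed form: precisions add, and the fused mean is the precision-weighted average of the expert means. Below is a minimal NumPy sketch of such a fusion, assuming each modality encoder emits a diagonal Gaussian and including a unit-Gaussian prior expert so the product stays defined for the empty set of inputs; all names and shapes are illustrative, not the paper's implementation.
```python
import numpy as np

def poe_fuse(mus, logvars):
    """Fuse diagonal-Gaussian experts by taking their product.

    The product of Gaussians N(mu_i, var_i) is Gaussian with
    precision T = sum_i 1/var_i and mean (sum_i mu_i/var_i) / T.
    A unit-Gaussian prior expert (an assumption of this sketch)
    keeps the product defined when no modality is given.
    """
    mus = [np.zeros_like(mus[0])] + list(mus)        # prepend prior mean 0
    logvars = [np.zeros_like(logvars[0])] + list(logvars)  # prior var 1

    precisions = [np.exp(-lv) for lv in logvars]     # 1 / var_i
    total_precision = sum(precisions)
    mu = sum(p * m for p, m in zip(precisions, mus)) / total_precision
    var = 1.0 / total_precision
    return mu, np.log(var)

# Example: fuse a "text" expert and a "sketch" expert in a 4-dim latent.
mu_text, lv_text = np.array([1.0, 0.0, -1.0, 0.5]), np.zeros(4)
mu_sketch, lv_sketch = np.array([0.5, 0.5, 0.0, 0.0]), np.full(4, -1.0)
mu, logvar = poe_fuse([mu_text, mu_sketch], [lv_text, lv_sketch])
```
Note how dropping an expert from the lists simply removes its precision from the sum, which is what lets a product-of-experts model condition on any subset of modalities.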
Related papers
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z)
- UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion [36.06457895469353]
UNIMO-G is a conditional diffusion framework that operates on multimodal prompts with interleaved textual and visual inputs.
It excels in both text-to-image generation and zero-shot subject-driven synthesis.
arXiv Detail & Related papers (2024-01-24T11:36:44Z)
- Unified Brain MR-Ultrasound Synthesis using Multi-Modal Hierarchical Representations [34.821129614819604]
We introduce MHVAE, a deep hierarchical variational auto-encoder (VAE) that synthesizes missing images from various modalities.
Extending multi-modal VAEs with a hierarchical latent structure, we introduce a probabilistic formulation for fusing multi-modal images in a common latent representation.
Our model outperformed multi-modal VAEs, conditional GANs, and the current state-of-the-art unified method (ResViT) at synthesizing missing images.
arXiv Detail & Related papers (2023-09-15T20:21:03Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis [7.726465518306907]
A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high-frequency details, reducing the Fréchet Inception Distance (FID) by up to 45.3% compared to the baseline.
arXiv Detail & Related papers (2021-06-16T17:58:13Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
- Multimodal Face Synthesis from Visual Attributes [85.87796260802223]
We propose a novel generative adversarial network that simultaneously synthesizes identity preserving multimodal face images.
Multimodal stretch-in modules are introduced in the discriminator, which discriminates between real and fake images.
arXiv Detail & Related papers (2021-04-09T13:47:23Z)
- Multimodal Image Synthesis with Conditional Implicit Maximum Likelihood Estimation [54.17177006826262]
We develop a new generic conditional image synthesis method based on Implicit Maximum Likelihood Estimation (IMLE).
We demonstrate improved multimodal image synthesis performance on two tasks, single image super-resolution and image synthesis from scene layouts.
arXiv Detail & Related papers (2020-04-07T03:06:55Z)
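The core selection step behind the IMLE entry above is compact to state: for each conditioning input, draw several latent codes, generate a candidate from each, keep the candidate nearest to the ground truth, and minimize that distance so every data point is covered by some sample. A minimal NumPy sketch of this step follows, with a toy generator standing in for a real conditional model; all names here are illustrative, not from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

def imle_select(G, x_cond, y_true, num_samples=8, latent_dim=16):
    """Conditional IMLE nearest-sample selection for one (input, target) pair.

    Draw several latent codes, generate a candidate for each, and return
    the candidate nearest to the ground truth; training then minimizes
    this nearest-sample distance so every target is covered.
    """
    zs = rng.standard_normal((num_samples, latent_dim))
    candidates = np.stack([G(x_cond, z) for z in zs])
    # Squared L2 distance to the target, summed over all non-batch axes.
    dists = np.sum((candidates - y_true) ** 2,
                   axis=tuple(range(1, candidates.ndim)))
    best = int(np.argmin(dists))
    return zs[best], dists[best]

# Toy generator: a hypothetical stand-in for a conditional image model.
def toy_G(x_cond, z):
    return x_cond + 0.1 * z[:x_cond.size].reshape(x_cond.shape)

x = rng.standard_normal((4, 4))   # conditioning input (e.g. a layout)
y = x + 0.05                      # ground-truth output
z_best, d_best = imle_select(toy_G, x, y)
```
Because the loss only ever touches the sample closest to each target, IMLE avoids the mode collapse that a plain reconstruction loss over all samples would encourage.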
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.