Multimodal Image Synthesis with Conditional Implicit Maximum Likelihood Estimation
- URL: http://arxiv.org/abs/2004.03590v1
- Date: Tue, 7 Apr 2020 03:06:55 GMT
- Title: Multimodal Image Synthesis with Conditional Implicit Maximum Likelihood Estimation
- Authors: Ke Li, Shichong Peng, Tianhao Zhang, Jitendra Malik
- Abstract summary: We develop a new generic conditional image synthesis method based on Implicit Maximum Likelihood Estimation (IMLE).
We demonstrate improved multimodal image synthesis performance on two tasks, single image super-resolution and image synthesis from scene layouts.
- Score: 54.17177006826262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many tasks in computer vision and graphics fall within the framework of
conditional image synthesis. In recent years, generative adversarial nets
(GANs) have delivered impressive advances in quality of synthesized images.
However, it remains a challenge to generate both diverse and plausible images
for the same input, due to the problem of mode collapse. In this paper, we
develop a new generic multimodal conditional image synthesis method based on
Implicit Maximum Likelihood Estimation (IMLE) and demonstrate improved
multimodal image synthesis performance on two tasks, single image
super-resolution and image synthesis from scene layouts. We make our
implementation publicly available.
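To make the IMLE idea concrete, here is a hypothetical toy sketch in which a linear map and squared-Euclidean distance stand in for the paper's deep generator and distance metric (both are assumptions for illustration, not the authors' implementation): for each input, several latent codes are sampled, and only the output nearest to the ground truth is pulled toward it, so every target is matched by some sample and no mode is penalized for existing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "generator" mapping (input, latent code) -> output.
# W is the trainable parameter; in the paper this is a deep network.
dim_x, dim_z, dim_y = 4, 2, 4
W = rng.normal(size=(dim_y, dim_x + dim_z)) * 0.1

def generate(W, x, z):
    return W @ np.concatenate([x, z])

def imle_step(W, x, y, n_samples=10, lr=0.02):
    """One conditional-IMLE update: sample several latent codes,
    keep the output nearest to the ground truth, and move only that
    sample toward the target."""
    zs = rng.normal(size=(n_samples, dim_z))
    outs = np.stack([generate(W, x, z) for z in zs])
    dists = np.sum((outs - y) ** 2, axis=1)
    best = int(np.argmin(dists))                 # nearest sample wins
    inp = np.concatenate([x, zs[best]])
    grad = 2.0 * np.outer(outs[best] - y, inp)   # gradient of ||W @ inp - y||^2
    return W - lr * grad, dists[best]

x = rng.normal(size=dim_x)   # conditioning input
y = rng.normal(size=dim_y)   # ground-truth output
loss = None
for _ in range(300):
    W, loss = imle_step(W, x, y)
```

At test time, sampling different latent codes yields diverse outputs for the same input, which is exactly the multimodal behavior the abstract describes.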
Related papers
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z)
- Unified Brain MR-Ultrasound Synthesis using Multi-Modal Hierarchical Representations [34.821129614819604]
We introduce MHVAE, a deep hierarchical variational auto-encoder (VAE) that synthesizes missing images from various modalities.
Extending multi-modal VAEs with a hierarchical latent structure, we introduce a probabilistic formulation for fusing multi-modal images in a common latent representation.
Our model outperformed multi-modal VAEs, conditional GANs, and the current state-of-the-art unified method (ResViT) for missing images.
arXiv Detail & Related papers (2023-09-15T20:21:03Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis [73.08923361242925]
We propose to generate images conditioned on the compositions of multimodal control signals.
We introduce a Mixture-of-Modality-Tokens Transformer (MMoT) that adaptively fuses fine-grained multimodal control signals.
arXiv Detail & Related papers (2023-05-10T09:00:04Z)
- Less is More: Unsupervised Mask-guided Annotated CT Image Synthesis with Minimum Manual Segmentations [2.1785903900600316]
We propose a novel strategy for medical image synthesis, namely Unsupervised Mask (UM)-guided synthesis.
UM-guided synthesis provided high-quality synthetic images with significantly higher fidelity, variety, and utility.
arXiv Detail & Related papers (2023-03-19T20:30:35Z)
- Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis [77.23998762763078]
We present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis.
Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output.
We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image.
arXiv Detail & Related papers (2022-08-29T17:37:29Z)
- Multimodal Conditional Image Synthesis with Product-of-Experts GANs [41.45898101314992]
PoE-GAN combines a product-of-experts generator with a multimodal projection discriminator.
It learns to synthesize images with high quality and diversity.
It also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting.
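The product-of-experts fusion that gives PoE-GAN its name can be illustrated in its simplest Gaussian form. This is the generic textbook rule for multiplying independent Gaussian experts, not the paper's actual architecture: precisions add, so the fused estimate is a confidence-weighted average of the per-modality estimates.

```python
import numpy as np

def poe_gaussian(mus, sigmas):
    """Fuse per-modality Gaussian experts N(mu_i, sigma_i^2) by taking
    their product: precisions (inverse variances) add, and the fused
    mean is the precision-weighted average of the expert means."""
    mus = np.asarray(mus, dtype=float)
    precisions = 1.0 / np.square(np.asarray(sigmas, dtype=float))
    prec = precisions.sum(axis=0)
    mu = (precisions * mus).sum(axis=0) / prec
    return mu, np.sqrt(1.0 / prec)

# Two "modalities" with different confidence: the sharper expert dominates.
# precisions are 1 and 4, so the fused mean is (0*1 + 2*4) / 5 = 1.6.
mu, sigma = poe_gaussian(mus=[[0.0], [2.0]], sigmas=[[1.0], [0.5]])
```

Because the fused variance is always smaller than any individual expert's, adding modalities can only sharpen the conditional distribution, which is what makes the scheme attractive for multimodal conditioning.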
arXiv Detail & Related papers (2021-12-09T18:59:00Z)
- Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis [7.726465518306907]
A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high frequency details, achieving a reduction of the Frechet Inception Distance (FID) by up to 45.3% compared to the baseline.
arXiv Detail & Related papers (2021-06-16T17:58:13Z)
- Multimodal Face Synthesis from Visual Attributes [85.87796260802223]
We propose a novel generative adversarial network that simultaneously synthesizes identity-preserving multimodal face images.
Multimodal stretch-in modules are introduced in the discriminator, which discriminates between real and fake images.
arXiv Detail & Related papers (2021-04-09T13:47:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.