Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis
- URL: http://arxiv.org/abs/2106.09015v1
- Date: Wed, 16 Jun 2021 17:58:13 GMT
- Title: Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis
- Authors: Shichong Peng, Alireza Moazeni, Ke Li
- Abstract summary: A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high-frequency details, achieving a reduction of the Fréchet Inception Distance (FID) by up to 45.3% compared to the baseline.
- Score: 7.726465518306907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models such as GANs have driven impressive advances in
conditional image synthesis in recent years. A persistent challenge has been to
generate diverse versions of output images from the same input image, due to
the problem of mode collapse: because only one ground truth output image is
given per input image, only one mode of the conditional distribution is
modelled. In this paper, we focus on this problem of multimodal conditional
image synthesis and build on the recently proposed technique of Implicit
Maximum Likelihood Estimation (IMLE). Prior IMLE-based methods required
different architectures for different tasks, which limited their applicability,
and lacked fine detail in the generated images. We propose CAM-Net, a
unified architecture that can be applied to a broad range of tasks.
Additionally, it is capable of generating convincing high-frequency details,
achieving a reduction of the Fréchet Inception Distance (FID) by up to 45.3%
compared to the baseline.
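To make the IMLE idea concrete, here is a minimal sketch of one conditional IMLE training step in PyTorch. The generator `G(x, z)`, the optimizer, the latent dimension, and the plain L2 distance are all illustrative assumptions; CAM-Net itself uses a cascaded multi-scale architecture and perceptual distances, so this shows the bare objective rather than the authors' implementation.

```python
# Hypothetical sketch of a conditional IMLE update, not CAM-Net's code.
import torch

def imle_step(G, opt, x, y, num_samples=8, z_dim=64):
    """One conditional IMLE update on a batch (x, y).

    For each input, draw several latent codes, keep the generated sample
    closest to the ground truth, and minimise that distance. Because some
    sample is pulled toward *every* ground-truth image, no mode of the
    data can be ignored -- the reverse of GAN mode collapse.
    """
    B = x.shape[0]
    with torch.no_grad():
        # Generate num_samples candidates per input and measure distances.
        z = torch.randn(num_samples, B, z_dim, device=x.device)
        dists = torch.stack(
            [((G(x, z[i]) - y) ** 2).flatten(1).sum(dim=1)
             for i in range(num_samples)]
        )                                    # (num_samples, B)
        best = dists.argmin(dim=0)           # nearest sample per input
    # Recompute only the selected samples with gradients enabled.
    z_best = z[best, torch.arange(B, device=x.device)]   # (B, z_dim)
    loss = ((G(x, z_best) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Since gradients flow only through the sample nearest each ground truth, varying z at test time yields diverse outputs for the same input. FID, the metric quoted above, summarizes realism by comparing Gaussian fits of Inception features for real versus generated images.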
Related papers
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis [5.7789164588489035]
A persistent challenge in conditional image synthesis has been to generate diverse output images from the same input image.
We leverage conditional Implicit Maximum Likelihood Estimation (IMLE), which can overcome mode collapse.
To generate high-fidelity images, prior IMLE-based methods require a large number of samples, which is expensive.
We show CHIMLE significantly outperforms the prior best IMLE, GAN and diffusion-based methods in terms of image fidelity and mode coverage.
arXiv Detail & Related papers (2022-11-25T18:41:44Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs) approach.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
- Large Scale Image Completion via Co-Modulated Generative Adversarial Networks [18.312552957727828]
We propose a generic new approach that bridges the gap between image-conditional and recent unconditional generative architectures.
Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS).
Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation.
arXiv Detail & Related papers (2021-03-18T17:59:11Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
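As a rough illustration of the mechanism this summary describes, here is a sketch of a per-location masked convolution in PyTorch. The function name, tensor shapes, and the unfold-based approach are assumptions made for exposition, not the paper's reference implementation.

```python
# Hypothetical sketch of a locally masked convolution (LMConv-style).
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, mask):
    """2D convolution whose kernel footprint is masked per location.

    x:      (B, C_in, H, W) input image
    weight: (C_out, C_in, kH, kW) weights shared across all locations
    mask:   (B, kH*kW, H*W) binary mask choosing, for every output pixel,
            which kernel taps are visible (e.g. only already-generated
            pixels under a given autoregressive generation order)
    """
    B, C_in, H, W = x.shape
    C_out, _, kH, kW = weight.shape
    # Extract all kH x kW patches: (B, C_in * kH * kW, H * W).
    patches = F.unfold(x, (kH, kW), padding=(kH // 2, kW // 2))
    # Zero out masked taps independently at each spatial location.
    patches = patches.view(B, C_in, kH * kW, H * W) * mask.unsqueeze(1)
    # Apply the shared weights to the masked patches.
    out = weight.view(C_out, -1) @ patches.view(B, C_in * kH * kW, H * W)
    return out.view(B, C_out, H, W)
```

Supplying different mask tensors realizes different pixel-generation orders with the same shared weights, which is what allows a single network to act as the order-ensemble the summary mentions.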
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
- Multimodal Image Synthesis with Conditional Implicit Maximum Likelihood Estimation [54.17177006826262]
We develop a new generic conditional image synthesis method based on Implicit Maximum Likelihood Estimation (IMLE).
We demonstrate improved multimodal image synthesis performance on two tasks, single image super-resolution and image synthesis from scene layouts.
arXiv Detail & Related papers (2020-04-07T03:06:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.