CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image
Synthesis
- URL: http://arxiv.org/abs/2211.14286v1
- Date: Fri, 25 Nov 2022 18:41:44 GMT
- Title: CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image
Synthesis
- Authors: Shichong Peng, Alireza Moazeni, Ke Li
- Abstract summary: A persistent challenge in conditional image synthesis has been to generate diverse output images from the same input image.
We leverage Implicit Maximum Likelihood Estimation (IMLE), which can overcome mode collapse.
To generate high-fidelity images, prior IMLE-based methods require a large number of samples, which is expensive.
We show CHIMLE significantly outperforms the prior best IMLE, GAN and diffusion-based methods in terms of image fidelity and mode coverage.
- Score: 5.7789164588489035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A persistent challenge in conditional image synthesis has been to generate
diverse output images from the same input image despite only one output image
being observed per input image. GAN-based methods are prone to mode collapse,
which leads to low diversity. To get around this, we leverage Implicit Maximum
Likelihood Estimation (IMLE) which can overcome mode collapse fundamentally.
IMLE uses the same generator as GANs but trains it with a different,
non-adversarial objective which ensures each observed image has a generated
sample nearby. Unfortunately, to generate high-fidelity images, prior
IMLE-based methods require a large number of samples, which is expensive. In
this paper, we propose a new method to get around this limitation, which we dub
Conditional Hierarchical IMLE (CHIMLE), which can generate high-fidelity images
without requiring many samples. We show CHIMLE significantly outperforms the
prior best IMLE, GAN and diffusion-based methods in terms of image fidelity and
mode coverage across four tasks, namely night-to-day, 16x single image
super-resolution, image colourization and image decompression. Quantitatively,
our method improves Fréchet Inception Distance (FID) by 36.9% on average
compared to the prior best IMLE-based method, and by 27.5% on average compared
to the best non-IMLE-based general-purpose methods.
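The abstract describes the IMLE objective as non-adversarial: for each observed image, the training loss is the distance to the nearest of several generated samples, so every observed mode is "covered" by some sample. A minimal sketch of that idea follows; the toy generator, the Euclidean distance, and all parameter names are illustrative assumptions, not the paper's actual architecture or loss.

```python
import random

def conditional_imle_loss(generator, inputs, targets, n_samples=8,
                          latent_dim=4, seed=0):
    """Toy conditional IMLE objective: for each (input, target) pair, draw
    several latent codes, generate candidate outputs, and penalize only the
    distance from the target to its *nearest* candidate."""
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(inputs, targets):
        best = float("inf")
        for _ in range(n_samples):
            z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            out = generator(x, z)
            # Euclidean distance between candidate output and observed target.
            dist = sum((o - t) ** 2 for o, t in zip(out, y)) ** 0.5
            best = min(best, dist)  # only the nearest sample incurs loss
        total += best
    return total / len(inputs)

# Hypothetical "generator": shifts the input by the mean of the latent code.
def toy_gen(x, z):
    shift = sum(z) / len(z)
    return [xi + shift for xi in x]

inputs = [[0.0, 0.0]] * 3
targets = [[1.0, 1.0]] * 3
loss = conditional_imle_loss(toy_gen, inputs, targets)
```

Because the loss only depends on the nearest sample, no observed image is left without a nearby generated sample, which is how IMLE sidesteps mode collapse; CHIMLE's contribution is reaching high fidelity without needing many such samples.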
Related papers
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z) - MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR achieves superior performance compared to other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - Provably Robust Score-Based Diffusion Posterior Sampling for Plug-and-Play Image Reconstruction [31.503662384666274]
In science and engineering, the goal is to infer an unknown image from a small number of measurements collected through a known forward model describing a particular imaging modality.
Motivated by their empirical success, score-based diffusion models have emerged as an impressive candidate for an expressive prior in image reconstruction.
arXiv Detail & Related papers (2024-03-25T15:58:26Z) - Improving Denoising Diffusion Probabilistic Models via Exploiting Shared
Representations [5.517338199249029]
SR-DDPM is a class of generative models that produce high-quality images by reversing a noisy diffusion process.
By exploiting the similarity between diverse data distributions, our method can scale to multiple tasks without compromising the image quality.
We evaluate our method on standard image datasets and show that it outperforms both unconditional and conditional DDPM in terms of FID and SSIM metrics.
arXiv Detail & Related papers (2023-11-27T22:30:26Z) - A Novel Truncated Norm Regularization Method for Multi-channel Color
Image Denoising [5.624787484101139]
This paper proposes to denoise color images with a double-weighted truncated nuclear norm minus truncated Frobenius norm minimization (DtNFM) method.
Through exploiting the nonlocal self-similarity of the noisy image, the similar structures are gathered and a series of similar patch matrices are constructed.
Experiments on synthetic and real noise datasets demonstrate that the proposed method outperforms many state-of-the-art color image denoising methods.
arXiv Detail & Related papers (2023-07-16T03:40:35Z) - DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).
Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z) - DEff-GAN: Diverse Attribute Transfer for Few-Shot Image Synthesis [0.38073142980733]
We extend the single-image GAN method to model multiple images for sample synthesis.
Our Data-Efficient GAN (DEff-GAN) generates excellent results when similarities and correspondences can be drawn between the input images or classes.
arXiv Detail & Related papers (2023-02-28T12:43:52Z) - CDPMSR: Conditional Diffusion Probabilistic Models for Single Image
Super-Resolution [91.56337748920662]
Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation.
We propose a simple but non-trivial DPM-based super-resolution post-process framework, i.e., cDPMSR.
Our method surpasses prior attempts on both qualitative and quantitative results.
arXiv Detail & Related papers (2023-02-14T15:13:33Z) - Image Generation with Multimodal Priors using Denoising Diffusion
Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z) - MIRST-DM: Multi-Instance RST with Drop-Max Layer for Robust
Classification of Breast Cancer [62.997667081978825]
We propose the Multi-instance RST with a drop-max layer, namely MIRST-DM, to learn smoother decision boundaries on small datasets.
The proposed approach was validated using a small breast ultrasound dataset with 1,190 images.
arXiv Detail & Related papers (2022-05-02T20:25:26Z) - Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis [7.726465518306907]
A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high-frequency details, achieving a reduction of the Fréchet Inception Distance (FID) by up to 45.3% compared to the baseline.
arXiv Detail & Related papers (2021-06-16T17:58:13Z)
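Several entries above report the Fréchet Inception Distance (FID). In practice, FID fits Gaussians to Inception-network features of real and generated images and computes the Fréchet distance ||μ₁-μ₂||² + Tr(Σ₁ + Σ₂ - 2(Σ₁Σ₂)^½) between them. As a minimal sketch, the simplified diagonal-covariance case reduces to a sum of squared differences of means and standard deviations (the function name is hypothetical, and real FID uses full covariances):

```python
import math

def fid_diagonal_gaussians(mu1, var1, mu2, var2):
    """Fréchet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).
    The general formula ||mu1-mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))
    reduces, for diagonal covariances, to
    ||mu1-mu2||^2 + sum_i (sqrt(var1_i) - sqrt(var2_i))^2."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2
                   for a, b in zip(var1, var2))
    return mean_term + cov_term

# Identical distributions give 0; shifting the mean adds its squared norm.
print(fid_diagonal_gaussians([0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]))  # 0.0
print(fid_diagonal_gaussians([0, 0, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1]))  # 3.0
```

Lower is better: the percentage improvements quoted above (e.g. CHIMLE's 36.9% average FID reduction) are relative decreases in this distance.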
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.