Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization
- URL: http://arxiv.org/abs/2008.03529v7
- Date: Sat, 8 May 2021 14:15:56 GMT
- Title: Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization
- Authors: Zhiwen Zuo, Lei Zhao, Zhizhong Wang, Haibo Chen, Ailin Li, Qijiang Xu, Wei Xing, Dongming Lu
- Abstract summary: Multimodal image-to-image translation (I2IT) aims to learn a conditional distribution that explores multiple possible images in the target domain given an input image in the source domain.
Conditional generative adversarial networks (cGANs) are often adopted for modeling such a conditional distribution.
We propose a method that explicitly estimates and maximizes the mutual information between the latent code and the output image in cGANs.
- Score: 16.54980086211836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal image-to-image translation (I2IT) aims to learn a conditional distribution that explores multiple possible images in the target domain given an input image in the source domain. Conditional generative adversarial networks (cGANs) are often adopted for modeling such a conditional distribution. However, cGANs are prone to ignore the latent code and learn a unimodal distribution in conditional image synthesis, which is also known as the mode collapse issue of GANs. To solve this problem, we propose a simple yet effective method that explicitly estimates and maximizes the mutual information between the latent code and the output image in cGANs using a deep mutual information neural estimator. Maximizing the mutual information strengthens the statistical dependency between the latent code and the output image, which prevents the generator from ignoring the latent code and encourages cGANs to fully utilize the latent code for synthesizing diverse results. Our method not only provides a new perspective from information theory on improving diversity for I2IT but also achieves disentanglement between the source domain content and the target domain style for free.
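To make the core idea concrete, here is a minimal sketch, assuming a PyTorch setup, of how a MINE-style (Donsker-Varadhan) lower bound on the mutual information between the latent code z and the output G(x, z) could be estimated and added to a cGAN generator objective. The bound is I(z; y) >= E_{p(y,z)}[T(y, z)] - log E_{p(y)p(z)}[e^{T(y, z)}], where T is a learned statistics network. The `StatisticsNetwork` architecture, tensor shapes, and the `lambda_mi` weight below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumed PyTorch): maximize a Donsker-Varadhan lower bound
# on I(z; G(x, z)) alongside the usual cGAN loss. Architectures and
# hyperparameters here are illustrative, not the paper's exact ones.
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T(y, z): scores how statistically dependent output y is on latent z."""
    def __init__(self, z_dim: int = 8, feat_dim: int = 128):
        super().__init__()
        self.img_enc = nn.Sequential(          # toy encoder for 64x64 RGB outputs
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mlp = nn.Sequential(
            nn.Linear(64 + z_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, y: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([self.img_enc(y), z], dim=1))

def mi_lower_bound(T: nn.Module, y: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan bound: E[T(y, z)] - log E[exp(T(y, z'))],
    where z' is a shuffled latent batch approximating the product of marginals."""
    joint = T(y, z).mean()
    z_shuffled = z[torch.randperm(z.size(0))]   # break the (y, z) pairing
    marginal = torch.logsumexp(T(y, z_shuffled), dim=0) - math.log(z.size(0))
    return joint - marginal

# Inside the generator step (sketch; G, D, adversarial_loss, lambda_mi assumed):
#   z = torch.randn(batch_size, z_dim)
#   y = G(x, z)
#   loss_G = adversarial_loss(D(x, y)) - lambda_mi * mi_lower_bound(T, y, z)
# T is trained jointly (by maximizing the same bound) so the estimate stays tight.
```

Subtracting the bound from the generator loss penalizes outputs that are independent of z, which is how maximizing mutual information counteracts mode collapse in this setting.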
Related papers
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes continuous-valued image tokens as input to avoid information loss.
We show that MMAR achieves far superior performance compared with other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling [6.7206291284535125]
We present an effective data augmentation framework leveraging a Large Language Model (LLM) and a Diffusion Model (DM).
Our approach addresses the challenge of increasing the diversity of synthetic images.
Our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.
arXiv Detail & Related papers (2024-09-25T14:02:43Z)
- I2I-Galip: Unsupervised Medical Image Translation Using Generative Adversarial CLIP [30.506544165999564]
Unpaired image-to-image translation is a challenging task due to the absence of paired examples.
We propose a new image-to-image translation framework named Image-to-Image-Generative-Adversarial-CLIP (I2I-Galip).
arXiv Detail & Related papers (2024-09-19T01:44:50Z)
- Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion [37.18537753482751]
Conditional Relaxing Diffusion Inversion (CRDI) is designed to enhance distribution diversity in synthetic image generation.
CRDI does not rely on fine-tuning based on only a few samples.
It focuses on reconstructing each target image instance and expanding diversity through few-shot learning.
arXiv Detail & Related papers (2024-07-09T21:58:26Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- Multi-cropping Contrastive Learning and Domain Consistency for Unsupervised Image-to-Image Translation [5.562419999563734]
We propose a novel unsupervised image-to-image translation framework based on multi-cropping contrastive learning and domain consistency, called MCDUT.
Our method achieves state-of-the-art results on many image-to-image translation tasks, with its advantages demonstrated through comparison and ablation experiments.
arXiv Detail & Related papers (2023-04-24T16:20:28Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Networks (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.