Contrastive Learning for Unpaired Image-to-Image Translation
- URL: http://arxiv.org/abs/2007.15651v3
- Date: Thu, 20 Aug 2020 17:33:08 GMT
- Title: Contrastive Learning for Unpaired Image-to-Image Translation
- Authors: Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu
- Abstract summary: In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain.
We propose a framework based on contrastive learning to maximize mutual information between the two.
We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time.
- Score: 64.47477071705866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In image-to-image translation, each patch in the output should reflect the
content of the corresponding patch in the input, independent of domain. We
propose a straightforward method for doing so -- maximizing mutual information
between the two, using a framework based on contrastive learning. The method
encourages two elements (corresponding patches) to map to a similar point in a
learned feature space, relative to other elements (other patches) in the
dataset, referred to as negatives. We explore several critical design choices
for making contrastive learning effective in the image synthesis setting.
Notably, we use a multilayer, patch-based approach, rather than operate on
entire images. Furthermore, we draw negatives from within the input image
itself, rather than from the rest of the dataset. We demonstrate that our
framework enables one-sided translation in the unpaired image-to-image
translation setting, while improving quality and reducing training time. In
addition, our method can even be extended to the training setting where each
"domain" is only a single image.
Related papers
- Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement [52.80968034977751]
Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions.
We propose the Semantic Prior Refinement Model (SPRM), whose predictions are obtained by combining the output of two main modules.
Our approach shows state-of-the-art results on two popular datasets, Flickr30k Entities and ReferIt, with a 9.6% absolute improvement.
arXiv Detail & Related papers (2023-05-18T12:25:07Z)
- ACE: Zero-Shot Image to Image Translation via Pretrained Auto-Contrastive-Encoder [2.1874189959020427]
We propose a new approach to extract image features by learning the similarities and differences of samples within the same data distribution.
The design of ACE enables us to achieve, for the first time, zero-shot image-to-image translation with no training on image translation tasks.
Our model achieves competitive results on multimodal image translation tasks with zero-shot learning as well.
arXiv Detail & Related papers (2023-02-22T23:52:23Z)
- Exploring Negatives in Contrastive Learning for Unpaired Image-to-Image Translation [12.754320302262533]
We introduce a new negative Pruning technique for Unpaired image-to-image Translation (PUT) that sparsifies and ranks the patches.
The proposed algorithm is efficient and flexible, and enables the model to stably learn the essential information shared between corresponding patches.
arXiv Detail & Related papers (2022-04-23T08:31:18Z)
- Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution [62.4972011636884]
We propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework.
We exploit the decomposed content feature and appearance adaptive convolution to translate an image into a target appearance.
We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-02-06T14:12:34Z)
- Contrastive Unpaired Translation using Focal Loss for Patch Classification [0.0]
Contrastive Unpaired Translation is a new method for image-to-image translation.
We show that using focal loss in place of the cross-entropy loss within the PatchNCE loss can improve the model's performance; a hedged sketch of this variant appears at the end of this list.
arXiv Detail & Related papers (2021-09-25T20:22:33Z)
- Unaligned Image-to-Image Translation by Learning to Reweight [40.93678165567824]
Unsupervised image-to-image translation aims at learning the mapping from the source domain to the target domain without using paired images for training.
An essential yet restrictive assumption for unsupervised image translation is that the two domains are aligned.
We propose to select images based on importance reweighting and develop a method to learn the weights and perform translation simultaneously and automatically.
arXiv Detail & Related papers (2021-09-24T04:08:22Z)
- Dual Contrastive Learning for Unsupervised Image-to-Image Translation [16.759958400617947]
Unsupervised image-to-image translation tasks aim to find a mapping between a source domain X and a target domain Y from unpaired training data.
Contrastive learning for Unpaired image-to-image Translation (CUT) yields state-of-the-art results.
We propose a novel method based on contrastive learning and a dual learning setting to infer an efficient mapping between unpaired data.
arXiv Detail & Related papers (2021-04-15T18:00:22Z)
- The Spatially-Correlative Loss for Various Image Translation Tasks [69.62228639870114]
Previous methods attempt to preserve scene structure by using pixel-level cycle-consistency or feature-level matching losses.
We propose a novel spatially-correlative loss that is simple and efficient, yet effective for preserving scene structure consistency.
We show distinct improvement over baseline models in all three modes of unpaired I2I translation: single-modal, multi-modal, and even single-image translation.
arXiv Detail & Related papers (2021-04-02T02:13:30Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source content while translating it to the discriminative style of the target domain.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- Cross-domain Correspondence Learning for Exemplar-based Image Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output carries the style (e.g., color, texture) of the semantically corresponding objects in the exemplar.
We show that our method significantly outperforms state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z)
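As referenced in the Contrastive Unpaired Translation using Focal Loss for Patch Classification entry above, the focal-loss variant can be sketched by replacing the cross-entropy term in a PatchNCE-style loss with a focal term that down-weights patches that are already matched well. This is a hedged reconstruction; the function name and the focusing parameter `gamma` are assumptions (2.0 is the common focal-loss default), not values taken from that paper:

```python
import torch
import torch.nn.functional as F

def focal_patch_nce_loss(feat_q, feat_k, tau=0.07, gamma=2.0):
    """PatchNCE-style loss with a focal term in place of cross-entropy.

    Hedged sketch: gamma = 2.0 is the usual focal-loss default, assumed
    here rather than taken from the cited paper.
    """
    feat_q = F.normalize(feat_q, dim=1)
    feat_k = F.normalize(feat_k, dim=1)
    logits = feat_q @ feat_k.t() / tau
    targets = torch.arange(feat_q.size(0), device=feat_q.device)
    # Log-probability assigned to each positive (diagonal) pair.
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    pt = log_pt.exp()
    # The (1 - pt)^gamma factor shrinks the loss for easy patches, so
    # training focuses on patches that are still poorly matched.
    return (-(1.0 - pt) ** gamma * log_pt).mean()
```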
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.