ACE: Zero-Shot Image to Image Translation via Pretrained Auto-Contrastive-Encoder
- URL: http://arxiv.org/abs/2302.11705v1
- Date: Wed, 22 Feb 2023 23:52:23 GMT
- Title: ACE: Zero-Shot Image to Image Translation via Pretrained Auto-Contrastive-Encoder
- Authors: Sihan Xu, Zelong Jiang, Ruisi Liu, Kaikai Yang and Zhijie Huang
- Abstract summary: We propose a new approach to extract image features by learning the similarities and differences of samples within the same data distribution.
The design of ACE enables us to achieve zero-shot image-to-image translation with no training on image translation tasks for the first time.
Our model achieves competitive results on multimodal image translation tasks with zero-shot learning as well.
- Score: 2.1874189959020427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image-to-image translation is a fundamental task in computer vision. It
transforms images from one domain to images in another domain so that they have
particular domain-specific characteristics. Most prior works train a generative
model to learn the mapping from a source domain to a target domain. However,
learning such a mapping between domains is challenging because data from
different domains can be highly imbalanced in both quality and
quantity. To address this problem, we propose a new approach to extract image
features by learning the similarities and differences of samples within the
same data distribution via a novel contrastive learning framework, which we
call Auto-Contrastive-Encoder (ACE). ACE learns the content code as the
similarity between samples with the same content information and different
style perturbations. The design of ACE enables, for the first time, zero-shot
image-to-image translation with no training on image translation tasks.
Moreover, our learning method can learn the style features of images on
different domains effectively. Consequently, our model achieves competitive
results on multimodal image translation tasks with zero-shot learning as well.
Additionally, we demonstrate the potential of our method in transfer learning.
With fine-tuning, the quality of translated images improves in unseen domains.
Even though we use contrastive learning, all of our training can be performed
on a single GPU with a batch size of 8.
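The abstract does not specify ACE's exact training objective, but the stated idea of learning a content code as the similarity between style-perturbed views of the same sample can be sketched with a standard InfoNCE-style contrastive loss. The function below is an illustrative NumPy sketch under that assumption, not the paper's actual implementation; the name `info_nce` and the temperature value are hypothetical:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss between two views.

    z1, z2: (N, D) arrays of embeddings. Row i of z1 and row i of z2 are
    assumed to be two style-perturbed views of the same content (a
    positive pair); all other rows in the batch serve as negatives.
    """
    # L2-normalize so that dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix
    # Cross-entropy with the diagonal (matching pairs) as targets
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

With the batch size of 8 mentioned above, each sample would contribute 7 in-batch negatives; in a full encoder, `z1` and `z2` would come from two style-perturbed augmentations of the same images.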
Related papers
- Multi-cropping Contrastive Learning and Domain Consistency for Unsupervised Image-to-Image Translation [5.562419999563734]
We propose a novel unsupervised image-to-image translation framework based on multi-cropping contrastive learning and domain consistency, called MCDUT.
In many image-to-image translation tasks, our method achieves state-of-the-art results, with its advantages demonstrated through comparative experiments and ablation studies.
arXiv Detail & Related papers (2023-04-24T16:20:28Z)
- Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution [62.4972011636884]
We propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework.
We exploit the decomposed content feature and appearance adaptive convolution to translate an image into a target appearance.
We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-02-06T14:12:34Z)
- Unaligned Image-to-Image Translation by Learning to Reweight [40.93678165567824]
Unsupervised image-to-image translation aims at learning the mapping from the source to target domain without using paired images for training.
An essential yet restrictive assumption for unsupervised image translation is that the two domains are aligned.
We propose to select images based on importance reweighting and develop a method to learn the weights and perform translation simultaneously and automatically.
arXiv Detail & Related papers (2021-09-24T04:08:22Z)
- Self-Supervised Learning of Domain Invariant Features for Depth Estimation [35.74969527929284]
We tackle the problem of unsupervised synthetic-to-realistic domain adaptation for single image depth estimation.
An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output.
We propose a novel training strategy to force the task network to learn domain invariant representations in a self-supervised manner.
arXiv Detail & Related papers (2021-06-04T16:45:48Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal Image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network [73.5062435623908]
We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations.
By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain.
arXiv Detail & Related papers (2020-10-12T13:51:40Z)
- Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation [59.73535607392732]
Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another.
We propose the use of an image retrieval system to assist the image-to-image translation task.
arXiv Detail & Related papers (2020-08-11T20:11:53Z)
- Contrastive Learning for Unpaired Image-to-Image Translation [64.47477071705866]
In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain.
We propose a framework based on contrastive learning to maximize mutual information between the two.
We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time.
arXiv Detail & Related papers (2020-07-30T17:59:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.