COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content
Conditioned Style Encoder
- URL: http://arxiv.org/abs/2007.07431v3
- Date: Wed, 29 Jul 2020 02:06:50 GMT
- Title: COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content
Conditioned Style Encoder
- Authors: Kuniaki Saito, Kate Saenko, Ming-Yu Liu
- Abstract summary: Unsupervised image-to-image translation aims to learn a mapping of an image in a given domain to an analogous image in a different domain.
We propose a new few-shot image translation model, COCO-FUNIT, which computes the style embedding of the example images conditioned on the input image.
Our model shows effectiveness in addressing the content loss problem.
- Score: 70.23358875904891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised image-to-image translation aims to learn a mapping of an
image in a given domain to an analogous image in a different domain, without
explicit supervision of the mapping. Few-shot unsupervised image-to-image
translation further attempts to generalize the model to an unseen domain by
leveraging example images of the unseen domain provided at inference time.
While remarkably successful, existing few-shot image-to-image translation
models find it difficult to preserve the structure of the input image while
emulating the appearance of the unseen domain, which we refer to as the content
loss problem. This is particularly severe when the poses of the objects in the
input and example images are very different. To address the issue, we propose a
new few-shot image translation model, COCO-FUNIT, which computes the style
embedding of the example images conditioned on the input image and a new module
called the constant style bias. Through extensive experimental validations with
comparison to the state-of-the-art, our model shows effectiveness in addressing
the content loss problem. For code and pretrained models, please check out
https://nvlabs.github.io/COCO-FUNIT/ .
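As a rough illustration of the idea described in the abstract, the PyTorch sketch below shows one way a content-conditioned style encoder with a constant style bias could be structured. This is a minimal sketch under our own assumptions (module names, layer sizes, the concatenation-based fusion, and the learned gate are illustrative choices, not the authors' released implementation); the official code is linked from the project page above.

```python
import torch
import torch.nn as nn

class ContentConditionedStyleEncoder(nn.Module):
    """Illustrative sketch of a content-conditioned style encoder.

    The style code for the example (style) image is computed while also
    looking at the content image, and a learned, input-independent
    "constant style bias" is mixed in so the code is less sensitive to
    pose differences between the two images.  All specifics below are
    assumptions made for illustration, not the paper's architecture.
    """

    def __init__(self, style_dim: int = 64):
        super().__init__()

        # Simple downsampling conv stack, used for both images.
        def backbone():
            return nn.Sequential(
                nn.Conv2d(3, 64, 7, 1, 3), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
                nn.Conv2d(128, 256, 4, 2, 1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )

        self.style_backbone = backbone()
        self.content_backbone = backbone()
        # Learned constant style bias (independent of any input image).
        self.constant_style_bias = nn.Parameter(torch.zeros(1, style_dim))
        # Fuse content-aware and style features into one style code.
        self.fuse = nn.Linear(256 + 256, style_dim)
        # Gate controlling how strongly the bias contributes (assumption).
        self.bias_gate = nn.Parameter(torch.tensor(0.5))

    def forward(self, content_img: torch.Tensor, style_img: torch.Tensor) -> torch.Tensor:
        s = self.style_backbone(style_img).flatten(1)       # (B, 256)
        c = self.content_backbone(content_img).flatten(1)   # (B, 256)
        # The style code depends on BOTH the example and the content image.
        code = self.fuse(torch.cat([s, c], dim=1))           # (B, style_dim)
        # Blend in the constant style bias to damp pose-induced variation.
        return (1 - self.bias_gate) * code + self.bias_gate * self.constant_style_bias


# Usage with random stand-in images.
encoder = ContentConditionedStyleEncoder(style_dim=64)
content = torch.randn(2, 3, 128, 128)
style = torch.randn(2, 3, 128, 128)
style_code = encoder(content, style)   # shape: (2, 64)
```

In FUNIT-style models, a style code like this typically modulates the decoder (for example through adaptive instance normalization) while a separate content encoder preserves the spatial structure of the input.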
Related papers
- Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations [32.892042877725125]
Current image variation techniques involve adapting a text-to-image model to reconstruct an input image conditioned on the same image.
We show that a diffusion model trained to reconstruct an input image from frozen embeddings, can reconstruct the image with minor variations.
We propose a new pretraining strategy to generate image variations using a large collection of image pairs.
arXiv Detail & Related papers (2024-05-23T17:58:03Z)
- Separating Content and Style for Unsupervised Image-to-Image Translation [20.44733685446886]
Unsupervised image-to-image translation aims to learn the mapping between two visual domains with unpaired samples.
We propose to separate the content code and style code simultaneously in a unified framework.
Based on the correlation between the latent features and the high-level domain-invariant tasks, the proposed framework demonstrates superior performance.
arXiv Detail & Related papers (2021-10-27T12:56:50Z)
- ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation [55.47515538020578]
This work proposes an implicit style function (ISF) to straightforwardly achieve multi-modal and multi-domain image-to-image translation.
Our results in human face and animal manipulations show significantly improved results over the baselines.
Our model enables cost-effective multi-modal unsupervised image-to-image translations at high resolution using pre-trained unconditional GANs.
arXiv Detail & Related papers (2021-09-26T04:51:39Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer [53.79505340315916]
We introduce BalaGAN, specifically designed to tackle the domain imbalance problem.
We leverage the latent modalities of the richer domain to turn the image-to-image translation problem into a balanced, multi-class, and conditional translation problem.
We show that BalaGAN outperforms strong baselines of both unconditioned and style-transfer-based image-to-image translation methods.
arXiv Detail & Related papers (2020-10-05T14:16:41Z)
- Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation [59.73535607392732]
Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another.
We propose the use of an image retrieval system to assist the image-to-image translation task.
arXiv Detail & Related papers (2020-08-11T20:11:53Z)
- Contrastive Learning for Unpaired Image-to-Image Translation [64.47477071705866]
In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain.
We propose a framework based on contrastive learning to maximize the mutual information between corresponding input and output patches (see the sketch after this list).
We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time.
arXiv Detail & Related papers (2020-07-30T17:59:58Z)
- Rethinking the Truly Unsupervised Image-to-Image Translation [29.98784909971291]
The truly unsupervised image-to-image translation model (TUNIT) learns to separate image domains and to translate input images into the estimated domains.
Experimental results show TUNIT achieves comparable or even better performance than the set-level supervised model trained with full labels.
TUNIT can be easily extended to semi-supervised learning with a few labeled data.
arXiv Detail & Related papers (2020-06-11T15:15:12Z)
- GANILLA: Generative Adversarial Networks for Image to Illustration Translation [12.55972766570669]
We show that although the current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time.
We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content.
arXiv Detail & Related papers (2020-02-13T17:12:09Z)
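The entry above on Contrastive Learning for Unpaired Image-to-Image Translation describes pulling each output patch toward the corresponding input patch while pushing it away from other patches. Below is a minimal sketch of such a patchwise InfoNCE-style objective; the feature extraction, patch sampling, and projection heads used by the actual method are omitted, so treat the function as an assumption-laden illustration rather than that paper's implementation.

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_out: torch.Tensor, feat_in: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Simplified patchwise InfoNCE loss.

    feat_out: (N, D) features of N patches sampled from the translated image.
    feat_in:  (N, D) features of the SAME spatial locations in the input image.
    Each output patch is a query whose positive key is the corresponding
    input patch; all other input patches act as negatives.
    """
    q = F.normalize(feat_out, dim=1)
    k = F.normalize(feat_in, dim=1)
    logits = q @ k.t() / tau                              # (N, N) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with random stand-in features for 256 patches of dimension 128.
loss = patch_nce_loss(torch.randn(256, 128), torch.randn(256, 128))
```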