Rethinking the Truly Unsupervised Image-to-Image Translation
- URL: http://arxiv.org/abs/2006.06500v2
- Date: Fri, 20 Aug 2021 03:36:26 GMT
- Title: Rethinking the Truly Unsupervised Image-to-Image Translation
- Authors: Kyungjune Baek, Yunjey Choi, Youngjung Uh, Jaejun Yoo, Hyunjung Shim
- Abstract summary: The truly unsupervised image-to-image translation model (TUNIT) learns to separate image domains and to translate input images into the estimated domains.
Experimental results show that TUNIT achieves performance comparable to, or even better than, a set-level supervised model trained with full labels.
TUNIT can be easily extended to semi-supervised learning with a small amount of labeled data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Every recent image-to-image translation model inherently requires either
image-level (i.e. input-output pairs) or set-level (i.e. domain labels)
supervision. However, even set-level supervision can be a severe bottleneck for
data collection in practice. In this paper, we tackle image-to-image
translation in a fully unsupervised setting, i.e., neither paired images nor
domain labels. To this end, we propose a truly unsupervised image-to-image
translation model (TUNIT) that simultaneously learns to separate image domains
and to translate input images into the estimated domains. Experimental results
show that our model achieves comparable or even better performance than the
set-level supervised model trained with full labels, generalizes well on
various datasets, and is robust against the choice of hyperparameters (e.g. the
preset number of pseudo domains). Furthermore, TUNIT can be easily extended to
semi-supervised learning with a small amount of labeled data.
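To make the setup concrete, here is a minimal, illustrative sketch of one way to realize this joint domain-estimation-and-translation idea. The module names and shapes are assumptions for exposition, not the authors' architecture: a guiding network produces both a pseudo domain label (clustering head) and a style code (style head), and the generator translates an input image conditioned on a reference's style.

```python
# Hypothetical sketch of the TUNIT idea, not the authors' code.
import torch
import torch.nn as nn

class GuidingNetwork(nn.Module):
    def __init__(self, num_pseudo_domains=10, style_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a conv encoder
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cluster_head = nn.Linear(64, num_pseudo_domains)  # pseudo domain labels
        self.style_head = nn.Linear(64, style_dim)             # style code

    def forward(self, x):
        h = self.backbone(x)
        return self.cluster_head(h), self.style_head(h)

class Generator(nn.Module):
    def __init__(self, style_dim=128):
        super().__init__()
        self.enc = nn.Conv2d(3, 64, 3, 1, 1)
        self.mod = nn.Linear(style_dim, 64)  # crude stand-in for AdaIN-style modulation
        self.dec = nn.Conv2d(64, 3, 3, 1, 1)

    def forward(self, x, style):
        h = self.enc(x)
        h = h * self.mod(style)[:, :, None, None]  # inject the style code
        return torch.tanh(self.dec(h))

E, G = GuidingNetwork(), Generator()
content = torch.randn(1, 3, 128, 128)
reference = torch.randn(1, 3, 128, 128)
logits, style = E(reference)
domain = logits.argmax(dim=1)   # estimated (pseudo) domain of the reference
fake = G(content, style)        # translate the content image into that domain's style
print(domain.item(), fake.shape)
```

In this reading, the clustering head supplies the missing set-level supervision (pseudo domains), while the style head supplies the conditioning signal for translation; both come from the same unlabeled images.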
Related papers
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representations.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied both to pretrained vision-language models like CLIP and to separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z)
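The WIDIn summary above does not spell out the disentanglement step, so the following is only a generic illustration of one way to remove a language-derived, domain-specific component from a visual feature by orthogonal projection; the function name and shapes are hypothetical.

```python
# Illustrative only: generic projection-based removal of a domain direction.
import torch
import torch.nn.functional as F

def remove_domain_component(visual_feat, domain_text_emb):
    """Project out the direction given by a domain's text embedding."""
    d = F.normalize(domain_text_emb, dim=-1)          # unit domain direction
    coeff = (visual_feat * d).sum(-1, keepdim=True)   # component along d
    return visual_feat - coeff * d                    # domain-invariant remainder

v = torch.randn(4, 512)   # e.g. CLIP image features (stand-ins)
t = torch.randn(512)      # e.g. embedding of a domain description (stand-in)
v_inv = remove_domain_component(v, t)
print((v_inv @ F.normalize(t, dim=-1)).abs().max())   # ~0: direction removed
```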
- A Semi-Paired Approach For Label-to-Image Translation [6.888253564585197]
We introduce the first semi-supervised (semi-paired) framework for label-to-image translation.
In the semi-paired setting, the model has access to a small set of paired data and a larger set of unpaired images and labels.
We propose a training algorithm for this shared network, together with a rare-class sampling algorithm that focuses training on under-represented classes (a generic version is sketched below).
arXiv Detail & Related papers (2023-06-23T16:13:43Z)
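The rare-class sampling algorithm mentioned above is not detailed in this summary; the sketch below is a generic inverse-frequency sampler that captures the stated intent of over-sampling under-represented classes (all names are illustrative).

```python
# Generic inverse-frequency sampling, not the paper's exact algorithm.
import random
from collections import Counter

def rare_class_sampler(labels, num_draws, rng=random.Random(0)):
    freq = Counter(labels)
    weights = [1.0 / freq[y] for y in labels]  # rare classes get more weight
    idx = list(range(len(labels)))
    return rng.choices(idx, weights=weights, k=num_draws)

labels = ["car"] * 90 + ["bicycle"] * 10        # imbalanced label set
picks = rare_class_sampler(labels, 1000)
print(Counter(labels[i] for i in picks))        # roughly balanced draws
```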
- GP-UNIT: Generative Prior for Versatile Unsupervised Image-to-Image Translation [103.54337984566877]
We introduce a novel, versatile framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT).
GP-UNIT is able to perform valid translations between both close domains and distant domains.
We validate the superiority of GP-UNIT over state-of-the-art translation models in producing robust, high-quality, and diversified translations.
arXiv Detail & Related papers (2023-06-07T17:59:22Z)
- LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data [39.421312439022316]
We present a LANguage-driven Image-to-image Translation model, dubbed LANIT.
We leverage easy-to-obtain candidate attributes given as text for a dataset: the similarity between images and attributes indicates per-sample domain labels (see the sketch below).
Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.
arXiv Detail & Related papers (2022-08-31T14:30:00Z)
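The per-sample pseudo-labeling idea described for LANIT can be illustrated with a short sketch, assuming a CLIP-like joint embedding space for images and attribute prompts; the embeddings here are random stand-ins.

```python
# Sketch of similarity-based pseudo domain labels, assuming CLIP-like embeddings.
import torch
import torch.nn.functional as F

def pseudo_domain_labels(image_embs, attribute_embs):
    """Assign each image to its most similar candidate text attribute."""
    img = F.normalize(image_embs, dim=-1)
    txt = F.normalize(attribute_embs, dim=-1)
    sim = img @ txt.t()              # cosine similarity: images x attributes
    return sim.argmax(dim=1)         # per-sample domain label

images = torch.randn(8, 512)   # e.g. CLIP image embeddings (stand-ins)
attrs = torch.randn(5, 512)    # e.g. embeddings of attribute prompts (stand-ins)
print(pseudo_domain_labels(images, attrs))   # tensor of 8 domain indices
```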
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images (a toy version of this combination is sketched below).
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
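As a hedged illustration of combining those two supervision sources, the sketch below pairs confident pseudo-labels from a frozen teacher with a crudely masked view of the raw image for the student. All names, the masking scheme, and the confidence threshold are assumptions, not the paper's exact recipe.

```python
# Toy self-training step: confident pseudo-labels + masked raw images.
import torch
import torch.nn.functional as F

def self_training_step(student, teacher_logits, images, conf_thresh=0.7):
    probs = teacher_logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)                    # pseudo-labels + confidence
    masked = images * (torch.rand_like(images) > 0.5)   # crude random masking
    logits = student(masked.flatten(1))
    keep = conf > conf_thresh                           # train only on confident samples
    if keep.any():
        return F.cross_entropy(logits[keep], pseudo[keep])
    return logits.sum() * 0.0                           # no confident samples this batch

student = torch.nn.Linear(3 * 32 * 32, 10)   # toy stand-in classifier
images = torch.rand(16, 3, 32, 32)
teacher_logits = torch.randn(16, 10)         # e.g. from a frozen zero-shot model
loss = self_training_step(student, teacher_logits, images)
loss.backward()
```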
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder [70.23358875904891]
Unsupervised image-to-image translation aims to learn a mapping of an image in a given domain to an analogous image in a different domain.
We propose a new few-shot image translation model, COCO-FUNIT, which computes the style embedding of the example images conditioned on the input image (see the sketch below).
Our model is effective at addressing the content-loss problem.
arXiv Detail & Related papers (2020-07-15T02:01:14Z)
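Below is a minimal sketch of a content-conditioned style embedding in the spirit of the COCO-FUNIT summary above; the actual COCO-FUNIT encoder differs, and every module shape here is an illustrative assumption.

```python
# Illustrative content-conditioned style encoder, not the COCO-FUNIT architecture.
import torch
import torch.nn as nn

class ContentConditionedStyleEncoder(nn.Module):
    def __init__(self, feat_dim=64, style_dim=64):
        super().__init__()
        self.style_enc = nn.Sequential(
            nn.Conv2d(3, feat_dim, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.content_enc = nn.Sequential(
            nn.Conv2d(3, feat_dim, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mix = nn.Linear(2 * feat_dim, style_dim)

    def forward(self, content_img, style_img):
        s = self.style_enc(style_img)      # what to copy: the example's style
        c = self.content_enc(content_img)  # what to preserve: the input's content
        return self.mix(torch.cat([s, c], dim=1))  # style code, conditioned on content

enc = ContentConditionedStyleEncoder()
style = enc(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(style.shape)  # torch.Size([1, 64])
```

Conditioning the style code on the content image is one way to keep irrelevant appearance details of the example from leaking into the output, which is how the summary frames the content-loss problem.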
- Semi-supervised Learning for Few-shot Image-to-Image Translation [89.48165936436183]
We propose a semi-supervised method for few-shot image translation, called SEMIT.
Our method achieves excellent results on four different datasets using as little as 10% of the source labels.
arXiv Detail & Related papers (2020-03-30T22:46:49Z)