Semi-supervised Learning for Few-shot Image-to-Image Translation
- URL: http://arxiv.org/abs/2003.13853v2
- Date: Thu, 2 Apr 2020 09:09:19 GMT
- Title: Semi-supervised Learning for Few-shot Image-to-Image Translation
- Authors: Yaxing Wang, Salman Khan, Abel Gonzalez-Garcia, Joost van de Weijer,
Fahad Shahbaz Khan
- Abstract summary: We propose a semi-supervised method for few-shot image translation, called SEMIT.
Our method achieves excellent results on four different datasets using as little as 10% of the source labels.
- Score: 89.48165936436183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last few years, unpaired image-to-image translation has witnessed
remarkable progress. Although the latest methods are able to generate realistic
images, they crucially rely on a large number of labeled images. Recently, some
methods have tackled the challenging setting of few-shot image-to-image
translation, reducing the labeled data requirements for the target domain
during inference. In this work, we go one step further and reduce the amount of
required labeled data also from the source domain during training. To do so, we
propose applying semi-supervised learning via a noise-tolerant pseudo-labeling
procedure. We also apply a cycle consistency constraint to further exploit the
information from unlabeled images, whether from the same dataset or from external sources.
Additionally, we propose several structural modifications to facilitate the
image translation task under these circumstances. Our semi-supervised method
for few-shot image translation, called SEMIT, achieves excellent results on
four different datasets using as little as 10% of the source labels, and
matches the performance of the main fully-supervised competitor using only 20%
labeled data. Our code and models are made public at:
https://github.com/yaxingwang/SEMIT.
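The abstract's two key ingredients, noise-tolerant pseudo-labeling and cycle consistency, can be illustrated with a short PyTorch sketch. Everything below is a hedged illustration of the general recipe (a generic `classifier`, a domain-conditioned `generator`, and a placeholder confidence threshold are all assumptions), not the released SEMIT code:

```python
# Minimal sketch of the general recipe, not the authors' implementation.
# `classifier` and the domain-conditioned `generator` are assumed modules;
# the confidence threshold is a placeholder.
import torch
import torch.nn.functional as F

def pseudo_label_loss(classifier, unlabeled_x, threshold=0.9):
    """Cross-entropy on unlabeled images whose prediction is confident;
    low-confidence samples are dropped, a simple form of noise tolerance."""
    with torch.no_grad():
        probs = F.softmax(classifier(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        keep = conf > threshold           # discard likely-noisy pseudo-labels
    if keep.sum() == 0:
        return unlabeled_x.new_zeros(())
    return F.cross_entropy(classifier(unlabeled_x[keep]), pseudo_y[keep])

def cycle_consistency_loss(generator, x, src_domain, tgt_domain):
    """Translate to the target domain and back; penalize reconstruction error."""
    fake = generator(x, tgt_domain)
    recon = generator(fake, src_domain)
    return F.l1_loss(recon, x)
```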
Related papers
- An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant.
We build three pipelines comprising state-of-the-art generative models to do the task.
We conduct a human evaluation of translated images to assess for cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z)
- Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection [157.18560601328534]
RichSem is a robust method to learn rich semantics from coarse locations without the need for accurate bounding boxes.
We add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection.
Our method achieves state-of-the-art performance without requiring complex training and testing procedures.
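As a rough, hedged illustration of what learning "soft semantics" through an extra branch can look like, here is a PyTorch sketch; the module layout and the idea of using externally derived soft class targets are assumptions of this sketch, not details taken from the paper:

```python
# Sketch only: an auxiliary semantic branch trained on soft class targets
# (e.g., similarities from a vision-language model). Names and the target
# source are assumptions, not RichSem's implementation.
import torch.nn as nn
import torch.nn.functional as F

class SemanticBranch(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, roi_features, soft_targets):
        """roi_features: (N, feat_dim); soft_targets: (N, num_classes) probabilities."""
        logits = self.head(roi_features)
        # Soft targets let coarse locations still provide a semantic signal.
        return F.kl_div(F.log_softmax(logits, dim=1), soft_targets,
                        reduction="batchmean")
```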
arXiv Detail & Related papers (2023-10-18T17:59:41Z)
- Few-shot Class-Incremental Semantic Segmentation via Pseudo-Labeling and Knowledge Distillation [3.4436201325139737]
We address the problem of learning new classes for semantic segmentation models from few examples.
For learning from limited data, we propose a pseudo-labeling strategy to augment the few-shot training annotations.
We integrate the above steps into a single convolutional neural network with a unified learning objective.
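One hedged sketch of how pseudo-labeling and knowledge distillation can be combined for class-incremental segmentation (the `old_model`/`new_model` interfaces and the loss weight are assumptions, not the paper's exact formulation):

```python
# Illustrative sketch: old-model predictions pseudo-label the old classes,
# the few new annotations supervise the new ones, and a distillation term
# guards against forgetting. Interfaces and weights are assumptions.
import torch
import torch.nn.functional as F

def incremental_seg_loss(new_model, old_model, image, new_class_mask,
                         ignore_index=255, distill_weight=1.0):
    """new_class_mask labels only the new classes; other pixels carry
    ignore_index and are pseudo-labeled by the frozen old model."""
    new_logits = new_model(image)                  # (B, C_old + C_new, H, W)
    with torch.no_grad():
        old_logits = old_model(image)              # (B, C_old, H, W)
        pseudo = old_logits.argmax(dim=1)          # old-class pseudo-labels
    target = torch.where(new_class_mask != ignore_index, new_class_mask, pseudo)
    ce = F.cross_entropy(new_logits, target)
    # Distill old-class responses so they are not forgotten.
    n_old = old_logits.shape[1]
    distill = F.kl_div(F.log_softmax(new_logits[:, :n_old], dim=1),
                       F.softmax(old_logits, dim=1), reduction="batchmean")
    return ce + distill_weight * distill
```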
arXiv Detail & Related papers (2023-08-05T05:05:37Z)
- A Semi-Paired Approach For Label-to-Image Translation [6.888253564585197]
We introduce the first semi-supervised (semi-paired) framework for label-to-image translation.
In the semi-paired setting, the model has access to a small set of paired data and a larger set of unpaired images and labels.
We propose a training algorithm for this shared network, and we present a rare classes sampling algorithm to focus on under-represented classes.
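A rare-class sampling scheme of this kind can be sketched as weighting each training image by its rarest class; the bookkeeping below is one plausible implementation, not the paper's algorithm:

```python
# Sketch of rare-class sampling (one plausible implementation, not the
# paper's algorithm): images containing under-represented classes are
# drawn with higher probability.
import numpy as np

def rare_class_weights(image_classes, num_classes, temperature=1.0, eps=1e-8):
    """image_classes: one list of class ids per training image."""
    freq = np.zeros(num_classes)
    for cls in image_classes:
        for c in set(cls):
            freq[c] += 1
    freq = np.maximum(freq / max(freq.sum(), 1.0), eps)
    # Weight each image by its rarest class.
    weights = np.array([(1.0 / freq[list(set(cls))].min()) ** temperature
                        for cls in image_classes])
    return weights / weights.sum()

# The weights can drive e.g. torch.utils.data.WeightedRandomSampler.
```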
arXiv Detail & Related papers (2023-06-23T16:13:43Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We present extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis on the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data [39.421312439022316]
We present a LANguage-driven Image-to-image Translation model, dubbed LANIT.
We leverage easy-to-obtain candidate attributes given in texts for a dataset: the similarity between images and attributes indicates per-sample domain labels.
Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.
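The per-sample labeling LANIT describes, scoring each image against candidate attribute texts, can be approximated with a CLIP-style joint embedding; using OpenAI's CLIP below is an assumption of this sketch, not necessarily LANIT's exact setup:

```python
# Illustrative sketch: derive per-sample domain labels by comparing image
# embeddings against candidate attribute prompts. Using CLIP is an
# assumption of this sketch.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

attributes = ["a photo of a smiling face", "a photo of a face with glasses"]
text_tokens = clip.tokenize(attributes).to(device)

image = preprocess(Image.open("sample.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text_tokens)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    sim = img_f @ txt_f.T                  # cosine similarity per attribute
domain_label = sim.argmax(dim=-1)          # per-sample pseudo domain label
```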
arXiv Detail & Related papers (2022-08-31T14:30:00Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent images in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
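As a rough picture of augmentation-driven representation learning of this kind, the sketch below pulls an image's embedding toward that of an augmented view; the specific augmentations and the cosine loss are assumptions, not AugNet's exact design:

```python
# Sketch (assumptions, not AugNet's design): train an encoder so an image
# and its augmented view land close together in a low-dimensional space.
import torch
import torch.nn.functional as F
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.RandomHorizontalFlip(),
])

def similarity_loss(encoder, images):
    """Cosine-distance between embeddings of an image batch and its
    augmented copy (one random augmentation per batch, for brevity)."""
    z1 = F.normalize(encoder(images), dim=1)
    z2 = F.normalize(encoder(augment(images)), dim=1)
    return (1.0 - (z1 * z2).sum(dim=1)).mean()
```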
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Contrastive Learning for Unpaired Image-to-Image Translation [64.47477071705866]
In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain.
We propose a framework based on contrastive learning to maximize mutual information between the two.
We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time.
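This is the CUT framework; its patchwise contrastive objective matches each output patch to the input patch at the same location, with other locations serving as negatives. A simplified InfoNCE-style version (a sketch, not the official implementation):

```python
# Simplified patchwise InfoNCE loss in the spirit of CUT's PatchNCE
# objective (a sketch, not the official code). `feat_q` comes from the
# translated image, `feat_k` from the input, at matching locations.
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_q, feat_k, temperature=0.07):
    """feat_q, feat_k: (num_patches, dim) features at matching locations."""
    feat_q = F.normalize(feat_q, dim=1)
    feat_k = F.normalize(feat_k, dim=1).detach()
    logits = feat_q @ feat_k.T / temperature    # (N, N) similarity matrix
    # Positive pairs sit on the diagonal: patch i should match location i.
    labels = torch.arange(feat_q.size(0), device=feat_q.device)
    return F.cross_entropy(logits, labels)
```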
arXiv Detail & Related papers (2020-07-30T17:59:58Z)
- Rethinking the Truly Unsupervised Image-to-Image Translation [29.98784909971291]
The truly unsupervised image-to-image translation model (TUNIT) learns to separate image domains and translate input images into the estimated domains.
Experimental results show TUNIT achieves comparable or even better performance than the set-level supervised model trained with full labels.
TUNIT can be easily extended to semi-supervised learning with a few labeled data.
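One way to picture the "estimated domains" idea is to derive pseudo domain labels by clustering image features; the k-means sketch below is an assumption about the general mechanism, not TUNIT's actual guiding network:

```python
# Sketch (assumption, not TUNIT's code): assign each image a pseudo domain
# by clustering its features; a translator can then condition on the label.
import torch
from sklearn.cluster import KMeans

def estimate_domains(encoder, images, num_domains=10):
    """Cluster image embeddings into pseudo domain labels."""
    with torch.no_grad():
        feats = encoder(images).cpu().numpy()      # (N, dim)
    return KMeans(n_clusters=num_domains, n_init=10).fit_predict(feats)
```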
arXiv Detail & Related papers (2020-06-11T15:15:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.