Scaling-up Disentanglement for Image Translation
- URL: http://arxiv.org/abs/2103.14017v1
- Date: Thu, 25 Mar 2021 17:52:38 GMT
- Title: Scaling-up Disentanglement for Image Translation
- Authors: Aviv Gabbay and Yedid Hoshen
- Abstract summary: We propose OverLORD, a single framework for disentangling labeled and unlabeled attributes.
We do not rely on adversarial training or any architectural biases.
In an extensive evaluation, we present significantly better disentanglement with higher translation quality and greater output diversity than state-of-the-art methods.
- Score: 40.7636450847048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image translation methods typically aim to manipulate a set of labeled attributes (given as supervision at training time, e.g. a domain label) while leaving the unlabeled attributes intact. Current methods achieve either (i) disentanglement, which exhibits low visual fidelity and can only be satisfied when the attributes are perfectly uncorrelated, or (ii) visually plausible translations, which are clearly not disentangled. In this work, we propose OverLORD, a single framework for disentangling labeled and unlabeled attributes as well as synthesizing high-fidelity images, composed of two stages: (i) Disentanglement: learning disentangled representations with latent optimization; unlike previous approaches, we do not rely on adversarial training or any architectural biases. (ii) Synthesis: training feed-forward encoders to infer the learned attributes and tuning the generator adversarially to increase perceptual quality. When the labeled and unlabeled attributes are correlated, we model an additional representation that accounts for the correlated attributes and improves disentanglement. We highlight that our flexible framework covers multiple image translation settings, e.g. attribute manipulation, pose-appearance translation, segmentation-guided synthesis and shape-texture transfer. In an extensive evaluation, we present significantly better disentanglement with higher translation quality and greater output diversity than state-of-the-art methods.
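For orientation, below is a minimal PyTorch sketch of the two-stage recipe the abstract describes. The architectures, dimensions and loss weights are illustrative assumptions; the paper's actual generator, perceptual reconstruction loss and adversarial fine-tuning stage are more elaborate.

```python
# Minimal sketch of the two-stage recipe in the abstract. Architectures,
# dimensions and loss weights are illustrative assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Stand-in decoder mapping concatenated codes to an image."""
    def __init__(self, labeled_dim=32, unlabeled_dim=128, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(labeled_dim + unlabeled_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Tanh())

    def forward(self, labeled_code, unlabeled_code):
        z = torch.cat([labeled_code, unlabeled_code], dim=1)
        return self.net(z).view(-1, 3, self.img_size, self.img_size)

def stage1_latent_optimization(images, labels, n_classes, steps=1000):
    """Stage (i): disentanglement by latent optimization. Each image owns
    a free unlabeled code, each class a shared labeled embedding; both
    are optimized directly with the generator -- no adversary, no encoder."""
    unlabeled = nn.Parameter(0.01 * torch.randn(images.size(0), 128))
    labeled = nn.Embedding(n_classes, 32)
    G = Generator()
    opt = torch.optim.Adam([unlabeled, *labeled.parameters(),
                            *G.parameters()], lr=1e-3)
    for _ in range(steps):
        recon = G(labeled(labels), unlabeled)
        # Reconstruction (the paper uses a perceptual loss; L1 here for
        # brevity) plus a penalty keeping the unlabeled codes small, so
        # labeled content cannot leak into them.
        loss = (recon - images).abs().mean() + 1e-3 * unlabeled.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G, labeled, unlabeled.detach()

def stage2_train_encoder(images, target_codes, steps=1000):
    """Stage (ii), first half: a feed-forward encoder learns to predict
    the codes found in stage 1, so attributes can be inferred for unseen
    images. (Adversarial generator fine-tuning for perceptual quality
    would follow; omitted here.)"""
    enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                        nn.ReLU(), nn.Linear(256, target_codes.size(1)))
    opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = (enc(images) - target_codes).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc
```

The key property stage 1 illustrates is that the latent codes themselves are optimized jointly with the generator, with no adversarial training; stage 2 then amortizes inference with feed-forward encoders. The additional representation for correlated attributes mentioned in the abstract is not shown.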
Related papers
- StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation [18.213286385769525]
CycleGAN-based methods are known to hide mismatched information in the generated images to bypass cycle-consistency objectives.
We introduce StegoGAN, a novel model that leverages steganography to prevent spurious features in generated images.
Our approach enhances the semantic consistency of the translated images without requiring additional postprocessing or supervision.
arXiv Detail & Related papers (2024-03-29T12:23:58Z)
- Improving Generalization of Image Captioning with Unsupervised Prompt Learning [63.26197177542422]
Generalization of Image Captioning (GeneIC) learns a domain-specific prompt vector for the target domain without requiring annotated data.
GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
arXiv Detail & Related papers (2023-08-05T12:27:01Z)
- DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations [79.433122872973]
Multi-label image recognition in the low-label regime is a task of great challenge and practical significance.
We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs.
We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++).
arXiv Detail & Related papers (2023-08-03T17:33:20Z)
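A rough sketch of the dual-prompt idea, assuming a frozen CLIP-like backbone: each class carries a learnable positive and a learnable negative context, and class presence is scored by contrasting the two. The encoders are abstract stand-ins (`text_encoder` is a hypothetical callable), and the real DualCoOp++ adds evidence-guided spatial aggregation that is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPrompts(nn.Module):
    """Learnable positive/negative contexts per class; only these are
    trained, the backbone stays frozen (the core prompt-tuning idea)."""
    def __init__(self, n_classes, dim=512, ctx_len=8):
        super().__init__()
        self.pos_ctx = nn.Parameter(0.02 * torch.randn(n_classes, ctx_len, dim))
        self.neg_ctx = nn.Parameter(0.02 * torch.randn(n_classes, ctx_len, dim))

    def forward(self, image_feat, class_embed, text_encoder):
        # image_feat: (B, dim) pooled features from a frozen visual backbone.
        # class_embed: (C, dim) embeddings of the class names.
        # text_encoder: hypothetical callable turning (contexts, class_embed)
        # into one prompt embedding per class, shape (C, dim).
        pos = F.normalize(text_encoder(self.pos_ctx, class_embed), dim=-1)
        neg = F.normalize(text_encoder(self.neg_ctx, class_embed), dim=-1)
        img = F.normalize(image_feat, dim=-1)
        s_pos, s_neg = img @ pos.t(), img @ neg.t()  # (B, C) each
        # Per class, a softmax over the (positive, negative) pair yields
        # the probability that the class is present in the image.
        return torch.softmax(torch.stack([s_pos, s_neg], dim=-1) / 0.07,
                             dim=-1)[..., 0]
```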
- LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data [39.421312439022316]
We present a LANguage-driven Image-to-image Translation model, dubbed LANIT.
We leverage easy-to-obtain candidate attributes given as text for a dataset: the similarity between images and attributes indicates per-sample domain labels.
Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.
arXiv Detail & Related papers (2022-08-31T14:30:00Z)
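The per-sample labeling step above lends itself to a short sketch: score each image against the candidate attribute texts with an off-the-shelf CLIP and take the best match as its domain label. The checkpoint name and attribute strings below are placeholders, and LANIT's actual training, which refines these pseudo-labels jointly with the translator, goes well beyond this.

```python
# Sketch of per-sample pseudo-labeling: score each image against candidate
# attribute texts and take the best match as its domain label.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Easy-to-obtain candidate attributes, given as text (placeholders).
attributes = ["a photo of a smiling face",
              "a photo of a face with glasses",
              "a photo of a face with a beard"]

def pseudo_domain_labels(images: list[Image.Image]) -> torch.Tensor:
    inputs = processor(text=attributes, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: (n_images, n_attributes) similarity scores;
    # the argmax serves as a per-sample domain label.
    return out.logits_per_image.argmax(dim=-1)
```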
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains.
Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains.
We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
arXiv Detail & Related papers (2022-04-01T13:55:44Z)
- Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to enforce consistency in a pixel-to-pixel manner.
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
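As a rough illustration of directional pixel-to-pixel consistency, the sketch below aligns per-pixel features from two views of the same region with an InfoNCE term, letting the gradient flow only into the lower-confidence side. This is a simplified reading of the idea, not the paper's exact DC Loss, which is defined over overlapping crops and includes additional sampling strategies.

```python
import torch
import torch.nn.functional as F

def info_nce(query, target, tau=0.1):
    """InfoNCE over N pixels: the positive for each pixel is the same
    pixel in the other view, negatives are all other pixels. The target
    side is detached, so gradient flows only through the query."""
    logits = query @ target.detach().t() / tau                 # (N, N)
    labels = torch.arange(query.size(0), device=query.device)
    return F.cross_entropy(logits, labels, reduction="none")   # (N,)

def dc_loss(f1, f2, c1, c2, tau=0.1):
    """f1, f2: (N, D) features of the same N pixels under two contexts
    (e.g. two overlapping crops); c1, c2: (N,) prediction confidences."""
    f1 = F.normalize(f1, dim=1)
    f2 = F.normalize(f2, dim=1)
    l12 = info_nce(f1, f2, tau)  # pulls view-1 features toward view 2
    l21 = info_nce(f2, f1, tau)  # pulls view-2 features toward view 1
    # Directional: each pixel is updated only on its lower-confidence side.
    return torch.where(c1 < c2, l12, l21).mean()
```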
- Contrastive Learning for Unsupervised Image-to-Image Translation [10.091669091440396]
We propose an unsupervised image-to-image translation method based on contrastive learning.
We randomly sample a pair of images and train the generator to change the appearance of one towards another while keeping the original structure.
Experimental results show that our method outperforms the leading unsupervised baselines in terms of visual quality and translation accuracy.
arXiv Detail & Related papers (2021-05-07T08:43:38Z)
- A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations [27.129551973093008]
This paper introduces a novel variational upper bound to the mutual information between an attribute and the latent code of an encoder.
It aims to control the approximation error via the Rényi divergence, leading to both better disentangled representations and precise control of the desired degree of disentanglement.
We show the superiority of this method on fair classification and on textual style transfer tasks.
arXiv Detail & Related papers (2021-05-06T14:05:06Z)
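To make the bounded object concrete, here is a generic variational upper bound on the mutual information between a latent code Z and an attribute Y (a CLUB-style bound shown for orientation only; the paper's Rényi-based bound is different):

```latex
% Generic variational upper bound on I(Z;Y); it is tight when the
% variational classifier q_phi(y|z) matches the true conditional p(y|z),
% and the gap otherwise is the approximation error that the paper
% proposes to control via the Renyi divergence.
\[
I(Z;Y) \;\le\; \mathbb{E}_{p(z,y)}\big[\log q_\phi(y \mid z)\big]
        \;-\; \mathbb{E}_{p(z)}\,\mathbb{E}_{p(y)}\big[\log q_\phi(y \mid z)\big]
\]
```

Minimizing such a bound with respect to the encoder drives the attribute out of the latent code, which is how this family of estimators supports disentanglement and style-transfer objectives.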