An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild
- URL: http://arxiv.org/abs/2106.15610v1
- Date: Tue, 29 Jun 2021 17:54:24 GMT
- Title: An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild
- Authors: Aviv Gabbay, Niv Cohen, Yedid Hoshen
- Abstract summary: Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data.
We propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors.
- Score: 34.505472771669744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised disentanglement has been shown to be theoretically impossible
without inductive biases on the models and the data. As an alternative
approach, recent methods rely on limited supervision to disentangle the factors
of variation and allow their identifiability. While annotating the true
generative factors is only required for a limited number of observations, we
argue that it is infeasible to enumerate all the factors of variation that
describe a real-world image distribution. To this end, we propose a method for
disentangling a set of factors which are only partially labeled, as well as
separating the complementary set of residual factors that are never explicitly
specified. Our success in this challenging setting, demonstrated on synthetic
benchmarks, gives rise to leveraging off-the-shelf image descriptors to
partially annotate a subset of attributes in real image domains (e.g. of human
faces) with minimal manual effort. Specifically, we use a recent language-image
embedding model (CLIP) to annotate a set of attributes of interest in a
zero-shot manner and demonstrate state-of-the-art disentangled image
manipulation results.
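The zero-shot annotation step described in the abstract can be sketched with an off-the-shelf CLIP interface. The snippet below is a minimal illustration, not the authors' exact pipeline: the model checkpoint, prompt wording, and binary attribute are assumptions chosen for clarity.

```python
# Sketch: zero-shot attribute annotation with CLIP (illustrative only;
# prompts, checkpoint, and attribute are assumptions, not the paper's setup).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical attribute of interest, phrased as two competing text hypotheses.
prompts = ["a photo of a person wearing glasses",
           "a photo of a person without glasses"]

def annotate(image_path: str) -> int:
    """Return 1 if the first prompt is the better match, else 0."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(prompts))
    probs = logits.softmax(dim=-1)[0]
    return int(probs.argmax() == 0)

# Example: label = annotate("face_001.png")
```

Applied over a set of attribute prompts, this yields the partial, noisy labels that the proposed method is designed to consume, with the unannotated variation handled as residual factors.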
Related papers
- Learning to Rank Patches for Unbiased Image Redundancy Reduction [80.93989115541966]
Images suffer from heavy spatial redundancy because pixels in neighboring regions are spatially correlated.
Existing approaches strive to overcome this limitation by reducing less meaningful image regions.
We propose a self-supervised framework for image redundancy reduction called Learning to Rank Patches.
arXiv Detail & Related papers (2024-03-31T13:12:41Z)
- StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation [18.213286385769525]
CycleGAN-based methods are known to hide the mismatched information in the generated images to bypass cycle consistency objectives.
We introduce StegoGAN, a novel model that leverages steganography to prevent spurious features in generated images.
Our approach enhances the semantic consistency of the translated images without requiring additional postprocessing or supervision.
arXiv Detail & Related papers (2024-03-29T12:23:58Z)
- What can we learn about a generated image corrupting its latent representation? [57.1841740328509]
We investigate the hypothesis that we can predict image quality based on its latent representation in the GANs bottleneck.
We achieve this by corrupting the latent representation with noise and generating multiple outputs.
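A rough sketch of this probing idea (not the paper's exact procedure): add Gaussian noise to a latent code, decode several corrupted variants, and use the per-pixel spread of the outputs as a sensitivity signal. The generator `G` and noise scale below are assumptions.

```python
# Sketch: probe a generator by corrupting its latent code (illustrative only).
import torch

@torch.no_grad()
def latent_sensitivity(G, z, noise_std=0.1, n_samples=8):
    """G: any torch generator mapping (B, latent_dim) -> (B, C, H, W).
    Returns a per-pixel std map over outputs decoded from corrupted copies of z."""
    zs = z.repeat(n_samples, 1)                 # (n_samples, latent_dim)
    zs = zs + noise_std * torch.randn_like(zs)  # Gaussian corruption
    imgs = G(zs)                                # (n_samples, C, H, W)
    return imgs.std(dim=0)                      # high std = sensitive regions

# Usage (hypothetical generator): sensitivity_map = latent_sensitivity(G, torch.randn(1, 512))
```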
arXiv Detail & Related papers (2022-10-12T14:40:32Z)
- Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to accomplish the consistency in a pixel-to-pixel manner.
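For intuition only, a generic pixel-wise contrastive consistency term between two aligned views can be written as below; this is a simplified stand-in and not the paper's actual DC Loss, which adds its directional weighting.

```python
# Sketch: generic pixel-wise contrastive consistency between two views
# (for intuition only; NOT the paper's exact Directional Contrastive Loss).
import torch
import torch.nn.functional as F

def pixel_contrastive_consistency(f1, f2, temperature=0.1):
    """f1, f2: (B, C, H, W) feature maps of two augmented views, aligned pixel-wise.
    Each pixel in f1 should match the same pixel in f2 and repel other pixels."""
    B, C, H, W = f1.shape
    a = F.normalize(f1.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    b = F.normalize(f2.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    logits = torch.bmm(a, b.transpose(1, 2)) / temperature  # (B, HW, HW)
    target = torch.arange(H * W, device=f1.device).expand(B, -1)
    return F.cross_entropy(logits.reshape(B * H * W, H * W),
                           target.reshape(-1))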
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
- Where and What? Examining Interpretable Disentangled Representations [96.32813624341833]
Capturing interpretable variations has long been one of the goals in disentanglement learning.
Unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting.
In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to be interpreted and what to be interpreted.
arXiv Detail & Related papers (2021-04-07T11:22:02Z)
- SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z)
- Learning to Manipulate Individual Objects in an Image [71.55005356240761]
We describe a method to train a generative model with latent factors that are independent and localized.
This means that perturbing the latent variables affects only local regions of the synthesized image, corresponding to objects.
Unlike other unsupervised generative models, ours enables object-centric manipulation, without requiring object-level annotations.
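A toy check of the "localized latent" property described here can perturb a single latent coordinate and measure how much of the image changes; the generator `G`, factor index, and threshold are assumptions, not the authors' training code.

```python
# Sketch: check how localized the effect of one latent coordinate is
# (illustrative; assumes some pretrained torch generator G).
import torch

@torch.no_grad()
def locality_of_factor(G, z, dim, delta=1.0, threshold=0.05):
    """Perturb latent coordinate `dim` and report the fraction of pixels whose
    value changes noticeably; a small fraction indicates a localized effect."""
    z_pert = z.clone()
    z_pert[:, dim] += delta
    diff = (G(z_pert) - G(z)).abs().mean(dim=1)      # (B, H, W) per-pixel change
    return (diff > threshold).float().mean().item()  # fraction of changed pixels

# Usage: changed_fraction = locality_of_factor(G, torch.randn(1, 512), dim=3)
```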
arXiv Detail & Related papers (2020-04-11T21:50:20Z)
- Representation Learning Through Latent Canonicalizations [24.136856168381502]
We seek to learn a representation on a large annotated data source that generalizes to a target domain using limited new supervision.
We relax the requirement of explicit latent disentanglement and instead encourage linearity of individual factors of variation.
We demonstrate experimentally that our method helps reduce the number of observations needed to generalize to a similar target domain when compared to a number of supervised baselines.
arXiv Detail & Related papers (2020-02-26T22:50:12Z)
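One reading of the linearity idea above is a learned linear "canonicalizer" acting on an encoder's latent code; the dimensions and usage in the sketch below are assumptions for illustration, not the paper's implementation.

```python
# Sketch: a learned linear canonicalizer on a latent code
# (illustrative reading of the idea; dimensions and training are assumptions).
import torch
import torch.nn as nn

class LinearCanonicalizer(nn.Module):
    """A linear map intended to push one factor of variation (e.g. lighting)
    toward a canonical value while keeping the rest of the code usable."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.map = nn.Linear(latent_dim, latent_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.map(z)

# Usage: z_canonical = LinearCanonicalizer(latent_dim=256)(encoder_output)
# Training would pair this with a decoder and supervision that the decoded
# image exhibits the canonical value of the chosen factor.
```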
This list is automatically generated from the titles and abstracts of the papers on this site.