ContraCLIP: Interpretable GAN generation driven by pairs of contrasting
sentences
- URL: http://arxiv.org/abs/2206.02104v1
- Date: Sun, 5 Jun 2022 06:13:42 GMT
- Title: ContraCLIP: Interpretable GAN generation driven by pairs of contrasting
sentences
- Authors: Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, Ioannis
Patras
- Abstract summary: We find non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner.
By defining an objective that discovers latent paths whose traversals generate changes along the desired paths in the vision-language embedding space, we provide an intuitive way of controlling the underlying generative factors.
- Score: 45.06326873752593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work addresses the problem of discovering non-linear interpretable paths
in the latent space of pre-trained GANs in a model-agnostic manner. In the
proposed method, the discovery is driven by a set of pairs of natural language
sentences with contrasting semantics, named semantic dipoles, which serve as the
limits of the interpretation that the trainable latent paths are required to
encode. Using the pre-trained CLIP encoder, the sentences are projected into
the vision-language space, where they serve as dipoles, and where RBF-based
warping functions define a set of non-linear directional paths, one for each
semantic dipole, thus allowing traversals from one semantic pole to the
other. By defining an objective that discovers paths in the latent space of
GANs that generate changes along the desired paths in the vision-language
embedding space, we provide an intuitive way of controlling the underlying
generative factors and address some of the limitations of the state-of-the-art
works, namely, that a) they are typically tailored to specific GAN
architectures (e.g., StyleGAN), b) they disregard the relative position of the
manipulated and the original image in the image embedding space, as well as the
relative position of the image and text embeddings, and c) they lead to abrupt image
manipulations and quickly arrive at regions of low density and, thus, low image
quality, providing limited control of the generative factors. We provide
extensive qualitative and quantitative results that demonstrate our claims with
two pre-trained GANs, and make the code and the pre-trained models publicly
available at: https://github.com/chi0tzp/ContraCLIP
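To make the mechanism concrete, below is a minimal sketch, assuming the OpenAI CLIP package; the sentence pair and the RBF bandwidth `gamma` are illustrative choices rather than the paper's exact configuration (the official implementation lives in the repository above). It encodes a semantic dipole with CLIP and follows the gradient of an RBF-based warping function, whose two opposite-signed kernels sit at the poles, to obtain a non-linear direction at any point of the vision-language space.

```python
# Minimal sketch of a semantic dipole and its RBF-based direction field in
# CLIP space. The sentences and gamma below are illustrative, not the paper's
# settings; see https://github.com/chi0tzp/ContraCLIP for the official code.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# A semantic dipole: a pair of natural language sentences with contrasting
# semantics, projected into the vision-language embedding space.
dipole = ["a picture of a person with a sad face",
          "a picture of a person with a happy face"]
with torch.no_grad():
    poles = model.encode_text(clip.tokenize(dipole).to(device)).float()
    poles = poles / poles.norm(dim=-1, keepdim=True)  # unit-norm embeddings
c_neg, c_pos = poles[0], poles[1]

def dipole_direction(x: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Direction of the non-linear path at point x in CLIP space.

    The warping f(x) = exp(-gamma*||x - c_pos||^2) - exp(-gamma*||x - c_neg||^2)
    places one attractive and one repulsive RBF at the two poles; following
    its gradient traverses from the negative to the positive semantic pole.
    """
    d_pos, d_neg = x - c_pos, x - c_neg
    grad = (-2.0 * gamma * d_pos * torch.exp(-gamma * d_pos.pow(2).sum())
            + 2.0 * gamma * d_neg * torch.exp(-gamma * d_neg.pow(2).sum()))
    return grad / grad.norm()
```

Training then couples this field to the GAN: latent paths are optimized so that, as a latent code moves along a path, the CLIP embedding of the generated image moves along the direction prescribed by the corresponding dipole; the sketch shows only the direction field itself.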
Related papers
- Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models [21.173910627285338]
Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs).
In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it.
Our approaches are applicable without requiring architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.
arXiv Detail & Related papers (2023-03-20T12:59:32Z) - Discovering Class-Specific GAN Controls for Semantic Image Synthesis [73.91655061467988]
We propose a novel method for finding spatially disentangled class-specific directions in the latent space of pretrained SIS models.
We show that the latent directions found by our method can effectively control the local appearance of semantic classes.
arXiv Detail & Related papers (2022-12-02T21:39:26Z) - Exploring Gradient-based Multi-directional Controls in GANs [19.950198707910587]
We propose a novel approach that discovers nonlinear controls, which enables multi-directional manipulation as well as effective disentanglement.
Our approach is able to gain fine-grained controls over a diverse set of bi-directional and multi-directional attributes, and we showcase its ability to achieve disentanglement significantly better than state-of-the-art methods.
arXiv Detail & Related papers (2022-09-01T19:10:26Z) - Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image
Translation [56.44946660061753]
This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC).
MSPC enforces a spatial perturbation function (T) and the translation operator (G) to be commutative (i.e., TG = GT); a minimal sketch of this constraint appears after this list.
Our method outperforms the state-of-the-art methods on most I2I benchmarks.
arXiv Detail & Related papers (2022-03-23T19:59:04Z) - HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z) - WarpedGANSpace: Finding non-linear RBF paths in GAN latent space [44.7091944340362]
This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs.
We learn non-linear warpings on the latent space, each one parametrized by a set of RBF-based latent space warping functions.
We show that linear paths can be derived as a special case of our method, and show experimentally that non-linear paths in the latent space lead to steeper, more disentangled and interpretable changes in the image space.
arXiv Detail & Related papers (2021-09-27T21:29:35Z) - Do Generative Models Know Disentanglement? Contrastive Learning is All
You Need [59.033559925639075]
We propose an unsupervised and model-agnostic method: Disentanglement via Contrast (DisCo) in the Variation Space.
DisCo achieves state-of-the-art disentanglement given pre-trained non-disentangled generative models, including GANs, VAEs, and Flows.
arXiv Detail & Related papers (2021-02-21T08:01:20Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights; a sketch of this computation also appears after this list.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
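For the maximum spatial perturbation consistency (MSPC) entry above, here is a minimal sketch of the commutativity constraint, assuming a fixed horizontal flip as the perturbation T; the paper learns adversarial perturbations, so the flip and the helper name `mspc_loss` are illustrative only.

```python
import torch
import torch.nn.functional as F

def mspc_loss(G, x: torch.Tensor) -> torch.Tensor:
    """Penalize non-commutativity of a spatial perturbation T and a
    translator G, i.e. drive T(G(x)) towards G(T(x)) so that TG = GT."""
    T = lambda img: torch.flip(img, dims=[-1])  # illustrative perturbation: horizontal flip
    return F.l1_loss(T(G(x)), G(T(x)))
```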
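And for the closed-form factorization entry, a minimal sketch of the core computation, assuming `weight` is the weight matrix of the first affine layer that consumes the latent code (the function name is ours): the latent directions that maximally change the layer's output are the top right singular vectors of that matrix, so no training or sampling is needed.

```python
import torch

def closed_form_directions(weight: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Top-k latent directions from a (out_dim, latent_dim) weight matrix A.

    Directions n maximizing ||A n|| subject to ||n|| = 1 are the eigenvectors
    of A^T A, i.e. the right singular vectors of A with largest singular values.
    """
    _, _, vh = torch.linalg.svd(weight, full_matrices=False)
    return vh[:k]  # (k, latent_dim), each row unit-norm
```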