Optimized latent-code selection for explainable conditional
text-to-image GANs
- URL: http://arxiv.org/abs/2204.12678v1
- Date: Wed, 27 Apr 2022 03:12:55 GMT
- Title: Optimized latent-code selection for explainable conditional
text-to-image GANs
- Authors: Zhenxing Zhang and Lambert Schomaker
- Abstract summary: We present a variety of techniques to take a deep look into the latent space and semantic space of a conditional text-to-image GAN model.
We propose a framework for finding good latent codes by utilizing a linear SVM.
- Score: 8.26410341981427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of text-to-image generation has achieved remarkable progress due to
advances in conditional generative adversarial networks (GANs).
However, existing conditional text-to-image GAN approaches mostly concentrate
on improving image quality and semantic relevance, but ignore the
explainability of the model, which plays a vital role in real-world
applications. In this paper, we present a variety of techniques to take a deep
look into the latent space and semantic space of a conditional text-to-image
GAN model. We introduce pairwise linear interpolation of latent codes and
`linguistic' linear interpolation to study what the model has learned within
the latent space and the `linguistic' embeddings (see the first sketch below).
Subsequently, we extend linear interpolation to triangular interpolation
conditioned on three corners to further analyze the model (second sketch
below). After that, we build a Good/Bad data set containing unsuccessful and
successful synthetic samples and the corresponding latent codes for
image-quality research. Based on this data set, we propose a framework for
finding good latent codes by utilizing a linear SVM (third sketch below).
Experimental results with the recent DiverGAN generator trained on two
benchmark data sets demonstrate the effectiveness of the presented techniques,
with better than 94% accuracy in predicting Good/Bad classes for latent
vectors. The Good/Bad data set is publicly available at
https://zenodo.org/record/5850224#.YeGMwP7MKUk.
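
First, a minimal sketch of pairwise linear interpolation between two latent codes. The generator call, the latent dimensionality (100), and the text-embedding handling are hypothetical stand-ins, not the paper's exact DiverGAN interface; the `linguistic' variant is the same convex combination applied to two sentence embeddings while the latent code is held fixed.

```python
import torch

def lerp(z0: torch.Tensor, z1: torch.Tensor, steps: int = 8) -> torch.Tensor:
    """Pairwise linear interpolation: convex combinations of two latent codes."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)               # (steps, 1)
    return (1.0 - alphas) * z0.view(1, -1) + alphas * z1.view(1, -1)   # (steps, dim)

# Hypothetical usage with a text-conditional generator G(z, text_embedding):
# z0, z1 = torch.randn(100), torch.randn(100)     # two sampled latent codes
# path = lerp(z0, z1, steps=8)
# images = [G(z.unsqueeze(0), text_emb) for z in path]
#
# `Linguistic' linear interpolation is the same operation applied to two
# sentence embeddings e0, e1 while keeping z fixed:
# e_path = lerp(e0, e1, steps=8)
```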
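Second, the triangular extension replaces the single interpolation weight with barycentric coordinates over three corner codes. A sketch under the same assumptions (the grid resolution and latent size are illustrative choices):

```python
import torch

def triangular_interpolation(z_a, z_b, z_c, resolution: int = 5):
    """Triangular interpolation conditioned on three corner latent codes.

    Yields z = u*z_a + v*z_b + w*z_c for all grid points with u + v + w = 1
    and u, v, w >= 0, i.e. latent codes covering the triangle spanned by
    the three corners.
    """
    codes = []
    for i in range(resolution + 1):
        for j in range(resolution + 1 - i):
            u = i / resolution
            v = j / resolution
            w = 1.0 - u - v
            codes.append(u * z_a + v * z_b + w * z_c)
    return torch.stack(codes)

# corners = [torch.randn(100) for _ in range(3)]   # hypothetical latent size
# grid = triangular_interpolation(*corners)        # (num_points, 100)
```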
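Third, a minimal sketch of the Good/Bad latent-code classifier: a linear SVM fit directly on latent vectors, as the abstract describes. The array shapes and placeholder data stand in for the released Zenodo data set, whose exact layout is not specified here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Placeholder stand-in for the Good/Bad data set: one row per latent code,
# labelled 1 (Good: successful synthesis) or 0 (Bad: failed synthesis).
rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 100))    # latent codes (placeholder)
y = rng.integers(0, 2, size=2000)       # Good/Bad labels (placeholder)

Z_train, Z_test, y_train, y_test = train_test_split(
    Z, y, test_size=0.2, random_state=0)

clf = LinearSVC(C=1.0, max_iter=10_000)  # linear SVM, as in the paper
clf.fit(Z_train, y_train)
print(f"Good/Bad accuracy: {clf.score(Z_test, y_test):.3f}")

# The learned weight vector clf.coef_ also defines a direction in latent
# space separating Good from Bad codes, which suggests how sampling could
# be steered toward codes the classifier scores as Good.
```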
Related papers
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
arXiv Detail & Related papers (2023-08-19T05:34:13Z)
- Flow Matching in Latent Space [2.9330609943398525]
Flow matching is a framework to train generative models that exhibits impressive empirical performance.
We propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency.
Our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks.
arXiv Detail & Related papers (2023-07-17T17:57:56Z)
- Extracting Semantic Knowledge from GANs with Unsupervised Learning [65.32631025780631]
Generative Adversarial Networks (GANs) encode semantics in feature maps in a linearly separable form.
We propose a novel clustering algorithm, named KLiSH, which leverages this linear separability to cluster a GAN's features.
KLiSH succeeds in extracting fine-grained semantics of GANs trained on datasets of various objects.
arXiv Detail & Related papers (2022-11-30T03:18:16Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
The method is beneficial in a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs [8.26410341981427]
We study how to ensure that generated samples are believable, realistic or natural.
We present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture.
arXiv Detail & Related papers (2022-02-25T20:00:33Z)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [16.786221846896108]
We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance.
We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
Our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing.
arXiv Detail & Related papers (2021-12-20T18:42:55Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images into the latent space of a high-quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- Knowledge Generation -- Variational Bayes on Knowledge Graphs [0.685316573653194]
This thesis is a proof of concept for the potential of Variational Auto-Encoders (VAEs) for representing real-world Knowledge Graphs.
Inspired by successful approaches to graph generation, we evaluate the capabilities of our model, the Relational Graph Variational Auto-Encoder (RGVAE).
The RGVAE is first evaluated on link prediction; mean reciprocal rank (MRR) scores on the FB15K-237 and WN18RR datasets are compared.
We investigate the latent space in a twofold experiment: first, linear interpolation between the latent representations of two triples, then the exploration of each latent dimension.
arXiv Detail & Related papers (2021-01-21T21:23:17Z)