Optimized latent-code selection for explainable conditional
text-to-image GANs
- URL: http://arxiv.org/abs/2204.12678v1
- Date: Wed, 27 Apr 2022 03:12:55 GMT
- Title: Optimized latent-code selection for explainable conditional
text-to-image GANs
- Authors: Zhenxing Zhang and Lambert Schomaker
- Abstract summary: We present a variety of techniques to take a deep look into the latent space and semantic space of a conditional text-to-image GAN model.
We propose a framework for finding good latent codes by utilizing a linear SVM.
- Score: 8.26410341981427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of text-to-image generation has achieved remarkable progress due to
advances in conditional generative adversarial networks (GANs).
However, existing conditional text-to-image GAN approaches mostly concentrate
on improving image quality and semantic relevance, but ignore the
explainability of the model, which plays a vital role in real-world
applications. In this paper, we present a variety of techniques to take a deep
look into the latent space and semantic space of a conditional text-to-image
GAN model. We introduce pairwise linear interpolation of latent codes and
`linguistic' linear interpolation to study what the model has learned within
the latent space and the `linguistic' embeddings (see the first sketch below).
Subsequently, we extend linear interpolation to triangular interpolation
conditioned on three corners to further analyze the model (second sketch
below). After that, we build a Good/Bad data set containing unsuccessful and
successful synthetic samples and the corresponding latent codes for
image-quality research. Based on this data set, we propose a framework for
finding good latent codes by utilizing a linear SVM (third sketch below).
Experimental results with the recent DiverGAN generator trained on two
benchmark data sets demonstrate the effectiveness of the presented techniques,
with better than 94% accuracy in predicting Good/Bad classes for latent
vectors. The Good/Bad data set is publicly available at
https://zenodo.org/record/5850224#.YeGMwP7MKUk.
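
First, a minimal sketch of pairwise linear interpolation between two latent codes. The generator call, the latent dimensionality (100), and the text-embedding handling are hypothetical stand-ins, not the paper's exact DiverGAN interface; the `linguistic' variant is the same convex combination applied to two sentence embeddings while the latent code is held fixed.

```python
import torch

def lerp(z0: torch.Tensor, z1: torch.Tensor, steps: int = 8) -> torch.Tensor:
    """Pairwise linear interpolation: convex combinations of two latent codes."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)               # (steps, 1)
    return (1.0 - alphas) * z0.view(1, -1) + alphas * z1.view(1, -1)   # (steps, dim)

# Hypothetical usage with a text-conditional generator G(z, text_embedding):
# z0, z1 = torch.randn(100), torch.randn(100)     # two sampled latent codes
# path = lerp(z0, z1, steps=8)
# images = [G(z.unsqueeze(0), text_emb) for z in path]
#
# `Linguistic' linear interpolation is the same operation applied to two
# sentence embeddings e0, e1 while keeping z fixed:
# e_path = lerp(e0, e1, steps=8)
```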
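Second, the triangular extension replaces the single interpolation weight with barycentric coordinates over three corner codes. A sketch under the same assumptions (the grid resolution and latent size are illustrative choices):

```python
import torch

def triangular_interpolation(z_a, z_b, z_c, resolution: int = 5):
    """Triangular interpolation conditioned on three corner latent codes.

    Yields z = u*z_a + v*z_b + w*z_c for all grid points with u + v + w = 1
    and u, v, w >= 0, i.e. latent codes covering the triangle spanned by
    the three corners.
    """
    codes = []
    for i in range(resolution + 1):
        for j in range(resolution + 1 - i):
            u = i / resolution
            v = j / resolution
            w = 1.0 - u - v
            codes.append(u * z_a + v * z_b + w * z_c)
    return torch.stack(codes)

# corners = [torch.randn(100) for _ in range(3)]   # hypothetical latent size
# grid = triangular_interpolation(*corners)        # (num_points, 100)
```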
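Third, a minimal sketch of the Good/Bad latent-code classifier: a linear SVM fit directly on latent vectors, as the abstract describes. The array shapes and placeholder data stand in for the released Zenodo data set, whose exact layout is not specified here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Placeholder stand-in for the Good/Bad data set: one row per latent code,
# labelled 1 (Good: successful synthesis) or 0 (Bad: failed synthesis).
rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 100))    # latent codes (placeholder)
y = rng.integers(0, 2, size=2000)       # Good/Bad labels (placeholder)

Z_train, Z_test, y_train, y_test = train_test_split(
    Z, y, test_size=0.2, random_state=0)

clf = LinearSVC(C=1.0, max_iter=10_000)  # linear SVM, as in the paper
clf.fit(Z_train, y_train)
print(f"Good/Bad accuracy: {clf.score(Z_test, y_test):.3f}")

# The learned weight vector clf.coef_ also defines a direction in latent
# space separating Good from Bad codes, which suggests how sampling could
# be steered toward codes the classifier scores as Good.
```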
Related papers
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
arXiv Detail & Related papers (2023-08-19T05:34:13Z)
- Flow Matching in Latent Space [2.9330609943398525]
Flow matching is a framework to train generative models that exhibits impressive empirical performance.
We propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency.
Our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks.
arXiv Detail & Related papers (2023-07-17T17:57:56Z)
- Extracting Semantic Knowledge from GANs with Unsupervised Learning [65.32631025780631]
Generative Adversarial Networks (GANs) encode semantics in feature maps in a linearly separable form.
We propose a novel clustering algorithm, named KLiSH, which leverages this linear separability to cluster a GAN's features.
KLiSH succeeds in extracting fine-grained semantics of GANs trained on datasets of various objects.
arXiv Detail & Related papers (2022-11-30T03:18:16Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
The method is beneficial in a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs [8.26410341981427]
We study how to ensure that generated samples are believable, realistic or natural.
We present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture.
arXiv Detail & Related papers (2022-02-25T20:00:33Z)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [16.786221846896108]
We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance.
We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
Our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing.
arXiv Detail & Related papers (2021-12-20T18:42:55Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images into the latent space of a high-quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- Knowledge Generation -- Variational Bayes on Knowledge Graphs [0.685316573653194]
This thesis is a proof of concept for the potential of Variational Auto-Encoders (VAEs) for representing real-world Knowledge Graphs.
Inspired by successful approaches to graph generation, we evaluate the capabilities of our model, the Relational Graph Variational Auto-Encoder (RGVAE).
The RGVAE is first evaluated on link prediction; mean reciprocal rank (MRR) scores on the FB15K-237 and WN18RR datasets are compared.
We investigate the latent space in a twofold experiment: first, linear interpolation between the latent representations of two triples, then the exploration of each latent dimension.
arXiv Detail & Related papers (2021-01-21T21:23:17Z)