Learning a Deep Reinforcement Learning Policy Over the Latent Space of a
Pre-trained GAN for Semantic Age Manipulation
- URL: http://arxiv.org/abs/2011.00954v2
- Date: Wed, 28 Apr 2021 09:19:48 GMT
- Title: Learning a Deep Reinforcement Learning Policy Over the Latent Space of a
Pre-trained GAN for Semantic Age Manipulation
- Authors: Kumar Shubham, Gopalakrishnan Venkatesh, Reijul Sachdev, Akshi, Dinesh
Babu Jayagopi, G. Srinivasaraghavan
- Abstract summary: We learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds.
Results show that our learned policy samples high fidelity images with required age alterations.
- Score: 4.306143768014157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a disentangled representation of the latent space has become one of
the most fundamental problems studied in computer vision. Recently, many
Generative Adversarial Networks (GANs) have shown promising results in
generating high fidelity images. However, studies to understand the semantic
layout of the latent space of pre-trained models are still limited. Several
works train conditional GANs to generate faces with required semantic
attributes. Unfortunately, in these attempts, the generated output is often not
as photo-realistic as unconditional state-of-the-art models. They also
require large computational resources and specific datasets to generate
high fidelity images. In our work, we have formulated a Markov Decision Process
(MDP) over the latent space of a pre-trained GAN model to learn a conditional
policy for semantic manipulation along specific attributes under defined
identity bounds. Further, we have defined a semantic age manipulation scheme
using a locally linear approximation over the latent space. Results show that
our learned policy samples high fidelity images with required age alterations,
while preserving the identity of the person.
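To make the MDP formulation concrete, here is a minimal sketch in Python; it is an illustration under stated assumptions, not the authors' implementation. The names `AGE_DIRECTION`, `predict_age`, and `identity_similarity` are hypothetical stand-ins: in the real setting the latent code would be decoded by the pre-trained GAN and scored by an age estimator and a face-identity network on the generated image.

```python
# Minimal sketch of the MDP over a GAN latent space (hypothetical names).
import numpy as np

LATENT_DIM = 512
rng = np.random.default_rng(0)

# Placeholder semantic direction; a small move along it is assumed to change
# age while roughly preserving other factors (the locally linear approximation).
AGE_DIRECTION = rng.standard_normal(LATENT_DIM)
AGE_DIRECTION /= np.linalg.norm(AGE_DIRECTION)

def step(w, action, target_age, predict_age, identity_similarity, w0,
         identity_bound=0.5):
    """One MDP transition over the latent space.

    State  : current latent code w (together with the target age).
    Action : a signed step size along the age direction.
    Reward : progress toward the target age, penalized for identity drift.
    """
    # Locally linear edit: w' = w + action * n_age
    w_next = w + action * AGE_DIRECTION

    # Reward the reduction in age error achieved by the edit.
    error_before = abs(predict_age(w) - target_age)
    error_after = abs(predict_age(w_next) - target_age)
    reward = error_before - error_after

    # "Defined identity bounds": terminate with a penalty if the edited
    # face drifts too far from the starting identity.
    drift = 1.0 - identity_similarity(w0, w_next)
    done = drift > identity_bound
    if done:
        reward -= 10.0  # arbitrary penalty for violating the identity bound
    return w_next, reward, done

# Toy usage with scalar stand-ins for the pre-trained networks.
def toy_age(w):
    return 30.0 + 20.0 * float(w @ AGE_DIRECTION)

def toy_identity(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

w0 = rng.standard_normal(LATENT_DIM)
w1, reward, done = step(w0, action=0.3, target_age=60.0,
                        predict_age=toy_age, identity_similarity=toy_identity,
                        w0=w0)
```

A conditional policy learned over this MDP would then choose the step size at each state until the target age is reached or the identity bound is crossed.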
Related papers
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models [36.59260354292177]
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models.
We aim to fine-tune vision-language models to a specific classification model without access to any real images.
Despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets.
arXiv Detail & Related papers (2024-06-08T10:43:49Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- A 3D GAN for Improved Large-pose Facial Recognition [3.791440300377753]
Facial recognition using deep convolutional neural networks relies on the availability of large datasets of face images.
Recent studies have shown that current methods of disentangling pose from identity are inadequate.
In this work we incorporate a 3D morphable model into the generator of a GAN in order to learn a nonlinear texture model from in-the-wild images.
This allows generation of new, synthetic identities, and manipulation of pose, illumination and expression without compromising the identity.
arXiv Detail & Related papers (2020-12-18T22:41:15Z)
- Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders [63.46738617561255]
We consider the problem of sparsifying the discrete latent space of a trained conditional variational autoencoder.
We use evidential theory to identify the latent classes that receive direct evidence from a particular input condition and filter out those that do not.
Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of our proposed technique.
arXiv Detail & Related papers (2020-10-19T01:27:21Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained to synthesize images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights (a code sketch of this idea follows the list below).
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and disentangle them more effectively via subspace projection (a code sketch of this projection follows the list below).
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
- On Leveraging Pretrained GANs for Generation with Limited Data [83.32972353800633]
Generative adversarial networks (GANs) can generate highly realistic images that are often indistinguishable (by humans) from real images.
Most images so generated are not contained in a training dataset, suggesting potential for augmenting training sets with GAN-generated data.
We leverage existing GAN models pretrained on large-scale datasets to introduce additional knowledge, following the concept of transfer learning.
An extensive set of experiments is presented to demonstrate the effectiveness of the proposed techniques on generation with limited data.
arXiv Detail & Related papers (2020-02-26T21:53:36Z)
- Controlling generative models with continuous factors of variations [1.7188280334580197]
We introduce a new method to find meaningful directions in the latent space of any generative model.
Our method does not require human annotations and is well suited for the search of directions encoding simple transformations of the generated image.
arXiv Detail & Related papers (2020-01-28T10:04:04Z)
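As referenced in the Closed-Form Factorization entry above, here is a minimal sketch of that factorization idea; it is an illustration under stated assumptions, not the paper's code. The weight `A` is a random placeholder for a generator's first affine layer, which in practice would be read from a pre-trained checkpoint.

```python
# Minimal sketch of closed-form latent direction discovery.
import numpy as np

def closed_form_directions(A, k=5):
    """Top-k candidate semantic directions.

    If the first layer maps z -> A z + b, the unit directions n that
    maximize the output change ||A n|| are the top eigenvectors of A^T A.
    """
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return eigvecs[:, top].T                    # one direction per row

A = np.random.default_rng(0).standard_normal((1024, 512))  # placeholder weight
directions = closed_form_directions(A, k=3)
print(directions.shape)  # (3, 512)
```

Moving a latent code along any of these rows and decoding it would reveal which factor of variation that direction controls.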
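Likewise, here is a minimal sketch of the subspace projection mentioned in the InterFaceGAN entry above; again an illustration, not the authors' code. The directions below are random placeholders; real ones would be hyperplane normals fit (e.g., with a linear SVM) to attribute-labelled latent codes.

```python
# Minimal sketch of conditional latent editing via subspace projection.
import numpy as np

def project_out(n1, n2):
    """Remove n2's component from n1, so edits along the result leave the
    n2 attribute (approximately) unchanged under the linear assumption."""
    n2 = n2 / np.linalg.norm(n2)
    n1 = n1 - (n1 @ n2) * n2
    return n1 / np.linalg.norm(n1)

def edit(z, direction, alpha):
    """Linear latent edit: z' = z + alpha * n."""
    return z + alpha * direction

rng = np.random.default_rng(0)
age, glasses = rng.standard_normal((2, 512))  # placeholder attribute normals
age_only = project_out(age, glasses)          # edit age, hold glasses fixed
z_edited = edit(rng.standard_normal(512), age_only, alpha=2.0)
```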
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.