Mask-Guided Discovery of Semantic Manifolds in Generative Models
- URL: http://arxiv.org/abs/2105.07273v1
- Date: Sat, 15 May 2021 18:06:38 GMT
- Title: Mask-Guided Discovery of Semantic Manifolds in Generative Models
- Authors: Mengyu Yang, David Rokeby, Xavier Snelgrove
- Abstract summary: StyleGAN2 generates images of human faces from random vectors in a lower-dimensional latent space.
The model behaves as a black box, providing neither control over its output nor insight into the structures it has learned from the data.
We present a method to explore the manifold of changes of spatially localized regions of the face.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in the realm of Generative Adversarial Networks (GANs) have led to
architectures capable of producing amazingly realistic images such as
StyleGAN2, which, when trained on the FFHQ dataset, generates images of human
faces from random vectors in a lower-dimensional latent space. Unfortunately,
this space is entangled - translating a latent vector along its axes does not
correspond to a meaningful transformation in the output space (e.g., smiling
mouth, squinting eyes). The model behaves as a black box, providing neither
control over its output nor insight into the structures it has learned from the
data. We present a method to explore the manifolds of changes of spatially
localized regions of the face. Our method discovers smoothly varying sequences
of latent vectors along these manifolds suitable for creating animations.
Unlike existing disentanglement methods that either require labelled data or
explicitly alter internal model parameters, our method is an optimization-based
approach guided by a custom loss function and manually defined region of
change. Our code is open source and can be found, along with supplementary
results, on our project page: https://github.com/bmolab/masked-gan-manifold
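To make the idea concrete, here is a minimal PyTorch sketch of mask-guided latent optimization: a sequence of latent vectors is optimized so that the generated image changes inside a user-defined mask while staying fixed outside it, with a smoothness term linking consecutive latents. The loss terms, weights, and the plain gradient update are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def masked_manifold_step(G, w_seq, w_base, mask, lr=0.01,
                         lambda_out=10.0, lambda_smooth=1.0):
    """One gradient step over a learnable latent sequence (illustrative).

    G       -- frozen pretrained generator mapping latents to images
    w_seq   -- (T, D) latent sequence with requires_grad=True
    w_base  -- (D,) latent of the reference image
    mask    -- (1, 1, H, W) binary mask of the region allowed to change
    """
    x_base = G(w_base.unsqueeze(0)).detach()   # reference image, (1, C, H, W)
    x_seq = G(w_seq)                           # (T, C, H, W)

    diff = x_seq - x_base
    change_in = (diff * mask).pow(2).mean()          # reward change inside mask
    change_out = (diff * (1 - mask)).pow(2).mean()   # penalize change outside

    smooth = (w_seq[1:] - w_seq[:-1]).pow(2).mean()  # smooth animation path

    loss = -change_in + lambda_out * change_out + lambda_smooth * smooth
    loss.backward()
    with torch.no_grad():
        w_seq -= lr * w_seq.grad
        w_seq.grad.zero_()
    return loss.item()
```

In the paper's setting, G would be a pretrained StyleGAN2 generator and the mask would mark a facial region such as the mouth or eyes; the resulting latent sequence can then be rendered frame by frame as an animation.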
Related papers
- DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
arXiv Detail & Related papers (2023-12-05T18:34:12Z)
- Pre-training with Random Orthogonal Projection Image Modeling [32.667183132025094]
Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training without the use of labels.
We propose an Image Modeling framework based on Random Orthogonal Projection Image Modeling (ROPIM).
ROPIM reduces spatial token information under a guaranteed bound on the noise variance, and can be seen as masking the entire spatial image area with locally varying masking degrees.
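A minimal sketch of this projection-as-masking idea, assuming the soft masking is implemented as a random low-rank orthogonal projection applied across patch tokens; the function name and rank parameter are illustrative, not the paper's exact procedure.

```python
import torch

def random_orthogonal_projection(tokens, keep_rank):
    """Softly 'mask' patch tokens by projecting them onto a random
    low-rank subspace instead of zeroing whole patches.

    tokens    -- (N, D) flattened patch embeddings
    keep_rank -- dimensionality of the random subspace (< N)
    """
    n = tokens.shape[0]
    # Random orthonormal basis via reduced QR decomposition.
    q, _ = torch.linalg.qr(torch.randn(n, keep_rank))
    proj = q @ q.T                  # (N, N) rank-deficient projector
    return proj @ tokens            # each output token mixes all inputs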
arXiv Detail & Related papers (2023-10-28T15:42:07Z)
- Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models [21.173910627285338]
Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs).
In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it.
Our approaches are applicable without requiring architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.
arXiv Detail & Related papers (2023-03-20T12:59:32Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we encode randomly sampled Gaussian heatmaps into the intermediate layers of generative models as a spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
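A rough sketch of the heatmap mechanism: sample a random Gaussian heatmap and inject it into an intermediate feature map as a spatial bias. The multiplicative injection scheme is an assumption for illustration; the paper's actual encoding design is not described in this summary.

```python
import torch

def sample_gaussian_heatmap(h, w, sigma=None):
    """Sample a Gaussian heatmap of size (1, 1, h, w) at a random center."""
    cy, cx = torch.rand(2) * torch.tensor([h, w], dtype=torch.float)
    sigma = sigma if sigma is not None else 0.1 * min(h, w)
    ys = torch.arange(h, dtype=torch.float).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float).view(1, -1)
    heat = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return heat.view(1, 1, h, w)

def inject_heatmap(features, heatmap):
    """One illustrative injection: modulate a (B, C, H, W) feature map with a
    heatmap sampled at the same H x W resolution."""
    return features * (1 + heatmap)
```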
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation [64.92152574895111]
We propose a simple Orthogonal Jacobian Regularization (OroJaR) to encourage deep generative models to learn disentangled representations.
Our method is effective in disentangled and controllable image generation, and performs favorably against the state-of-the-art methods.
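Written out naively, the regularizer penalizes non-orthogonality between the generator's derivatives with respect to different latent dimensions. The sketch below computes the full Jacobian, which is only tractable for small models; the paper uses a more efficient estimator instead.

```python
import torch
from torch.autograd.functional import jacobian

def orojar_penalty(G, z):
    """Naive orthogonal-Jacobian penalty: make the columns of the generator
    Jacobian (one per latent dimension) mutually orthogonal.

    G -- generator mapping a (1, D) latent batch to an image
    z -- (D,) latent vector
    """
    J = jacobian(lambda v: G(v.unsqueeze(0)).flatten(), z)  # (P, D)
    gram = J.T @ J                                          # (D, D)
    off_diag = gram - torch.diag(torch.diag(gram))          # cross terms only
    return off_diag.pow(2).sum()
```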
arXiv Detail & Related papers (2021-08-17T15:01:46Z)
- StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing [19.495153059077367]
Generative adversarial networks (GANs) synthesize realistic images from random latent vectors.
Editing real images with GANs suffers from i) time-consuming optimization for projecting real images into the latent space, or ii) inaccurate embedding through an encoder.
We propose StyleMapGAN: the intermediate latent space has spatial dimensions, and spatially variant modulation replaces AdaIN.
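A minimal sketch contrasting spatially variant modulation with AdaIN, whose scale and shift are per-channel scalars; the shapes and interpolation choices here are illustrative, not StyleMapGAN's exact layer design.

```python
import torch
import torch.nn.functional as F

def spatially_variant_modulation(features, gamma_map, beta_map):
    """Normalize features, then scale and shift them with *spatial* maps.

    features  -- (B, C, H, W) intermediate feature map
    gamma_map -- (B, C, h, w) spatial scale map from the stylemap
    beta_map  -- (B, C, h, w) spatial shift map from the stylemap
    """
    mean = features.mean(dim=(2, 3), keepdim=True)
    std = features.std(dim=(2, 3), keepdim=True) + 1e-8
    normalized = (features - mean) / std
    size = features.shape[2:]
    # Resize the stylemaps to the feature resolution; under AdaIN these
    # would instead be per-channel scalars with no spatial extent.
    gamma = F.interpolate(gamma_map, size=size, mode='bilinear',
                          align_corners=False)
    beta = F.interpolate(beta_map, size=size, mode='bilinear',
                         align_corners=False)
    return gamma * normalized + beta
```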
arXiv Detail & Related papers (2021-04-30T04:43:24Z)
- Do Generative Models Know Disentanglement? Contrastive Learning is All You Need [59.033559925639075]
We propose an unsupervised and model-agnostic method: Disentanglement via Contrast (DisCo) in the Variation Space.
DisCo achieves the state-of-the-art disentanglement given pretrained non-disentangled generative models, including GAN, VAE, and Flow.
arXiv Detail & Related papers (2021-02-21T08:01:20Z)
- The Geometry of Deep Generative Image Models and its Applications [0.0]
Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets.
These networks are trained to map random inputs in their latent space to new samples representative of the learned data.
The structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator.
arXiv Detail & Related papers (2021-01-15T07:57:33Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)