Hierarchical Semantic Regularization of Latent Spaces in StyleGANs
- URL: http://arxiv.org/abs/2208.03764v1
- Date: Sun, 7 Aug 2022 16:23:33 GMT
- Title: Hierarchical Semantic Regularization of Latent Spaces in StyleGANs
- Authors: Tejan Karmali, Rishubh Parihar, Susmit Agrawal, Harsh Rangwani, Varun
Jampani, Maneesh Singh, R. Venkatesh Babu
- Abstract summary: We propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data.
HSR is shown to not only improve generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images.
- Score: 53.98170188547775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Progress in GANs has enabled the generation of high-resolution photorealistic
images of astonishing quality. StyleGANs allow for compelling attribute
modification on such images via mathematical operations on the latent style
vectors in the W/W+ space that effectively modulate the rich hierarchical
representations of the generator. Such operations have recently been
generalized beyond mere attribute swapping in the original StyleGAN paper to
include interpolations. In spite of many significant improvements in StyleGANs,
they are still seen to generate unnatural images. The quality of the generated
images is predicated on two assumptions: (a) the richness of the hierarchical
representations learnt by the generator, and (b) the linearity and smoothness
of the style spaces. In this work, we propose a Hierarchical Semantic
Regularizer (HSR) which aligns the hierarchical representations learnt by the
generator to corresponding powerful features learnt by pretrained networks on
large amounts of data. HSR is shown to not only improve generator
representations but also the linearity and smoothness of the latent style
spaces, leading to the generation of more natural-looking style-edited images.
To demonstrate improved linearity, we propose a novel metric, the Attribute
Linearity Score (ALS). A significant reduction in the generation of unnatural
images is corroborated by improvement in the Perceptual Path Length (PPL)
metric by 16.19% averaged across different standard datasets while
simultaneously improving the linearity of attribute-change in the attribute
editing tasks.
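The two mechanisms the abstract describes can be sketched compactly: attribute editing as a linear shift of a style vector in W space, and an HSR-style regularizer as a per-level feature-matching penalty between generator activations and features from a fixed pretrained network. The following is a minimal pure-Python sketch of these ideas only; the function names, the per-level weighting, and the use of a plain MSE are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch (not the authors' code): features are represented as flat
# lists of floats; in practice these would be convolutional feature maps.

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hsr_loss(gen_feats, pretrained_feats, weights):
    """HSR-style regularizer (illustrative): one weighted feature-matching
    term per hierarchy level, summed across levels."""
    return sum(w * mse(g, p)
               for w, g, p in zip(weights, gen_feats, pretrained_feats))

def linear_edit(w, direction, alpha):
    """Attribute edit as a linear shift in W space: w' = w + alpha * d,
    where d is a learned attribute direction and alpha its strength."""
    return [wi + alpha * di for wi, di in zip(w, direction)]
```

Under this reading, ALS would quantify how linearly an attribute changes as `alpha` is swept, which is exactly what the feature alignment is claimed to improve.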
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images [25.82631308991067]
We introduce the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates localized editing of style attributes of several Regions of Interest in face images.
The L2SAE learns separate latent representations for encoded images' structure and style information, allowing for structure-preserving style editing of the chosen ROIs.
We provide qualitative and quantitative results over multiple applications, such as selective style editing and style swapping, using test images sampled from several datasets.
arXiv Detail & Related papers (2023-12-22T20:06:53Z)
- Semantic Image Synthesis via Class-Adaptive Cross-Attention [7.147779225315707]
Cross-attention layers are used in place of SPADE for learning shape-style correlations, thus conditioning the image generation process.
Our model inherits the versatility of SPADE while obtaining state-of-the-art generation quality as well as improved global and local style transfer.
arXiv Detail & Related papers (2023-08-30T14:49:34Z)
- Latent Multi-Relation Reasoning for GAN-Prior based Image Super-Resolution [61.65012981435095]
LAREN is a graph-based disentanglement framework that constructs a superior disentangled latent space via hierarchical multi-relation reasoning.
We show that LAREN achieves superior large-factor image SR and outperforms the state-of-the-art consistently across multiple benchmarks.
arXiv Detail & Related papers (2022-08-04T19:45:21Z)
- Everything is There in Latent Space: Attribute Editing and Attribute Style Manipulation by StyleGAN Latent Space Exploration [39.18239951479647]
We present Few-shot Latent-based Attribute Manipulation and Editing (FLAME).
FLAME is a framework to perform highly controlled image editing by latent space manipulation.
We generate diverse attribute styles in a disentangled manner.
arXiv Detail & Related papers (2022-07-20T12:40:32Z)
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
- Latent Transformations via NeuralODEs for GAN-based Image Editing [25.272389610447856]
We show that nonlinear latent code manipulations realized as flows of a trainable Neural ODE are beneficial for many practical non-face image domains.
In particular, we investigate a large number of datasets with known attributes and demonstrate that certain attribute manipulations are challenging to obtain with linear shifts only.
arXiv Detail & Related papers (2021-11-29T18:59:54Z)
- StyleGAN-induced data-driven regularization for inverse problems [2.5138572116292686]
Recent advances in generative adversarial networks (GANs) have opened up the possibility of generating high-resolution images that were impossible to produce previously.
We develop a framework that utilizes the full potential of a pre-trained StyleGAN2 generator for constructing the prior distribution on the underlying image.
Considering the inverse problems of image inpainting and super-resolution, we demonstrate that the proposed approach is competitive with, and sometimes superior to, state-of-the-art GAN-based image reconstruction methods.
arXiv Detail & Related papers (2021-10-07T22:25:30Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
- Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications.
The visual feature produced by our encoder, termed Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.