Latents2Semantics: Leveraging the Latent Space of Generative Models for
Localized Style Manipulation of Face Images
- URL: http://arxiv.org/abs/2312.15037v1
- Date: Fri, 22 Dec 2023 20:06:53 GMT
- Title: Latents2Semantics: Leveraging the Latent Space of Generative Models for
Localized Style Manipulation of Face Images
- Authors: Snehal Singh Tomar, A.N. Rajagopalan
- Abstract summary: We introduce the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates localized editing of style attributes of several Regions of Interest in face images.
The L2SAE learns separate latent representations for encoded images' structure and style information, allowing for structure-preserving style editing of the chosen ROIs.
We provide qualitative and quantitative results over multiple applications, such as selective style editing and swapping, using test images sampled from several datasets.
- Score: 25.82631308991067
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the metaverse slowly becoming a reality and given the rapid pace of
developments toward the creation of digital humans, the need for a principled
style editing pipeline for human faces is bound to increase manifold. We cater
to this need by introducing the Latents2Semantics Autoencoder (L2SAE), a
Generative Autoencoder model that facilitates highly localized editing of style
attributes of several Regions of Interest (ROIs) in face images. The L2SAE
learns separate latent representations for encoded images' structure and style
information, thus allowing for structure-preserving style editing of the
chosen ROIs. The encoded structure representation is a multichannel 2D tensor
with reduced spatial dimensions, which captures both local and global structure
properties. The style representation is a 1D tensor that captures global style
attributes. In our framework, we slice the structure representation to build
strong and disentangled correspondences with different ROIs. Consequently,
style editing of the chosen ROIs amounts to a simple combination of (a) the
ROI-mask generated from the sliced structure representation and (b) the decoded
image with global style changes, generated from the manipulated (using Gaussian
noise) global style and unchanged structure tensor. Style editing sans
additional human supervision is a significant win over SOTA style editing
pipelines because most existing works require additional human effort
(supervision) post-training for attributing semantic meaning to style edits. We
also do away with iterative-optimization-based inversion and with determining
controllable latent directions post-training, both of which require additional
computationally expensive operations. We provide qualitative and quantitative
results over multiple applications, such as selective style
editing and swapping, using test images sampled from several datasets.
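
As a concrete illustration of the editing recipe described above, the sketch below composites (a) a soft ROI mask read off a slice of the structure tensor with (b) a decode of the same structure under a Gaussian-perturbed global style. This is a minimal sketch only: the layer sizes, the channel-to-ROI assignment, the sigmoid-based mask, and the style-modulated decoder are all hypothetical stand-ins, not the paper's actual L2SAE architecture.

```python
# Toy sketch of the L2SAE-style localized edit (all sizes/choices assumed).
import torch
import torch.nn as nn


class ToyL2SAE(nn.Module):
    """Toy autoencoder with split structure/style latents (hypothetical sizes)."""

    def __init__(self, style_dim: int = 64, struct_channels: int = 16):
        super().__init__()
        # Shared backbone: 3x256x256 image -> 32x64x64 features (assumed sizes).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Structure head: multichannel 2D tensor with reduced spatial dims.
        self.to_struct = nn.Conv2d(32, struct_channels, 3, padding=1)
        # Style head: global 1D style vector.
        self.to_style = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, style_dim)
        )
        # Decoder consumes structure features modulated by style.
        self.style_proj = nn.Linear(style_dim, struct_channels)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(struct_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.backbone(x)
        return self.to_struct(h), self.to_style(h)

    def decode(self, struct, style):
        # Channel-wise modulation of structure by style (an assumption; the
        # paper's decoder may condition on style differently).
        mod = self.style_proj(style)[:, :, None, None]
        return self.decoder(struct * (1 + mod))


@torch.no_grad()
def edit_roi(model, image, roi_channels, noise_scale=0.5):
    """Localized style edit: blend a globally re-styled decode into the ROI
    mask derived from sliced structure channels, per the abstract."""
    struct, style = model.encode(image)
    # (a) ROI mask from the slice of structure channels tied to this ROI.
    roi_slice = struct[:, roi_channels]                    # e.g. a "hair" slice
    mask = torch.sigmoid(roi_slice).mean(1, keepdim=True)  # soft mask in [0, 1]
    mask = nn.functional.interpolate(
        mask, size=image.shape[-2:], mode="bilinear", align_corners=False
    )
    # (b) Decode with a Gaussian-perturbed global style; structure unchanged.
    edited_style = style + noise_scale * torch.randn_like(style)
    restyled = model.decode(struct, edited_style)
    # Localized edit: restyled pixels inside the ROI, original pixels outside.
    return mask * restyled + (1 - mask) * image


if __name__ == "__main__":
    model = ToyL2SAE()
    x = torch.rand(1, 3, 256, 256)                 # stand-in face image
    out = edit_roi(model, x, roi_channels=[0, 1])  # hypothetical ROI slice
    print(out.shape)                               # torch.Size([1, 3, 256, 256])
```

In the paper's framework the ROI correspondence is learned into the structure slices themselves; the sigmoid-and-mean used here is simply the most direct way to turn a channel slice into a soft mask for the final composite.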
Related papers
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z) - Does resistance to style-transfer equal Global Shape Bias? Measuring
network sensitivity to global shape configuration [6.047146237332764]
The current benchmark for evaluating a model's global shape bias is a set of style-transferred images.
We show that networks trained with style-transfer images indeed learn to ignore style, but their shape bias arises primarily from local details.
arXiv Detail & Related papers (2023-10-11T15:00:11Z) - Semantic Image Synthesis via Class-Adaptive Cross-Attention [7.147779225315707]
Cross-attention layers are used in place of SPADE for learning shape-style correlations, thereby conditioning the image generation process.
Our model inherits the versatility of SPADE, at the same time obtaining state-of-the-art generation quality, as well as improved global and local style transfer.
arXiv Detail & Related papers (2023-08-30T14:49:34Z) - SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form
Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z) - Spectral Normalization and Dual Contrastive Regularization for
Image-to-Image Translation [9.029227024451506]
We propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization.
We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results prove that our method achieves SOTA in multiple tasks.
arXiv Detail & Related papers (2023-04-22T05:22:24Z) - Efficient and Explicit Modelling of Image Hierarchies for Image
Restoration [120.35246456398738]
We propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration.
Inspired by that, we propose the anchored stripe self-attention which achieves a good balance between the space and time complexity of self-attention.
Then we propose a new network architecture dubbed GRL to explicitly model image hierarchies in the Global, Regional, and Local range.
arXiv Detail & Related papers (2023-03-01T18:59:29Z) - Hierarchical Semantic Regularization of Latent Spaces in StyleGANs [53.98170188547775]
We propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data.
HSR is shown to not only improve generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images.
arXiv Detail & Related papers (2022-08-07T16:23:33Z) - Everything is There in Latent Space: Attribute Editing and Attribute
Style Manipulation by StyleGAN Latent Space Exploration [39.18239951479647]
We present Few-shot Latent-based Attribute Manipulation and Editing (FLAME).
FLAME is a framework to perform highly controlled image editing by latent space manipulation.
We generate diverse attribute styles in a disentangled manner.
arXiv Detail & Related papers (2022-07-20T12:40:32Z) - Controllable Person Image Synthesis with Spatially-Adaptive Warped
Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z) - StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
The cross-modal matching problem is typically solved by learning a joint embedding space where the semantic content shared between photo and sketch modalities is preserved.
An effective model needs to explicitly account for this style diversity and, crucially, generalize to unseen user styles.
Our model can not only disentangle the cross-modal shared semantic content, but can adapt the disentanglement to any unseen user style as well, making the model truly agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)