Transforming the Latent Space of StyleGAN for Real Face Editing
- URL: http://arxiv.org/abs/2105.14230v1
- Date: Sat, 29 May 2021 06:42:23 GMT
- Title: Transforming the Latent Space of StyleGAN for Real Face Editing
- Authors: Heyi Li, Jinlong Liu, Yunzhi Bai, Huayan Wang, Klaus Mueller
- Abstract summary: We propose to expand the latent space by replacing fully-connected layers in the StyleGAN's mapping network with attention-based transformers.
This simple and effective technique integrates the aforementioned two spaces and transforms them into one new latent space called $W$++.
Our modified StyleGAN maintains the state-of-the-art generation quality of the original StyleGAN with moderately better diversity.
But more importantly, the proposed $W$++ space achieves superior performance in both reconstruction quality and editing quality.
- Score: 35.93066942205814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advances in semantic manipulation using StyleGAN, semantic
editing of real faces remains challenging. The gap between the $W$ space and
the $W$+ space demands an undesirable trade-off between reconstruction quality
and editing quality. To solve this problem, we propose to expand the latent
space by replacing fully-connected layers in the StyleGAN's mapping network
with attention-based transformers. This simple and effective technique
integrates the aforementioned two spaces and transforms them into one new
latent space called $W$++. Our modified StyleGAN maintains the state-of-the-art
generation quality of the original StyleGAN with moderately better diversity.
But more importantly, the proposed $W$++ space achieves superior performance in
both reconstruction quality and editing quality. Despite these significant
advantages, our $W$++ space supports existing inversion algorithms and editing
methods with only negligible modifications thanks to its structural similarity
with the $W/W$+ space. Extensive experiments on the FFHQ dataset prove that our
proposed $W$++ space is evidently preferable to the previous $W/W$+ space
for real face editing. The code is publicly available for research
purposes at https://github.com/AnonSubm2021/TransStyleGAN.
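The change described above is architectural: the stack of fully-connected layers in StyleGAN's mapping network is replaced with attention-based transformer layers, so the mapping emits one style code per synthesis layer (the $W$++ space) rather than a single shared $W$ code. The PyTorch sketch below only illustrates that idea; the layer counts, dimensions, and class names are assumptions rather than the authors' implementation (see the linked repository for the real code).

```python
import torch
import torch.nn as nn

class TransformerMappingNetwork(nn.Module):
    """Illustrative sketch of a W++-style mapping network: instead of a stack of
    fully-connected layers producing one shared w, self-attention runs over one
    token per synthesis layer so the per-layer codes are produced jointly.
    All hyperparameters here are assumptions, not the paper's settings."""

    def __init__(self, z_dim=512, w_dim=512, num_ws=18, num_layers=8, num_heads=8):
        super().__init__()
        self.input_proj = nn.Linear(z_dim, w_dim)
        # One learned embedding per target synthesis layer (18 for 1024x1024 StyleGAN2).
        self.layer_embed = nn.Parameter(torch.randn(num_ws, w_dim) * 0.02)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=w_dim, nhead=num_heads, dim_feedforward=2 * w_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, z):                                   # z: (B, z_dim)
        tokens = self.input_proj(z).unsqueeze(1) + self.layer_embed.unsqueeze(0)
        return self.encoder(tokens)                         # (B, num_ws, w_dim) "W++" codes

if __name__ == "__main__":
    mapping = TransformerMappingNetwork()
    print(mapping(torch.randn(4, 512)).shape)               # torch.Size([4, 18, 512])
```

Because the output keeps the per-layer layout of $W$+, inversion and editing pipelines that already operate on stacks of 512-dimensional codes would need only minor changes, which is consistent with the compatibility claim in the abstract.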
Related papers
- FLoRA: Low-Rank Core Space for N-dimension [78.39310274926535]
Adapting pre-trained foundation models for various downstream tasks has been prevalent in artificial intelligence.
To mitigate the cost of full fine-tuning, several fine-tuning techniques have been developed to update the pre-trained model weights in a more resource-efficient manner.
This paper introduces FLoRA, a generalized parameter-efficient fine-tuning framework designed for parameter spaces of various dimensions.
arXiv Detail & Related papers (2024-05-23T16:04:42Z)
- StylePrompter: All Styles Need Is Attention [21.760753546313403]
GAN inversion aims at inverting images into the corresponding latent codes of Generative Adversarial Networks (GANs).
We innovatively introduce a hierarchical vision Transformer backbone to predict $\mathcal{W}+$ latent codes at the token level.
We then prove that StylePrompter lies in a more disentangled $\mathcal{W}+$ space and show the controllability of SMART.
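Predicting $\mathcal{W}+$ codes at the token level suggests an encoder whose output tokens map one-to-one onto the per-layer style codes (18 of them for a 1024x1024 StyleGAN2). The snippet below is a minimal, hypothetical sketch of such a token-to-$\mathcal{W}+$ head, not StylePrompter's hierarchical backbone; the names, dimensions, and use of learned query tokens are assumptions.

```python
import torch
import torch.nn as nn

class TokenToWPlusHead(nn.Module):
    """Hypothetical head turning encoder tokens into 18 per-layer W+ codes,
    one learned query token per code; the vision backbone is abstracted away."""

    def __init__(self, token_dim=768, w_dim=512, num_ws=18, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_ws, token_dim) * 0.02)
        self.attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.to_w = nn.Linear(token_dim, w_dim)

    def forward(self, image_tokens):                  # (B, N, token_dim) from any ViT
        q = self.queries.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        style_tokens, _ = self.attn(q, image_tokens, image_tokens)
        return self.to_w(style_tokens)                # (B, 18, w_dim): W+ codes

if __name__ == "__main__":
    head = TokenToWPlusHead()
    print(head(torch.randn(2, 196, 768)).shape)       # torch.Size([2, 18, 512])
```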
arXiv Detail & Related papers (2023-07-30T07:23:44Z)
- Revisiting Latent Space of GAN Inversion for Real Image Editing [27.035594402482886]
In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images.
We show that $\mathcal{Z}+$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W}+$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images.
arXiv Detail & Related papers (2023-07-18T06:27:44Z)
- Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space [27.035594402482886]
We revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z}+$ and integrate them into seminal GAN inversion methods to improve editing quality.
Our extensions achieve sophisticated editing quality with the aid of the StyleGAN prior.
arXiv Detail & Related papers (2023-05-31T23:27:07Z)
- Make It So: Steering StyleGAN for Any Image Inversion and Editing [16.337519991964367]
StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables.
Existing GAN inversion methods struggle to maintain editing directions and produce realistic results.
We propose Make It So, a novel GAN inversion method that operates in the $\mathcal{Z}$ (noise) space rather than the typical $\mathcal{W}$ (latent style) space.
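Inverting in the $\mathcal{Z}$ (noise) space generally means optimizing the input noise vector directly until the generator reproduces the target image. The function below is a generic latent-optimization sketch of that setup, not the Make It So method itself; the `generator` argument, loss, and hyperparameters are placeholders, and in practice a perceptual term such as LPIPS is usually added.

```python
import torch
import torch.nn.functional as F

def invert_in_z(generator, target, steps=500, lr=0.05):
    """Generic latent-optimization inversion in the Z (noise) space.
    `generator` is any pretrained mapping z -> image (a placeholder here)."""
    z = torch.randn(1, 512, device=target.device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(z)                 # (1, 3, H, W), same size as target
        loss = F.mse_loss(recon, target)     # + a perceptual term in practice
        loss.backward()
        opt.step()
    return z.detach()                        # inverted Z-space code, ready for editing
```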
arXiv Detail & Related papers (2023-04-27T17:59:24Z)
- P+: Extended Textual Conditioning in Text-to-Image Generation [50.823884280133626]
We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$.
We show that the extended space provides greater disentangling and control over image synthesis.
We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens.
arXiv Detail & Related papers (2023-03-16T17:38:15Z)
- Towards Arbitrary Text-driven Image Manipulation via Space Alignment [49.3370305074319]
We propose a new Text-driven image Manipulation framework via Space Alignment (TMSA).
TMSA aims to align the same semantic regions in CLIP and StyleGAN spaces.
The framework can support arbitrary image editing mode without additional cost.
arXiv Detail & Related papers (2023-01-25T16:20:01Z)
- Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint [76.00222741383375]
GAN inversion and editing via StyleGAN maps an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W}+$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation.
Recent GAN inversion methods typically explore $\mathcal{W}+$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability.
We introduce contrastive learning to align $\mathcal{W}$ and the image space for precise latent code prediction.
arXiv Detail & Related papers (2022-11-21T13:35:32Z)
- TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing [110.82128064489237]
We propose TransEditor, a novel Transformer-based framework to enhance interaction in a dual-space GAN for more controllable editing.
Experiments demonstrate the superiority of the proposed framework in image quality and editing capability, suggesting the effectiveness of TransEditor for highly controllable facial editing.
arXiv Detail & Related papers (2022-03-31T17:58:13Z)
- HyperInverter: Improving StyleGAN Inversion via Hypernetwork [12.173568611144628]
Current GAN inversion methods fail to meet at least one of three requirements: high reconstruction quality, editability, and fast inference.
We present a novel two-phase strategy that satisfies all three requirements at the same time.
Our method is entirely encoder-based, resulting in extremely fast inference.
arXiv Detail & Related papers (2021-12-01T18:56:05Z)
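The title points to a hypernetwork and the summary stresses a two-phase, fully encoder-based design, so a plausible reading is: phase one regresses an initial latent code, and phase two predicts additive weight offsets for the generator from the remaining reconstruction error. The module below sketches that general pattern under stated assumptions; it is not HyperInverter's actual architecture, and all names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class WeightOffsetHypernetwork(nn.Module):
    """Sketch of a hypernetwork-style refinement step for GAN inversion:
    conditioned on the input image and the phase-1 reconstruction, predict
    additive offsets for selected generator weight tensors."""

    def __init__(self, feat_dim=512, target_shapes=((512, 512), (256, 512))):
        super().__init__()
        self.backbone = nn.Sequential(           # stand-in for a real feature extractor
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU())
        # One linear head per target weight tensor of the generator.
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, h * w) for h, w in target_shapes])
        self.target_shapes = target_shapes

    def forward(self, image, recon):              # both: (B, 3, H, W)
        feat = self.backbone(torch.cat([image, recon], dim=1))
        return [head(feat).view(-1, h, w)
                for head, (h, w) in zip(self.heads, self.target_shapes)]

if __name__ == "__main__":
    hyper = WeightOffsetHypernetwork()
    offsets = hyper(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
    print([tuple(o.shape) for o in offsets])      # [(1, 512, 512), (1, 256, 512)]
```

A single encoder pass plus one hypernetwork pass replaces per-image optimization, which is what makes a fully encoder-based pipeline fast at inference.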