Revisiting Latent Space of GAN Inversion for Real Image Editing
- URL: http://arxiv.org/abs/2307.08995v1
- Date: Tue, 18 Jul 2023 06:27:44 GMT
- Title: Revisiting Latent Space of GAN Inversion for Real Image Editing
- Authors: Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama
- Abstract summary: In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images.
We show that $\mathcal{Z}^{+}$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images.
- Score: 27.035594402482886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The exploration of the latent space in StyleGANs and GAN inversion exemplify
impressive real-world image editing, yet the trade-off between reconstruction
quality and editing quality remains an open problem. In this study, we revisit
StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly
capable latent spaces to build combined spaces that faithfully invert real
images while maintaining the quality of edited images. More specifically, we
propose $\mathcal{F}/\mathcal{Z}^{+}$ space consisting of two subspaces:
$\mathcal{F}$ space of an intermediate feature map of StyleGANs enabling
faithful reconstruction and $\mathcal{Z}^{+}$ space of an extended StyleGAN
prior supporting high editing quality. We project the real images into the
proposed space to obtain the inverted codes, by which we then move along
$\mathcal{Z}^{+}$, enabling semantic editing without sacrificing image quality.
Comprehensive experiments show that $\mathcal{Z}^{+}$ can replace the most
commonly-used $\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{S}$ spaces while
preserving reconstruction quality, resulting in reduced distortion of edited
images.
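As a rough illustration of the pipeline above, the sketch below inverts an image into a combined feature/extended-prior space by optimization and then edits by moving only along the prior codes. The StyleGAN interface (`initial_feature`, `mapping`, `synthesis_from_feature`) is a hypothetical wrapper around a pretrained generator, and the losses and hyperparameters are placeholders rather than the paper's actual settings.

```python
# Minimal sketch (not the authors' code): optimization-based inversion into a
# combined F/Z+ space, assuming a hypothetical `wrapper` around a pretrained
# StyleGAN generator.
import torch
import torch.nn.functional as F

def invert_f_zplus(wrapper, target, num_layers=18, steps=500, lr=0.01):
    """Jointly optimize an intermediate feature map (F space) and per-layer
    prior codes (Z+ space) so the generator reproduces `target`."""
    z_plus = torch.randn(1, num_layers, 512, requires_grad=True)             # extended prior codes
    feat = wrapper.initial_feature(target).detach().clone().requires_grad_(True)  # F-space variable

    opt = torch.optim.Adam([z_plus, feat], lr=lr)
    for _ in range(steps):
        styles = wrapper.mapping(z_plus)                  # map Z+ codes to per-layer styles
        recon = wrapper.synthesis_from_feature(feat, styles)
        loss = F.mse_loss(recon, target)                  # plus a perceptual term in practice
        opt.zero_grad()
        loss.backward()
        opt.step()
    return feat.detach(), z_plus.detach()

def edit_along_zplus(wrapper, feat, z_plus, direction, strength=3.0):
    """Semantic editing: shift the inverted codes along a direction in Z+,
    keeping the reconstructed feature map fixed."""
    styles = wrapper.mapping(z_plus + strength * direction)
    return wrapper.synthesis_from_feature(feat, styles)
```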
Related papers
- Designing a Better Asymmetric VQGAN for StableDiffusion [73.21783102003398]
A revolutionary text-to-image generator, StableDiffusion, learns a diffusion model in the latent space via a VQGAN.
We propose a new asymmetric VQGAN with two simple designs.
It can be widely used in StableDiffusion-based inpainting and local editing methods.
arXiv Detail & Related papers (2023-06-07T17:56:02Z)
- Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space [27.035594402482886]
We revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z}^{+}$ and integrate them into seminal GAN inversion methods to improve editing quality.
Our extensions achieve sophisticated editing quality with the aid of the StyleGAN prior.
arXiv Detail & Related papers (2023-05-31T23:27:07Z)
- Make It So: Steering StyleGAN for Any Image Inversion and Editing [16.337519991964367]
StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables.
Existing GAN inversion methods struggle to maintain editing directions and produce realistic results.
We propose Make It So, a novel GAN inversion method that operates in the $\mathcal{Z}$ (noise) space rather than the typical $\mathcal{W}$ (latent style) space.
arXiv Detail & Related papers (2023-04-27T17:59:24Z)
- P+: Extended Textual Conditioning in Text-to-Image Generation [50.823884280133626]
We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$.
We show that the extended space provides greater disentangling and control over image synthesis.
We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens.
arXiv Detail & Related papers (2023-03-16T17:38:15Z)
- Towards Arbitrary Text-driven Image Manipulation via Space Alignment [49.3370305074319]
We propose a new Text-driven image Manipulation framework via Space Alignment (TMSA).
TMSA aims to align the same semantic regions in CLIP and StyleGAN spaces.
The framework supports arbitrary text-driven image editing modes without additional cost.
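As a loose illustration only (the paper's actual alignment procedure is not detailed in this summary), the sketch below shows how an already-learned linear alignment between CLIP text embeddings and StyleGAN latents could turn an arbitrary prompt into an edit direction without per-edit optimization; `clip_text_encode`, `generator`, and the alignment matrix `A` are all hypothetical placeholders.

```python
# Hypothetical sketch: once CLIP and StyleGAN latent spaces are aligned by a
# learned linear map A, a text prompt yields a latent edit direction directly.
import torch
import torch.nn.functional as F

def text_edit(generator, clip_text_encode, A, w, source_text, target_text, strength=1.0):
    """Map a CLIP-space text direction into a StyleGAN latent offset via the
    alignment matrix A, then apply it to the inverted code w."""
    with torch.no_grad():
        e_src = clip_text_encode(source_text)   # e.g. "face"
        e_tgt = clip_text_encode(target_text)   # e.g. "smiling face"
    text_dir = F.normalize(e_tgt - e_src, dim=-1)
    delta_w = text_dir @ A                      # aligned edit direction in latent space
    return generator(w + strength * delta_w)
```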
arXiv Detail & Related papers (2023-01-25T16:20:01Z)
- Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint [76.00222741383375]
GAN inversion and editing via StyleGAN maps an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation.
Recent GAN inversion methods typically explore $\mathcal{W}^{+}$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability.
We introduce contrastive learning to align $\mathcal{W}$ and the image space for precise latent
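As one plausible reading of that contrastive alignment, the following sketch uses an InfoNCE-style loss that ties each latent code to the embedding of its own image; the projection networks are hypothetical stand-ins and the paper's actual objective may differ.

```python
# Minimal sketch of an InfoNCE-style loss that pulls each latent code toward the
# embedding of its own image and pushes it away from other images in the batch.
# `latent_proj` and `image_proj` are hypothetical projection networks.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(latent_proj, image_proj, w_codes, images, temperature=0.07):
    z_w = F.normalize(latent_proj(w_codes), dim=-1)   # (B, D) latent-side embeddings
    z_i = F.normalize(image_proj(images), dim=-1)     # (B, D) image-side embeddings
    logits = z_w @ z_i.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(w_codes.size(0), device=w_codes.device)
    # Symmetric cross-entropy: matching pairs lie on the diagonal.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```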
arXiv Detail & Related papers (2022-11-21T13:35:32Z)
- Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing [57.46189236379433]
We propose a new method to invert and edit complex images in the latent space of GANs, such as StyleGAN2.
Our key idea is to explore inversion with a collection of layers, spatially adapting the inversion process to the difficulty of the image.
arXiv Detail & Related papers (2022-06-16T17:57:49Z)
- Transforming the Latent Space of StyleGAN for Real Face Editing [35.93066942205814]
We propose to expand the latent space by replacing the fully-connected layers in StyleGAN's mapping network with attention-based transformers.
This simple and effective technique integrates the two spaces and transforms them into one new latent space, called $W^{++}$.
Our modified StyleGAN maintains the state-of-the-art generation quality of the original StyleGAN with moderately better diversity.
More importantly, the proposed $W^{++}$ space achieves superior performance in both reconstruction quality and editing quality.
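A minimal, self-contained sketch of the architectural change described above: a small attention-based mapping network that turns a single latent code into per-layer style vectors. The depth, widths, and conditioning scheme here are illustrative choices, not the authors' exact design.

```python
# Illustrative sketch (not the authors' architecture): a transformer-based
# mapping network that turns a single z code into per-layer style vectors.
import torch
import torch.nn as nn

class TransformerMapping(nn.Module):
    def __init__(self, z_dim=512, num_layers=18, depth=4, heads=8):
        super().__init__()
        self.embed = nn.Linear(z_dim, z_dim)
        # One learnable query token per synthesis layer.
        self.layer_tokens = nn.Parameter(torch.randn(num_layers, z_dim) * 0.02)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=z_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)

    def forward(self, z):                      # z: (B, z_dim)
        tokens = self.layer_tokens.unsqueeze(0).expand(z.size(0), -1, -1)
        # Condition the layer tokens on z by adding the embedded code.
        x = tokens + self.embed(z).unsqueeze(1)
        return self.encoder(x)                 # (B, num_layers, z_dim) styles

styles = TransformerMapping()(torch.randn(2, 512))   # example usage
print(styles.shape)                                   # torch.Size([2, 18, 512])
```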
arXiv Detail & Related papers (2021-05-29T06:42:23Z)
- Bridging Unpaired Facial Photos And Sketches By Line-drawings [5.589846737887013]
We propose a novel method to learn face sketch synthesis models by using unpaired data.
We map both photos and sketches to line-drawings by using a neural style transfer method.
Experimental results demonstrate that sRender can generate multi-style sketches, and significantly outperforms existing unpaired image-to-image translation methods.
arXiv Detail & Related papers (2021-02-01T04:51:46Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice for feeding a real image into a trained GAN generator is to first invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach that faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing.
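One plausible sketch of that idea: initialize from an encoder (so the code starts in-domain) and regularize the subsequent optimization so the code stays consistent with where the encoder maps the current reconstruction; `encoder` and `generator` are hypothetical pretrained networks and the weighting is arbitrary.

```python
# Hedged sketch of in-domain inversion: encoder-based initialization plus a
# regularizer that keeps the optimized code where the encoder would map the
# reconstruction back to.
import torch
import torch.nn.functional as F

def in_domain_invert(encoder, generator, image, steps=300, lr=0.01, lam=2.0):
    code = encoder(image).detach().clone().requires_grad_(True)  # in-domain init
    opt = torch.optim.Adam([code], lr=lr)
    for _ in range(steps):
        recon = generator(code)
        rec_loss = F.mse_loss(recon, image)             # pixel reconstruction
        dom_loss = F.mse_loss(encoder(recon), code)     # keep the code in-domain
        loss = rec_loss + lam * dom_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach()
```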
arXiv Detail & Related papers (2020-03-31T18:20:18Z)