Pivotal Tuning for Latent-based Editing of Real Images
- URL: http://arxiv.org/abs/2106.05744v1
- Date: Thu, 10 Jun 2021 13:47:59 GMT
- Title: Pivotal Tuning for Latent-based Editing of Real Images
- Authors: Daniel Roich, Ron Mokady, Amit H. Bermano, and Daniel Cohen-Or
- Abstract summary: A surge of advanced facial editing techniques has been proposed that leverage the generative power of a pre-trained StyleGAN.
To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator's domain.
This means it is still challenging to apply ID-preserving facial latent-space editing to faces which are out of the generator's domain.
- Score: 40.22151052441958
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, a surge of advanced facial editing techniques has been
proposed that leverage the generative power of a pre-trained StyleGAN. To successfully
edit an image this way, one must first project (or invert) the image into the
pre-trained generator's domain. As it turns out, however, StyleGAN's latent
space induces an inherent tradeoff between distortion and editability, i.e.
between maintaining the original appearance and convincingly altering some of
its attributes. Practically, this means it is still challenging to apply
ID-preserving facial latent-space editing to faces which are out of the
generator's domain. In this paper, we present an approach to bridge this gap.
Our technique slightly alters the generator, so that an out-of-domain image is
faithfully mapped into an in-domain latent code. The key idea is pivotal tuning
- a brief training process that preserves the editing quality of an in-domain
latent region, while changing its portrayed identity and appearance. In Pivotal
Tuning Inversion (PTI), an initial inverted latent code serves as a pivot,
around which the generator is fine-tuned. At the same time, a regularization
term keeps nearby identities intact, to locally contain the effect. This
surgical training process ends up altering appearance features that represent
mostly identity, without affecting editing capabilities. We validate our
technique through inversion and editing metrics, and show preferable scores to
state-of-the-art methods. We further qualitatively demonstrate our technique by
applying advanced edits (such as pose, age, or expression) to numerous images
of well-known and recognizable identities. Finally, we demonstrate resilience
to harder cases, including heavy make-up, elaborate hairstyles and/or headwear,
which otherwise could not have been successfully inverted and edited by
state-of-the-art methods.
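
For intuition, the core recipe can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the generator interface `G(w) -> image`, the off-the-shelf `invert` routine, and all hyperparameters are assumptions, and the Gaussian perturbation used for the locality term stands in for the paper's sampling of nearby latent codes.

```python
import copy
import torch
import torch.nn.functional as F

def pivotal_tuning_inversion(G_frozen, target, invert,
                             steps=350, lr=3e-4,
                             reg_weight=0.1, reg_radius=0.5):
    """Two-step PTI sketch: invert to a pivot, then fine-tune around it."""
    # Step 1: a conventional W-space inversion yields the pivot latent code.
    w_pivot = invert(G_frozen, target).detach()

    # Step 2: briefly fine-tune a copy of the generator so that the pivot
    # reproduces the target image faithfully.
    G = copy.deepcopy(G_frozen)
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Reconstruction term at the pivot (the paper combines a pixel-wise
        # loss with a perceptual term; only L2 is shown here).
        rec = F.mse_loss(G(w_pivot), target)

        # Locality regularization: latent codes near the pivot should still
        # produce what the frozen generator produced, so the weight change
        # stays locally contained and editing directions keep working.
        w_near = w_pivot + reg_radius * torch.randn_like(w_pivot)
        with torch.no_grad():
            ref = G_frozen(w_near)
        reg = F.mse_loss(G(w_near), ref)

        (rec + reg_weight * reg).backward()
        opt.step()
    return w_pivot, G
```

Once tuned, editing proceeds in the usual latent-space fashion against the new generator, e.g. `edited = G(w_pivot + alpha * direction)` for a known semantic direction such as pose or age.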
Related papers
- HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks [5.9189325968909365]
We propose an innovative image editing method called HyperEditor, which uses weight factors generated by hypernetworks to reassign the weights of the pre-trained StyleGAN2 generator.
Guided by CLIP's cross-modal image-text semantic alignment, this approach simultaneously accomplishes authentic attribute editing and cross-domain style transfer.
arXiv Detail & Related papers (2023-12-21T02:39:53Z)
- Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to its weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
- Semantic Unfolding of StyleGAN Latent Space [0.7646713951724012]
Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to an input real image.
This editing property emerges from the disentangled nature of the latent space.
In this paper, we identify that facial attribute disentanglement is not optimal; thus, facial editing that relies on linear attribute separation is flawed.
arXiv Detail & Related papers (2022-06-29T20:22:10Z)
- Expanding the Latent Space of StyleGAN for Real Face Editing [4.1715767752637145]
A surge of face editing techniques has been proposed to employ the pretrained StyleGAN for semantic manipulation.
To successfully edit a real image, one must first convert the input image into StyleGAN's latent variables.
We present a method to expand the latent space of StyleGAN with additional content features to break down the trade-off between low distortion and high editability.
arXiv Detail & Related papers (2022-04-26T18:27:53Z)
- FEAT: Face Editing with Attention [70.89233432407305]
We build on the StyleGAN generator and present a method that explicitly encourages face manipulation to focus on the intended regions.
During the generation of the edited image, the attention map serves as a mask that guides a blending between the original features and the modified ones.
arXiv Detail & Related papers (2022-02-06T06:07:34Z)
- Pixel Sampling for Style Preserving Face Pose Editing [53.14006941396712]
We present a novel two-stage approach to solve the dilemma, in which the task of face pose manipulation is cast as face inpainting.
By selectively sampling pixels from the input face and slightly adjusting their relative locations, the face editing result faithfully preserves both the identity information and the image style.
With the 3D facial landmarks as guidance, our method is able to manipulate face pose in three degrees of freedom, i.e., yaw, pitch, and roll, resulting in more flexible face pose editing.
arXiv Detail & Related papers (2021-06-14T11:29:29Z)
- Designing an Encoder for StyleGAN Image Manipulation [38.909059126878354]
We study the latent space of StyleGAN, the state-of-the-art unconditional generator.
We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space.
We present an encoder based on our two principles that is specifically designed for facilitating editing on real images.
arXiv Detail & Related papers (2021-02-04T17:52:38Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice when feeding a real image to a trained GAN generator is to first invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing.
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.