PIE: Portrait Image Embedding for Semantic Control
- URL: http://arxiv.org/abs/2009.09485v1
- Date: Sun, 20 Sep 2020 17:53:51 GMT
- Title: PIE: Portrait Image Embedding for Semantic Control
- Authors: Ayush Tewari, Mohamed Elgharib, Mallikarjun B R., Florian Bernard,
Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt
- Abstract summary: We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
- Score: 82.69061225574774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Editing of portrait images is a very popular and important research topic
with a large variety of applications. For ease of use, control should be
provided via a semantically meaningful parameterization that is akin to
computer animation controls. The vast majority of existing techniques do not
provide such intuitive and fine-grained control, or only enable coarse editing
of a single isolated control parameter. Very recently, high-quality
semantically controlled editing has been demonstrated, however only on
synthetically created StyleGAN images. We present the first approach for
embedding real portrait images in the latent space of StyleGAN, which allows
for intuitive editing of the head pose, facial expression, and scene
illumination in the image. Semantic editing in parameter space is achieved
based on StyleRig, a pretrained neural network that maps the control space of a
3D morphable face model to the latent space of the GAN. We design a novel
hierarchical non-linear optimization problem to obtain the embedding. An
identity preservation energy term allows spatially coherent edits while
maintaining facial integrity. Our approach runs at interactive frame rates and
thus allows the user to explore the space of possible edits. We evaluate our
approach on a wide set of portrait photos, compare it to the current state of
the art, and validate the effectiveness of its components in an ablation study.
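To make the optimization concrete, the following is a minimal illustrative energy of the general shape the abstract describes: a photometric reconstruction term, an identity preservation term, and a latent regularizer. The specific terms, weights, and hierarchical structure are assumptions for illustration, not the paper's exact formulation.

    E(\mathbf{w}) = \lVert G(\mathbf{w}) - I \rVert_1
                  + \lambda_{\mathrm{id}} \, \lVert \phi(G(\mathbf{w})) - \phi(I) \rVert_2^2
                  + \lambda_{\mathrm{reg}} \, \lVert \mathbf{w} - \bar{\mathbf{w}} \rVert_2^2

Here G is the pretrained StyleGAN generator, \mathbf{w} the latent code being optimized, I the input portrait, \phi a face-recognition feature extractor whose distance serves as the identity preservation energy, and \bar{\mathbf{w}} a mean latent used for regularization.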
Related papers
- DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing [14.537856326925178]
We focus on exploring explicit fine-grained control of generative facial image editing.
We propose a novel diffusion-based editing framework, named DisControlFace.
Our model can be trained using 2D in-the-wild portrait images without requiring 3D or video data.
arXiv Detail & Related papers (2023-12-11T08:16:55Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
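Below is a minimal, purely illustrative sketch of the heatmap mechanism this entry describes: rendering a 2D Gaussian heatmap and fusing it with intermediate generator features. The function names and the concatenation-based fusion are assumptions; the paper's exact encoding operator is not given in the abstract.

    # Hedged sketch: render a 2D Gaussian heatmap and fuse it with intermediate
    # generator features. `inject_heatmap` and concatenation-based fusion are
    # illustrative assumptions, not the authors' API.
    import torch

    def gaussian_heatmap(h, w, center, sigma):
        """Render a single 2D Gaussian peak as an (h, w) tensor."""
        ys = torch.arange(h).float().unsqueeze(1)  # (h, 1) row coordinates
        xs = torch.arange(w).float().unsqueeze(0)  # (1, w) column coordinates
        cy, cx = center
        d2 = (ys - cy) ** 2 + (xs - cx) ** 2       # squared distance to the peak
        return torch.exp(-d2 / (2 * sigma ** 2))

    def inject_heatmap(features, heatmap):
        """Attach the heatmap as an extra channel of an intermediate feature map."""
        b, _, h, w = features.shape
        hm = heatmap.unsqueeze(0).unsqueeze(0).expand(b, 1, h, w)
        return torch.cat([features, hm], dim=1)    # (b, c + 1, h, w)

    feats = torch.randn(1, 64, 32, 32)             # stand-in intermediate activations
    hm = gaussian_heatmap(32, 32, center=(16.0, 10.0), sigma=3.0)
    print(inject_heatmap(feats, hm).shape)         # torch.Size([1, 65, 32, 32])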
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
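As a rough illustration of the FlexIT recipe summarized above (mixing image and text embeddings into one target point, then iteratively optimizing toward it), here is a hedged sketch. The encoder, the mixing weight alpha, the regularization weight lam, and the choice to optimize pixels directly are all assumptions; the paper's actual regularizers and optimization variable may differ.

    # Hedged sketch of the FlexIT recipe: mix image and text embeddings into one
    # target point, then iteratively optimize the image toward it under a
    # stay-close regularizer. `alpha`, `lam`, and direct pixel optimization are
    # assumptions, not the paper's exact design.
    import torch
    import torch.nn.functional as F

    def clip_style_edit(image, text_emb, encode_image, steps=100, alpha=0.7, lam=10.0):
        src = image.detach().clone()
        img = image.detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([img], lr=0.01)
        with torch.no_grad():                      # fixed target point in embedding space
            target = F.normalize(alpha * text_emb + (1 - alpha) * encode_image(src), dim=-1)
        for _ in range(steps):
            emb = F.normalize(encode_image(img), dim=-1)
            loss = 1 - (emb * target).sum()           # cosine distance to the target point
            loss = loss + lam * F.mse_loss(img, src)  # keep the edit coherent with the input
            opt.zero_grad()
            loss.backward()
            opt.step()
        return img.detach()

    # Toy run with a frozen stand-in encoder (a real system would use CLIP here).
    enc = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                              torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten())
    for p in enc.parameters():
        p.requires_grad_(False)
    edited = clip_style_edit(torch.rand(1, 3, 64, 64), torch.randn(1, 8), enc, steps=10)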
- FreeStyleGAN: Free-view Editable Portrait Rendering with the Camera Manifold [5.462226912969161]
Current Generative Adversarial Networks (GANs) produce photorealistic renderings of portrait images.
We show how our approach enables the integration of a pre-trained StyleGAN into standard 3D rendering pipelines.
Our solution provides the first truly free-viewpoint rendering of realistic faces at interactive rates.
arXiv Detail & Related papers (2021-09-20T08:59:21Z)
- Pixel Sampling for Style Preserving Face Pose Editing [53.14006941396712]
We present a novel two-stage approach in which the task of face pose manipulation is cast as face inpainting.
By selectively sampling pixels from the input face and slightly adjusting their relative locations, the editing result faithfully preserves both the identity information and the image style.
With the 3D facial landmarks as guidance, our method is able to manipulate face pose in three degrees of freedom, i.e., yaw, pitch, and roll, resulting in more flexible face pose editing.
arXiv Detail & Related papers (2021-06-14T11:29:29Z)
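The entry above manipulates pose in three degrees of freedom under 3D landmark guidance. As a small illustration of that guidance signal, the sketch below composes yaw, pitch, and roll rotations and applies them to a set of 3D landmarks; the rotation convention and the 68-point layout are assumptions.

    # Hedged sketch of the 3D-landmark guidance: compose yaw/pitch/roll rotations
    # and apply them to landmarks. The Z-Y-X composition and 68-point layout are
    # assumptions for illustration.
    import numpy as np

    def rotation(yaw, pitch, roll):
        """Rotation matrix from Euler angles in radians (Z-Y-X composition assumed)."""
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])  # yaw about the y-axis
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])  # pitch about the x-axis
        Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])  # roll about the z-axis
        return Rz @ Ry @ Rx

    landmarks = np.random.rand(68, 3)                  # stand-in 3D facial landmarks
    rotated = landmarks @ rotation(0.2, 0.0, 0.0).T    # turn the head ~11.5 degrees in yaw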
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with a few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
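As a minimal illustration of latent-space navigation as summarized above, the sketch below shifts a latent code along a learned attribute direction scaled by a user-controlled strength. The direction vector and the 512-D latent are placeholders, not the paper's trained components.

    # Hedged sketch of latent-space navigation: shift a latent code along a
    # learned attribute direction. All tensors below are placeholders.
    import torch

    def navigate(w, direction, strength):
        """Move latent w along a unit-normalized attribute direction."""
        d = direction / direction.norm()
        return w + strength * d

    w = torch.randn(1, 512)          # stand-in StyleGAN-style latent code
    smile_dir = torch.randn(512)     # placeholder learned "smile" direction
    w_edit = navigate(w, smile_dir, strength=1.5)  # larger strength = stronger edit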
- Look here! A parametric learning based approach to redirect visual attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
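To illustrate the idea of distinct global parametric transformations for foreground and background, here is a hedged sketch that applies separate gain factors through a soft mask. The per-region gain is an illustrative stand-in for the paper's learned family of parametric edits.

    # Hedged sketch: apply distinct global parametric edits to foreground and
    # background through a soft mask. The simple per-region gain is an
    # illustrative stand-in for the paper's learned transformations.
    import numpy as np

    def apply_region_edits(image, mask, fg_gain=1.2, bg_gain=0.8):
        """image: (h, w, 3) floats in [0, 1]; mask: (h, w) soft foreground mask."""
        m = mask[..., None]                            # broadcast mask over channels
        edited = fg_gain * image * m + bg_gain * image * (1 - m)
        return np.clip(edited, 0.0, 1.0)

    img = np.random.rand(64, 64, 3)
    mask = np.zeros((64, 64))
    mask[16:48, 16:48] = 1.0                           # stand-in foreground region
    out = apply_region_edits(img, mask)                # foreground brightened, background dimmed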
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.