Video2StyleGAN: Disentangling Local and Global Variations in a Video
- URL: http://arxiv.org/abs/2205.13996v2
- Date: Mon, 30 May 2022 20:45:40 GMT
- Title: Video2StyleGAN: Disentangling Local and Global Variations in a Video
- Authors: Rameen Abdal, Peihao Zhu, Niloy J. Mitra, Peter Wonka
- Abstract summary: StyleGAN has emerged as a powerful paradigm for facial editing, providing disentangled controls over age, expression, illumination, etc.
We introduce Video2StyleGAN that takes a target image and driving video(s) to reenact the local and global locations and expressions from the driving video in the identity of the target image.
- Score: 68.70889857355678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image editing using a pretrained StyleGAN generator has emerged as a powerful
paradigm for facial editing, providing disentangled controls over age,
expression, illumination, etc. However, the approach cannot be directly adopted
for video manipulations. We hypothesize that the main missing ingredient is the
lack of fine-grained and disentangled control over face location, face pose,
and local facial expressions. In this work, we demonstrate that such a
fine-grained control is indeed achievable using pretrained StyleGAN by working
across multiple (latent) spaces (namely, the positional space, the W+ space,
and the S space) and combining the optimization results across the multiple
spaces. Building on this enabling component, we introduce Video2StyleGAN that
takes a target image and driving video(s) to reenact the local and global
locations and expressions from the driving video in the identity of the target
image. We evaluate the effectiveness of our method over multiple challenging
scenarios and demonstrate clear improvements over alternative approaches.
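The core technical ingredient described above is per-frame optimization carried out in several control spaces of a pretrained StyleGAN (a global positional parameter, the W+ space, and the S space), after which the results are combined. The sketch below illustrates that idea in PyTorch; the `generator(w_plus, s_codes, translation)` and `loss_fn` interfaces, the staged schedule, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: staged optimization over three assumed control
# spaces of a pretrained StyleGAN (translation, W+, S), then a combined render.
import torch

def fit_frame(generator, loss_fn, target_frame, w_init, s_init,
              steps=(100, 100, 100), lrs=(0.05, 0.01, 0.005)):
    """Fit one driving-video frame by optimizing, in turn, a 2D translation
    (global face location), the W+ code (pose / coarse expression), and the
    S code (fine, local expression details)."""
    t = torch.zeros(2, requires_grad=True)      # positional space
    w = w_init.clone().requires_grad_(True)     # W+ space
    s = s_init.clone().requires_grad_(True)     # S space

    for params, n_steps, lr in zip(([t], [w], [s]), steps, lrs):
        opt = torch.optim.Adam(params, lr=lr)
        for _ in range(n_steps):
            opt.zero_grad()
            img = generator(w_plus=w, s_codes=s, translation=t)
            loss_fn(img, target_frame).backward()
            opt.step()

    # The final frame combines the results of all three optimization stages.
    with torch.no_grad():
        return generator(w_plus=w, s_codes=s, translation=t), (t, w, s)
```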
Related papers
- Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos.
Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm.
We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method to control face expression deformation based on driven audio.
Our experiments show that our method surpasses state-of-the-art performance on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- Grasping the Arrow of Time from the Singularity: Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN [105.99762358450633]
We show that "micromotion" can be represented in low-rank spaces extracted from the latent space of a StyleGAN-v2 face generation model.
It can be represented as simply as an affine transformation of the latent features.
This demonstrates that the local feature geometry corresponding to one type of micromotion is aligned across different face subjects.
arXiv Detail & Related papers (2022-04-27T04:38:39Z)
- FEAT: Face Editing with Attention [70.89233432407305]
We build on the StyleGAN generator and present a method that explicitly encourages face manipulation to focus on the intended regions.
During the generation of the edited image, the attention map serves as a mask that guides a blending between the original features and the modified ones (a minimal sketch of this blending step appears after this list).
arXiv Detail & Related papers (2022-02-06T06:07:34Z)
- Perceptually Validated Precise Local Editing for Facial Action Units with StyleGAN [3.8149289266694466]
We build a solution based on StyleGAN, which has been used extensively for semantic manipulation of faces.
We show that a naive strategy to perform editing in the latent space results in undesired coupling between certain action units.
We validate the effectiveness of our local editing method through perception experiments conducted with 23 subjects.
arXiv Detail & Related papers (2021-07-26T12:21:37Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
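As a small illustration of the attention-guided blending described in the FEAT entry above, here is a minimal sketch assuming feature maps of shape (B, C, H, W) and a single-channel soft mask; it shows only the blending formula, not the paper's network.

```python
# Hypothetical illustration of mask-guided feature blending (FEAT entry above):
# a soft attention map selects where edited features replace the originals.
import torch

def blend_features(original: torch.Tensor,
                   edited: torch.Tensor,
                   attention: torch.Tensor) -> torch.Tensor:
    """Blend (B, C, H, W) feature maps with a (B, 1, H, W) mask in [0, 1];
    the mask broadcasts across channels."""
    attention = attention.clamp(0.0, 1.0)
    return attention * edited + (1.0 - attention) * original

# Toy usage with random tensors standing in for generator features.
orig = torch.randn(1, 512, 16, 16)
edit = torch.randn(1, 512, 16, 16)
mask = torch.rand(1, 1, 16, 16)
print(blend_features(orig, edit, mask).shape)  # torch.Size([1, 512, 16, 16])
```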
This list is automatically generated from the titles and abstracts of the papers on this site.