Video2StyleGAN: Encoding Video in Latent Space for Manipulation
- URL: http://arxiv.org/abs/2206.13078v1
- Date: Mon, 27 Jun 2022 06:48:15 GMT
- Title: Video2StyleGAN: Encoding Video in Latent Space for Manipulation
- Authors: Jiyang Yu, Jingen Liu, Jing Huang, Wei Zhang, Tao Mei
- Abstract summary: We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach significantly outperforms existing single-image methods while achieving real-time speed (66 fps).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent works have been proposed for face image editing by leveraging the
latent space of pretrained GANs. However, few attempts have been made to
directly apply them to videos, because 1) they do not guarantee temporal
consistency, 2) their application is limited by their processing speed on
videos, and 3) they cannot accurately encode details of face motion and
expression. To this end, we propose a novel network to encode face videos into
the latent space of StyleGAN for semantic face video manipulation. Based on the
vision transformer, our network reuses the high-resolution portion of the
latent vector to enforce temporal consistency. To capture subtle face motions
and expressions, we design novel losses that involve sparse facial landmarks
and a dense 3D face mesh. We have thoroughly evaluated our approach and
successfully demonstrated its application to various face video manipulations.
In particular, we propose a novel network for pose/expression control in a 3D
coordinate system. Both qualitative and quantitative results show that our
approach significantly outperforms existing single-image methods while
achieving real-time speed (66 fps).
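The abstract describes two mechanisms concretely enough to sketch: each frame re-predicts only the coarse portion of the StyleGAN W+ latent while the high-resolution portion is computed once and reused across the clip, and motion is supervised with sparse facial landmarks (plus a dense 3D mesh term). The PyTorch sketch below illustrates the latent-reuse idea; the module structure, the 8/10 coarse-fine split, and the plain L1 landmark term are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the latent-reuse idea, assuming a StyleGAN2 generator at
# 1024x1024 whose W+ latent has 18 layers of 512 dims. The encoders below are
# simple placeholders for the paper's transformer-based network.
import torch
import torch.nn as nn

NUM_WS, W_DIM = 18, 512   # standard StyleGAN2 W+ layout (assumption)
COARSE_WS = 8             # layers re-predicted per frame (assumption)

class VideoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # per-frame branch: predicts the coarse codes carrying pose/expression
        self.motion_enc = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(COARSE_WS * W_DIM))
        # identity branch: run once per clip, its output reused for every frame
        self.identity_enc = nn.Sequential(
            nn.Flatten(), nn.LazyLinear((NUM_WS - COARSE_WS) * W_DIM))

    def forward(self, frames):  # frames: (T, C, H, W)
        t = frames.shape[0]
        coarse = self.motion_enc(frames).view(t, COARSE_WS, W_DIM)
        # the high-resolution portion is computed from one frame and shared,
        # which is what enforces temporal consistency of fine appearance
        fine = self.identity_enc(frames[:1]).view(1, NUM_WS - COARSE_WS, W_DIM)
        return torch.cat([coarse, fine.expand(t, -1, -1)], dim=1)  # (T, 18, 512)

def landmark_loss(pred_lmks, target_lmks):
    # sparse facial-landmark supervision; the abstract also mentions a dense
    # 3D face-mesh term, omitted here. Plain L1 is an assumption.
    return (pred_lmks - target_lmks).abs().mean()
```

Feeding the resulting (T, 18, 512) codes through a pretrained StyleGAN2 synthesis network would reconstruct the clip; only the coarse codes vary over time, so high-frequency appearance stays stable across frames.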
Related papers
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
- G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation.
Our novel approach empowers the face animation model to incorporate 3D information using only 2D images.
In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z)
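The G3FA entry above mentions 2D motion warping to capture motion dynamics. A standard way to implement such warping is backward warping with a dense flow field via `grid_sample`; the sketch below shows that generic operation, not G3FA's actual code.

```python
# Generic backward warping with a dense 2D flow field (a sketch; G3FA's own
# warping module may differ).
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp `image` (N, C, H, W) by `flow` (N, 2, H, W), where flow[:, 0] and
    flow[:, 1] are per-pixel x and y displacements in pixels."""
    n, _, h, w = image.shape
    # base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).to(image)          # (2, H, W)
    coords = base.unsqueeze(0) + flow                      # (N, 2, H, W)
    # normalize pixel coordinates to [-1, 1] as grid_sample expects
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                   # (N, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)
```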
- Copy Motion From One to Another: Fake Motion Video Generation [53.676020148034034]
A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image.
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
arXiv Detail & Related papers (2022-05-03T08:45:22Z)
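The Copy Motion entry above cites a Gromov-Wasserstein loss for learning the pose-to-foreground mapping. The full GW discrepancy optimizes over couplings between the two spaces; for paired training data the coupling is known, and the objective reduces to matching intra-domain pairwise-distance structure. The sketch below shows that simplified form, which is an assumption on my part rather than the paper's exact formulation.

```python
# Simplified Gromov-Wasserstein-style objective for paired batches (a sketch).
import torch

def gw_style_loss(pose_feats, image_feats):
    """pose_feats: (B, Dp), image_feats: (B, Di), with row i of each
    describing the same sample. Under this known one-to-one coupling, the GW
    discrepancy reduces to comparing the two intra-domain distance matrices."""
    d_pose = torch.cdist(pose_feats, pose_feats)    # (B, B)
    d_img = torch.cdist(image_feats, image_feats)   # (B, B)
    return ((d_pose - d_img) ** 2).mean()
```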
- Stitch it in Time: GAN-Based Facial Editing of Real Videos [38.81306268180105]
We propose a framework for semantic editing of faces in videos, demonstrating significant improvements over the current state-of-the-art.
Our method produces meaningful face manipulations, maintains a higher degree of temporal consistency, and can be applied to challenging, high quality, talking head videos.
arXiv Detail & Related papers (2022-01-20T18:48:20Z)
- UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN.
Our framework is designed to handle face swapping and face reenactment simultaneously.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Head2Head++: Deep Facial Attributes Re-Targeting [6.230979482947681]
We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment.
We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos.
Our system performs end-to-end reenactment at near real-time speed (18 fps).
arXiv Detail & Related papers (2020-06-17T23:38:37Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)