StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
- URL: http://arxiv.org/abs/2208.07862v1
- Date: Tue, 16 Aug 2022 17:47:03 GMT
- Title: StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
- Authors: Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, Ziwei Liu
- Abstract summary: We propose a principled framework named StyleFaceV, which produces high-fidelity identity-preserving face videos with vivid movements.
Our core insight is to decompose appearance and pose information and recompose them in the latent space of StyleGAN3 to produce stable and dynamic results.
- Score: 43.43545400625567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Realistic generative face video synthesis has long been a pursuit in both
the computer vision and graphics communities. However, existing face video generation
methods tend to produce low-quality frames with drifted facial identities and
unnatural movements. To tackle these challenges, we propose a principled
framework named StyleFaceV, which produces high-fidelity identity-preserving
face videos with vivid movements. Our core insight is to decompose appearance
and pose information and recompose them in the latent space of StyleGAN3 to
produce stable and dynamic results. Specifically, StyleGAN3 provides strong
priors for high-fidelity facial image generation, but the latent space is
intrinsically entangled. By carefully examining its latent properties, we
propose our decomposition and recomposition designs which allow for the
disentangled combination of facial appearance and movements. Moreover, a
temporal-dependent model is built upon the decomposed latent features and samples
plausible motion sequences, enabling the generation of realistic and temporally
coherent face videos. In particular, our pipeline is trained with a joint training
strategy on both static images and high-quality video data, which improves data
efficiency. Extensive experiments
demonstrate that our framework achieves state-of-the-art face video generation
results both qualitatively and quantitatively. Notably, StyleFaceV is capable
of generating realistic $1024\times1024$ face videos even without
high-resolution training videos.
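
To make the decompose-and-recompose idea concrete, the sketch below shows how a single appearance code and a sampled sequence of pose codes could be recombined into per-frame latents and decoded with a frozen pretrained generator. This is a minimal, hypothetical illustration: the module names (Recomposer, MotionSampler, generate_video), the latent dimensions, and the `stylegan3_synthesis` callable are assumptions made for exposition, not the authors' released implementation.

```python
# Minimal, hypothetical sketch of the decompose/recompose idea in the abstract.
# Module names, dimensions, and the stylegan3_synthesis callable are assumptions
# for illustration only, not the authors' released code.
import torch
import torch.nn as nn


class Recomposer(nn.Module):
    """Fuses an appearance code and a pose code into a single StyleGAN3-style latent."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.LeakyReLU(0.2),
            nn.Linear(dim, dim),
        )

    def forward(self, appearance: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([appearance, pose], dim=-1))


class MotionSampler(nn.Module):
    """Stand-in for the temporal-dependent model: autoregressively samples pose codes."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.gru = nn.GRUCell(dim, dim)
        self.to_pose = nn.Linear(dim, dim)

    def forward(self, initial_pose: torch.Tensor, num_frames: int) -> torch.Tensor:
        poses, hidden = [initial_pose], initial_pose
        for _ in range(num_frames - 1):
            # Inject per-step noise so each rollout yields a different motion sequence.
            hidden = self.gru(torch.randn_like(hidden), hidden)
            poses.append(self.to_pose(hidden))
        return torch.stack(poses, dim=0)  # (T, B, dim)


def generate_video(stylegan3_synthesis, appearance, pose_sequence, recomposer):
    """Recompose one appearance code with every pose code and decode frame by frame.

    `stylegan3_synthesis` is assumed to map a latent to an image; the real
    StyleGAN3 synthesis network expects broadcast W+ codes, which is omitted here.
    """
    frames = []
    for pose in pose_sequence:                 # iterate over T pose codes
        w = recomposer(appearance, pose)       # recomposed latent for this frame
        frames.append(stylegan3_synthesis(w))  # decode with the frozen generator
    return torch.stack(frames, dim=0)          # (T, B, C, H, W) video tensor
```

In the full method, the appearance and pose codes would come from learned decomposition applied to a reference image and driving frames, and the temporal model would be trained jointly on image and video data; both are left abstract in this sketch.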
Related papers
- SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model [66.34929233269409]
Talking Head Generation (THG) is an important task with broad application prospects in various fields such as digital humans, film production, and virtual reality.
We propose a novel framework named Style-Enhanced Vivid Portrait (SVP) which fully leverages style-related information in THG.
Our model generates diverse, vivid, and high-quality videos with flexible control over intrinsic styles, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-05T06:27:32Z) - G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation.
Our novel approach empowers the face animation model to incorporate 3D information using only 2D images.
In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z) - VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - Video-driven Neural Physically-based Facial Asset for Production [33.24654834163312]
We present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets.
Our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
arXiv Detail & Related papers (2022-02-11T13:22:48Z) - Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z) - Head2Head++: Deep Facial Attributes Re-Targeting [6.230979482947681]
We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment.
We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos.
Our system performs end-to-end reenactment at nearly real-time speed (18 fps).
arXiv Detail & Related papers (2020-06-17T23:38:37Z) - DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly-accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.