Image-to-Video Generation via 3D Facial Dynamics
- URL: http://arxiv.org/abs/2105.14678v1
- Date: Mon, 31 May 2021 02:30:11 GMT
- Title: Image-to-Video Generation via 3D Facial Dynamics
- Authors: Xiaoguang Tu, Yingtian Zou, Jian Zhao, Wenjie Ai, Jian Dong, Yuan Yao,
Zhikang Wang, Guodong Guo, Zhifeng Li, Wei Liu, and Jiashi Feng
- Abstract summary: We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video retargeting and face video prediction.
- Score: 78.01476554323179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a versatile model, FaceAnime, for various video generation tasks
from still images. Video generation from a single face image is an interesting
problem and usually tackled by utilizing Generative Adversarial Networks (GANs)
to integrate information from the input face image and a sequence of sparse
facial landmarks. However, the generated face images usually suffer from
quality loss, image distortion, identity change, and expression mismatching due
to the weak representation capacity of the facial landmarks. In this paper, we
propose to "imagine" a face video from a single face image according to the
reconstructed 3D face dynamics, aiming to generate a realistic and
identity-preserving face video, with precisely predicted pose and facial
expression. The 3D dynamics reveal changes in facial expression and motion,
and can serve as strong prior knowledge for guiding highly realistic face
video generation. In particular, we explore face video prediction and exploit a
well-designed 3D dynamic prediction network to predict a 3D dynamic sequence
for a single face image. The 3D dynamics are then further rendered by the
sparse texture mapping algorithm to recover structural details and sparse
textures for generating face frames. Our model is versatile for various AR/VR
and entertainment applications, such as face video retargeting and face video
prediction. Extensive experimental results demonstrate its effectiveness in
generating high-fidelity, identity-preserving, and visually pleasing face video
clips from a single source face image.
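The abstract describes a two-stage pipeline: a 3D dynamic prediction network first predicts a sequence of 3D face dynamics from a single face image, and a sparse texture mapping step then renders each dynamic into structural guidance for generating the frames. The toy NumPy sketch below illustrates that data flow only; the function names, dimensions, and recurrence are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy dimensions (assumed): pose params, expression params, number of frames.
POSE_DIM, EXPR_DIM, T = 6, 10, 16

def predict_3d_dynamics(identity_code: np.ndarray, num_frames: int = T) -> np.ndarray:
    """Stand-in for the 3D dynamic prediction network: autoregressively
    produce a (num_frames, POSE_DIM + EXPR_DIM) sequence of 3DMM-style
    pose + expression vectors from a single identity code."""
    rng = np.random.default_rng(0)
    state = np.tanh(identity_code[: POSE_DIM + EXPR_DIM])
    seq = []
    for _ in range(num_frames):
        # Toy recurrence: smooth drift plus small noise, kept bounded by tanh.
        state = np.tanh(0.9 * state + 0.1 * rng.standard_normal(state.shape))
        seq.append(state.copy())
    return np.stack(seq)

def sparse_texture_map(dynamic: np.ndarray, size: int = 32) -> np.ndarray:
    """Stand-in for sparse texture mapping: rasterise one dynamic vector
    into a sparse guidance image (mostly zeros, a few structural cues)."""
    guide = np.zeros((size, size))
    coords = (np.abs(dynamic) * (size - 1)).astype(int) % size
    for i in range(0, len(coords) - 1, 2):
        guide[coords[i], coords[i + 1]] = 1.0  # sparse structural cue
    return guide

# Single "image" stands in as a fixed identity code; each guidance map would
# then condition a generator that synthesises one face frame.
identity = np.linspace(-1.0, 1.0, POSE_DIM + EXPR_DIM)
dynamics = predict_3d_dynamics(identity)                 # (16, 16) sequence
guides = np.stack([sparse_texture_map(d) for d in dynamics])
print(dynamics.shape, guides.shape)                      # (16, 16) (16, 32, 32)
```

The key design point mirrored here is that the predicted 3D dynamics, not sparse 2D landmarks, carry the motion: the generator only ever sees per-frame structural renderings derived from them, which is what the paper credits for identity preservation.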
Related papers
- FaceGPT: Self-supervised Learning to Chat about 3D Human Faces [69.4651241319356]
We introduce FaceGPT, a self-supervised learning framework for Large Vision-Language Models (VLMs) to reason about 3D human faces from images and text.
To do so, FaceGPT embeds the parameters of a 3D morphable face model (3DMM) into the token space of a VLM.
We show that FaceGPT achieves high-quality 3D face reconstructions and retains the ability for general-purpose visual instruction following.
arXiv Detail & Related papers (2024-06-11T11:13:29Z)
- NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images [18.489290898059462]
This paper presents a novel 3D face rendering model, namely NeuFace, to learn accurate and physically-meaningful underlying 3D representations.
We introduce an approximated BRDF integration and a simple yet new low-rank prior, which effectively lower the ambiguities and boost the performance of the facial BRDFs.
arXiv Detail & Related papers (2023-03-24T15:57:39Z)
- StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3 [43.43545400625567]
We propose a principled framework named StyleFaceV, which produces high-fidelity identity-preserving face videos with vivid movements.
Our core insight is to decompose appearance and pose information and recompose them in the latent space of StyleGAN3 to produce stable and dynamic results.
arXiv Detail & Related papers (2022-08-16T17:47:03Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- Video-driven Neural Physically-based Facial Asset for Production [33.24654834163312]
We present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets.
Our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
arXiv Detail & Related papers (2022-02-11T13:22:48Z)
- SAFA: Structure Aware Face Animation [9.58882272014749]
We propose a structure aware face animation (SAFA) method which constructs specific geometric structures to model different components of a face image.
We use a 3D morphable model (3DMM) to model the face, multiple affine transforms to model the other foreground components like hair and beard, and an identity transform to model the background.
The 3DMM geometric embedding not only helps generate realistic structure for the driving scene, but also contributes to better perception of occluded area in the generated image.
arXiv Detail & Related papers (2021-11-09T03:22:38Z)
- Inverting Generative Adversarial Renderer for Face Reconstruction [58.45125455811038]
In this work, we introduce a novel Generative Adversarial Renderer (GAR).
Instead of relying on graphics rules, GAR learns to model complicated real-world images and is capable of producing realistic results.
Our method achieves state-of-the-art performance on multiple face reconstruction benchmarks.
arXiv Detail & Related papers (2021-05-06T04:16:06Z)
- FaceDet3D: Facial Expressions with 3D Geometric Detail Prediction [62.5557724039217]
Facial expressions induce a variety of high-level details on the 3D face geometry.
3D Morphable Models (3DMMs) of the human face fail to capture such fine details in their PCA-based representations.
We introduce FaceDet3D, a first-of-its-kind method that generates - from a single image - geometric facial details consistent with any desired target expression.
arXiv Detail & Related papers (2020-12-14T23:07:38Z)
- Head2Head++: Deep Facial Attributes Re-Targeting [6.230979482947681]
We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment.
We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos.
Our system performs end-to-end reenactment at near real-time speed (18 fps).
arXiv Detail & Related papers (2020-06-17T23:38:37Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly-accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.