Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose
- URL: http://arxiv.org/abs/2002.10137v2
- Date: Thu, 5 Mar 2020 10:06:22 GMT
- Title: Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose
- Authors: Ran Yi, Zipeng Ye, Juyong Zhang, Hujun Bao, Yong-Jin Liu
- Abstract summary: We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinctive head movement effects than state-of-the-art methods.
- Score: 67.31838207805573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world talking faces are often accompanied by natural head movement. However,
most existing talking face video generation methods only consider facial
animation with fixed head pose. In this paper, we address this problem by
proposing a deep neural network model that takes an audio signal A of a source
person and a very short video V of a target person as input, and outputs a
synthesized high-quality talking face video with personalized head pose (making
use of the visual information in V), expression and lip synchronization (by
considering both A and V). The most challenging issue in our work is that
natural poses often cause in-plane and out-of-plane head rotations, which make
the synthesized talking face video far from realistic. To address this challenge,
we reconstruct a 3D face animation and re-render it into synthesized frames. To
fine-tune these frames into realistic ones with smooth background transitions,
we propose a novel memory-augmented GAN module. By first training a general
mapping based on a publicly available dataset and fine-tuning the mapping using
the input short video of the target person, we develop an effective strategy that
only requires a small number of frames (about 300 frames) to learn personalized
talking behavior including head pose. Extensive experiments and two user
studies show that our method can generate high-quality talking face videos
(i.e., with personalized head movements, expressions, and good lip
synchronization) that look natural and exhibit more distinctive head movement
effects than state-of-the-art methods.
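The two-stage strategy described above (pre-train a general audio-to-animation mapping on a public dataset, then personalize it on roughly 300 frames of the target person) can be illustrated with a minimal sketch. All module names, feature sizes, and the LSTM backbone below are hypothetical placeholders chosen for illustration, not the paper's actual architecture.
```python
import torch
import torch.nn as nn

class AudioToFaceParams(nn.Module):
    """Maps per-frame audio features to expression coefficients and a 6-DoF head pose."""
    def __init__(self, audio_dim=80, hidden=256, exp_dim=64, pose_dim=6):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.exp_head = nn.Linear(hidden, exp_dim)    # expression coefficients
        self.pose_head = nn.Linear(hidden, pose_dim)  # rotation + translation

    def forward(self, audio_feats):                   # audio_feats: (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.exp_head(h), self.pose_head(h)

def personalize(model, audio, exp_gt, pose_gt, steps=200, lr=1e-4):
    """Fine-tune the pre-trained general mapping on a short clip of the target person."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(steps):
        exp, pose = model(audio)
        loss = mse(exp, exp_gt) + mse(pose, pose_gt)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# About 300 frames of the target person, as stated in the abstract.
model = AudioToFaceParams()  # in practice, pre-trained on a public dataset first
audio = torch.randn(1, 300, 80)
exp_gt, pose_gt = torch.randn(1, 300, 64), torch.randn(1, 300, 6)
model = personalize(model, audio, exp_gt, pose_gt, steps=10)
```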
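The abstract also mentions a memory-augmented GAN module for refining the re-rendered frames. One common way to realize "memory augmentation" is a learnable key-value memory read; the sketch below shows that generic idea under assumed slot counts and feature sizes, and is not the paper's actual module.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryRead(nn.Module):
    """Learnable key-value memory: a query retrieves a blend of stored features."""
    def __init__(self, slots=128, key_dim=64, val_dim=64):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(slots, key_dim))
        self.vals = nn.Parameter(torch.randn(slots, val_dim))

    def forward(self, query):                            # query: (B, key_dim)
        attn = F.softmax(query @ self.keys.t(), dim=-1)  # (B, slots)
        return attn @ self.vals                          # (B, val_dim)

# The retrieved features would condition a GAN generator that refines a
# coarse re-rendered frame into a photo-realistic one (hypothetical wiring).
mem = MemoryRead()
print(mem(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```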
Related papers
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [88.17520303867099]
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio.
We present Real3D-Portrait, a framework that improves one-shot 3D reconstruction with a large image-to-plane model.
Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
arXiv Detail & Related papers (2024-01-16T17:04:30Z)
- Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video [91.92782707888618]
We present a decomposition-composition framework named Speech to Lip (Speech2Lip) that disentangles speech-sensitive and speech-insensitive motion/appearance.
We show that our model can be trained on a video just a few minutes long and achieve state-of-the-art performance in both visual quality and speech-visual synchronization.
arXiv Detail & Related papers (2023-09-09T14:52:39Z)
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [23.14865405847467]
We propose a talking face generation method that takes an audio signal as input and a short target video clip as reference.
The method synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in-sync with the input audio signal.
Experimental results and user studies show our method can generate realistic talking face videos of better quality than state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T02:10:26Z)
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [34.406907667904996]
We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image.
We first design a head pose predictor by modeling rigid 6D head movements with a motion-aware recurrent neural network (RNN); a hedged sketch of this idea appears after this list.
Then, we develop a motion field generator to produce the dense motion fields from input audio, head poses, and a reference image.
arXiv Detail & Related papers (2021-07-20T07:22:42Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Head2HeadFS: Video-based Head Reenactment with Few-shot Learning [64.46913473391274]
Head reenactment is a challenging task, which aims at transferring the entire head pose from a source person to a target.
We propose head2headFS, a novel easily adaptable pipeline for head reenactment.
Our video-based rendering network is fine-tuned under a few-shot learning strategy, using only a few samples.
arXiv Detail & Related papers (2021-03-30T10:19:41Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
- Talking-head Generation with Rhythmic Head Motion [46.6897675583319]
We propose a 3D-aware generative network with a hybrid embedding module and a non-linear composition module.
Our approach achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements.
arXiv Detail & Related papers (2020-07-16T18:13:40Z)
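As referenced in the Audio2Head entry above, predicting rigid 6D head movements from audio with a recurrent network can be sketched as follows. The GRU backbone, feature sizes, and cumulative-delta pose parameterization are assumptions for illustration, not the authors' exact design.
```python
import torch
import torch.nn as nn

class HeadPoseRNN(nn.Module):
    """Predicts a per-frame 6D pose (3 rotations + 3 translations) from audio."""
    def __init__(self, audio_dim=80, hidden=128):
        super().__init__()
        self.gru = nn.GRU(audio_dim, hidden, batch_first=True)
        self.delta_head = nn.Linear(hidden, 6)

    def forward(self, audio_feats, init_pose):  # (B, T, audio_dim), (B, 6)
        h, _ = self.gru(audio_feats)
        deltas = self.delta_head(h)             # per-frame pose increments
        # Accumulate increments so the predicted motion is temporally coherent.
        return init_pose.unsqueeze(1) + deltas.cumsum(dim=1)

poses = HeadPoseRNN()(torch.randn(2, 100, 80), torch.zeros(2, 6))
print(poses.shape)  # torch.Size([2, 100, 6])
```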
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.