CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking
Embedding
- URL: http://arxiv.org/abs/2311.08673v1
- Date: Wed, 15 Nov 2023 03:37:41 GMT
- Title: CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking
Embedding
- Authors: Jianzong Wang, Yimin Deng, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing
Xiao
- Abstract summary: This paper proposes a talking face generation method named "CP-EB"
It takes an audio signal as input and a person image as reference, to synthesize a photo-realistic people talking video with head poses controlled by a short video clip and proper eye blinking.
Experimental results show that the proposed method can generate photo-realistic talking face with synchronous lips motions, natural head poses and blinking eyes.
- Score: 32.006763134518245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a talking face generation method named "CP-EB" that takes
an audio signal as input and a person image as reference, to synthesize a
photo-realistic people talking video with head poses controlled by a short
video clip and proper eye blinking embedding. It's noted that not only the head
pose but also eye blinking are both important aspects for deep fake detection.
The implicit control of poses by video has already achieved by the state-of-art
work. According to recent research, eye blinking has weak correlation with
input audio which means eye blinks extraction from audio and generation are
possible. Hence, we propose a GAN-based architecture to extract eye blink
feature from input audio and reference video respectively and employ
contrastive training between them, then embed it into the concatenated features
of identity and poses to generate talking face images. Experimental results
show that the proposed method can generate photo-realistic talking face with
synchronous lips motions, natural head poses and blinking eyes.
Related papers
- JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation [24.2065254076207]
We introduce a novel method for joint expression and audio-guided talking face generation.
Our method can synthesize high-fidelity talking face videos, achieving state-of-the-art facial expression transfer.
arXiv Detail & Related papers (2024-09-18T17:18:13Z) - PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation [17.158581488104186]
Previous audio-driven talking head generation (THG) methods generate head poses from driving audio.
We propose textbfPoseTalk, a THG system that can freely generate lip-synchronized talking head videos with free head poses conditioned on text prompts and audio.
arXiv Detail & Related papers (2024-09-04T12:30:25Z) - Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a
Short Video [91.92782707888618]
We present a decomposition-composition framework named Speech to Lip (Speech2Lip) that disentangles speech-sensitive and speech-insensitive motion/appearance.
We show that our model can be trained by a video of just a few minutes in length and achieve state-of-the-art performance in both visual quality and speech-visual synchronization.
arXiv Detail & Related papers (2023-09-09T14:52:39Z) - Identity-Preserving Talking Face Generation with Landmark and Appearance
Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z) - DFA-NeRF: Personalized Talking Head Generation via Disentangled Face
Attributes Neural Rendering [69.9557427451339]
We propose a framework based on neural radiance field to pursue high-fidelity talking head generation.
Specifically, neural radiance field takes lip movements features and personalized attributes as two disentangled conditions.
We show that our method achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2022-01-03T18:23:38Z) - FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute
Learning [23.14865405847467]
We propose a talking face generation method that takes an audio signal as input and a short target video clip as reference.
The method synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in-sync with the input audio signal.
Experimental results and user studies show our method can generate realistic talking face videos with better qualities than the results of state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T02:10:26Z) - Pose-Controllable Talking Face Generation by Implicitly Modularized
Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z) - Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z) - Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
We outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.