Audio-Driven Emotional Video Portraits
- URL: http://arxiv.org/abs/2104.07452v1
- Date: Thu, 15 Apr 2021 13:37:13 GMT
- Title: Audio-Driven Emotional Video Portraits
- Authors: Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun
Cao, Feng Xu
- Abstract summary: We present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio.
Specifically, we propose the Cross-Reconstructed Emotion Disentanglement technique to decompose speech into two decoupled spaces.
With the disentangled features, dynamic 2D emotional facial landmarks can be deduced.
Then we propose the Target-Adaptive Face Synthesis technique to generate the final high-quality video portraits.
- Score: 79.95687903497354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite previous success in generating audio-driven talking heads,
most prior studies focus on the correlation between speech content and mouth
shape. Facial emotion, one of the most important attributes of natural human
faces, is largely neglected in these methods. In this work, we present
Emotional Video Portraits (EVP), a system for synthesizing high-quality video
portraits with vivid emotional dynamics driven by audio. Specifically, we
propose the Cross-Reconstructed Emotion Disentanglement technique to decompose
speech into two decoupled spaces, i.e., a duration-independent emotion space
and a duration-dependent content space. With the disentangled features,
dynamic 2D emotional facial landmarks can be deduced. We then propose the
Target-Adaptive Face Synthesis technique to generate the final high-quality
video portraits by bridging the gap between the deduced landmarks and the
natural head poses of the target videos. Extensive experiments demonstrate the
effectiveness of our method both qualitatively and quantitatively.
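The cross-reconstruction idea can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch version, assuming MFCC-like audio features and toy MLP encoders; the feature dimensions, mean-pooling choice, and stand-in training pairs are illustrative assumptions, not the paper's actual architecture or training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, CONTENT_DIM, EMOTION_DIM = 80, 64, 16  # illustrative sizes

class Encoder(nn.Module):
    """Toy frame-wise encoder; stands in for the paper's audio encoders."""
    def __init__(self, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))
    def forward(self, x):  # x: (batch, time, FEAT_DIM)
        return self.net(x)

content_enc = Encoder(CONTENT_DIM)   # duration-dependent content space
emotion_enc = Encoder(EMOTION_DIM)   # pooled below -> duration-independent
decoder = nn.Sequential(nn.Linear(CONTENT_DIM + EMOTION_DIM, 128), nn.ReLU(),
                        nn.Linear(128, FEAT_DIM))

def cross_reconstruct(x_im, x_jn):
    """x_im carries content i with emotion m; x_jn carries content j with
    emotion n. Swapping the pooled emotion codes should reconstruct the
    complementary combinations (i, n) and (j, m)."""
    c_i, c_j = content_enc(x_im), content_enc(x_jn)
    e_m = emotion_enc(x_im).mean(dim=1, keepdim=True)  # pool over time
    e_n = emotion_enc(x_jn).mean(dim=1, keepdim=True)
    x_in = decoder(torch.cat([c_i, e_n.expand(-1, x_im.shape[1], -1)], dim=-1))
    x_jm = decoder(torch.cat([c_j, e_m.expand(-1, x_jn.shape[1], -1)], dim=-1))
    return x_in, x_jm

# Stand-in clips and hypothetical ground truth for the swapped pairs; the
# paper builds such pairs from its corpus, here they are random tensors.
x_a, x_b = torch.randn(2, 100, FEAT_DIM), torch.randn(2, 100, FEAT_DIM)
gt_in, gt_jm = torch.randn_like(x_a), torch.randn_like(x_b)
rec_in, rec_jm = cross_reconstruct(x_a, x_b)
loss = F.l1_loss(rec_in, gt_in) + F.l1_loss(rec_jm, gt_jm)
loss.backward()  # gradients flow to both encoders and the decoder
```

The design point the sketch isolates: pooling the emotion code over time makes it duration-independent, while the frame-wise content code stays duration-dependent, matching the two decoupled spaces described in the abstract.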
Related papers
- Audio-Driven Emotional 3D Talking-Head Generation [47.6666060652434]
We present a novel system for synthesizing high-fidelity, audio-driven video portraits with accurate emotional expressions.
We propose a pose sampling method that generates natural idle-state (non-speaking) videos in response to silent audio inputs.
arXiv Detail & Related papers (2024-10-07T08:23:05Z)
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [5.954758598327494]
EMOdiffhead is a novel method for emotional talking head video generation.
It enables fine-grained control of emotion categories and intensities.
It achieves state-of-the-art performance compared to other emotion portrait animation methods.
arXiv Detail & Related papers (2024-09-11T13:23:22Z)
- Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [12.044308738509402]
We propose a two-stage audio-driven talking face generation framework that employs 3D facial landmarks as intermediate variables.
This framework achieves collaborative alignment of expression, gaze, and pose with emotions through self-supervised learning.
Our model significantly advances the state of the art in both visual quality and emotional alignment.
arXiv Detail & Related papers (2024-06-12T06:00:00Z)
- EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions [18.364859748601887]
We propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach.
Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations.
arXiv Detail & Related papers (2024-02-27T13:10:11Z)
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation [75.90730434449874]
We introduce DREAM-Talk, a two-stage diffusion-based audio-driven framework, tailored for generating diverse expressions and accurate lip-sync concurrently.
Given the strong correlation between lip motion and audio, we then refine the lip dynamics using audio features and emotion style, improving lip-sync accuracy.
Both quantitatively and qualitatively, DREAM-Talk outperforms state-of-the-art methods in expressiveness, lip-sync accuracy, and perceptual quality.
arXiv Detail & Related papers (2023-12-21T05:03:18Z)
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z)
- Learning to Dub Movies via Hierarchical Prosody Models [167.6465354313349]
Given a piece of text, a video clip, and a reference audio, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the emotion the speaker presents in the video, using the desired speaker's voice as reference.
We propose a novel movie dubbing architecture that tackles these problems via hierarchical prosody modelling, bridging the visual information to the corresponding speech prosody at three levels: lip, face, and scene.
arXiv Detail & Related papers (2022-12-08T03:29:04Z)
- SPACEx: Speech-driven Portrait Animation with Controllable Expression [31.99644011371433]
We present SPACEx, which uses speech and a single image to generate expressive videos with realistic head pose.
It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator.
arXiv Detail & Related papers (2022-11-17T18:59:56Z)
- MakeItTalk: Speaker-Aware Talking-Head Animation [49.77977246535329]
We present a method that generates expressive talking heads from a single facial image with audio as the only input.
Based on this intermediate representation, our method is able to synthesize photorealistic videos of entire talking heads with a full range of motion (a schematic sketch of this landmark-intermediate pipeline follows the list).
arXiv Detail & Related papers (2020-04-27T17:56:15Z)
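Several of the papers above (EVP, SPACEx, MakeItTalk) share a landmark-intermediate pipeline: audio features drive per-frame 2D landmark displacements, and a separately trained image generator renders the frames. The skeleton below is a hypothetical illustration of that pattern, not any single paper's implementation; the LSTM predictor, 68-point layout, and linear stand-in renderer are all assumptions.

```python
import torch
import torch.nn as nn

N_LANDMARKS, AUDIO_DIM = 68, 80  # assumed 68-point layout, MFCC-like features

# Sequence model mapping audio features to per-frame landmark displacements
# (x, y per landmark -> 2 * N_LANDMARKS values per frame).
audio_to_landmarks = nn.LSTM(AUDIO_DIM, 2 * N_LANDMARKS, batch_first=True)

class Renderer(nn.Module):
    """Stand-in for a pretrained face generator conditioned on landmarks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(2 * N_LANDMARKS, 3 * 64 * 64)
    def forward(self, lm):  # lm: (batch, time, 2 * N_LANDMARKS)
        b, t, _ = lm.shape
        return self.net(lm).view(b, t, 3, 64, 64)  # toy RGB frames

renderer = Renderer()

def animate(audio_feats, ref_landmarks):
    """audio_feats: (batch, time, AUDIO_DIM); ref_landmarks: (batch, 136).
    Predict per-frame displacements and add them to the still face's layout."""
    disp, _ = audio_to_landmarks(audio_feats)      # (batch, time, 136)
    landmarks = ref_landmarks.unsqueeze(1) + disp  # animate the reference
    return renderer(landmarks)                     # (batch, time, 3, 64, 64)

frames = animate(torch.randn(1, 25, AUDIO_DIM), torch.randn(1, 2 * N_LANDMARKS))
print(frames.shape)  # torch.Size([1, 25, 3, 64, 64])
```

In practice the renderer would be a pretrained image-to-image network (as in SPACEx) or a warping module (as in MakeItTalk); the linear layer here only keeps the sketch self-contained and runnable.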