Emotional Talking Head Generation based on Memory-Sharing and
Attention-Augmented Networks
- URL: http://arxiv.org/abs/2306.03594v1
- Date: Tue, 6 Jun 2023 11:31:29 GMT
- Title: Emotional Talking Head Generation based on Memory-Sharing and
Attention-Augmented Networks
- Authors: Jianrong Wang, Yaxin Zhao, Li Liu, Tianyi Xu, Qi Li, Sen Li
- Abstract summary: We propose a talking head generation model consisting of a Memory-Sharing Emotion Feature extractor and an Attention-Augmented Translator based on U-net.
MSEF can extract implicit emotional auxiliary features from audio to estimate more accurate emotional face landmarks.
AATU acts as a translator between the estimated landmarks and the photo-realistic video frames.
- Score: 21.864200803678003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given an audio clip and a reference face image, the goal of talking head
generation is to generate a high-fidelity talking head video. Although
audio-driven methods for generating talking head videos have made
progress in the past, most of them focus only on lip and audio
synchronization and lack the ability to reproduce the facial expressions of the
target person. To this end, we propose a talking head generation model
consisting of a Memory-Sharing Emotion Feature extractor (MSEF) and an
Attention-Augmented Translator based on U-net (AATU). Firstly, MSEF extracts
implicit emotional auxiliary features from audio to estimate more accurate
emotional face landmarks. Secondly, AATU acts as a translator between the
estimated landmarks and the photo-realistic video frames. Extensive qualitative
and quantitative experiments show the superiority of the proposed method
over previous works. Code will be made publicly available.
Related papers
- Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation [13.135789543388801]
We propose a one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation.
We introduce the Inter-Reconstructed Feature Disentanglement (IRFD) method to decouple human facial features into three latent spaces.
We then design a face editing module that modifies speech content and facial latent codes into a single latent space.
arXiv Detail & Related papers (2024-05-12T11:41:44Z)
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation [75.90730434449874]
We introduce DREAM-Talk, a two-stage diffusion-based audio-driven framework, tailored for generating diverse expressions and accurate lip-sync concurrently.
Given the strong correlation between lip motion and audio, we then refine the dynamics with enhanced lip-sync accuracy using audio features and emotion style.
Both quantitatively and qualitatively, DREAM-Talk outperforms state-of-the-art methods in terms of expressiveness, lip-sync accuracy and perceptual quality.
arXiv Detail & Related papers (2023-12-21T05:03:18Z)
- Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism [26.180371869137257]
State of the art in talking face generation focuses mainly on lip-syncing, being conditioned on audio clips.
NEUral Text to ARticulate Talk (NEUTART) is a talking face generator that uses a joint audiovisual feature space.
The model produces photorealistic talking face videos with human-like articulation and well-synced audiovisual streams.
arXiv Detail & Related papers (2023-12-11T18:41:55Z)
- CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding [32.006763134518245]
This paper proposes a talking face generation method named "CP-EB".
It takes an audio signal as input and a person image as reference, and synthesizes a photo-realistic talking video of that person, with head poses controlled by a short video clip and proper eye blinking.
Experimental results show that the proposed method can generate photo-realistic talking faces with synchronized lip motions, natural head poses and blinking eyes.
arXiv Detail & Related papers (2023-11-15T03:37:41Z) - MFR-Net: Multi-faceted Responsive Listening Head Generation via
Denoising Diffusion Model [14.220727407255966]
Responsive listening head generation is an important task that aims to model face-to-face communication scenarios.
We propose the Multi-Faceted Responsive Listening Head Generation Network (MFR-Net).
arXiv Detail & Related papers (2023-08-31T11:10:28Z) - Identity-Preserving Talking Face Generation with Landmark and Appearance
Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation [54.84137342837465]
Face-to-face conversations account for the vast majority of daily conversations.
Most existing methods focus on single-person talking head generation.
We propose a novel unified framework based on neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-03-15T14:16:49Z)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [58.168958284290156]
We define the responsive listening head generation task as the synthesis of a non-verbal head with motions and expressions reacting to the multiple inputs.
Unlike speech-driven gesture or talking head generation, we introduce more modalities into this task, hoping to benefit several research fields.
arXiv Detail & Related papers (2021-12-27T07:18:50Z)
- Speech2Video: Cross-Modal Distillation for Speech to Video Generation [21.757776580641902]
Speech-to-video generation can spark interesting applications in the entertainment, customer service, and human-computer interaction industries.
The challenge mainly lies in disentangling the distinct visual attributes from audio signals.
We propose a light-weight, cross-modal distillation method to extract disentangled emotional and identity information from unlabelled video inputs.
arXiv Detail & Related papers (2021-07-10T10:27:26Z)
- Audio-Driven Emotional Video Portraits [79.95687903497354]
We present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio.
Specifically, we propose the Cross-Reconstructed Emotion Disentanglement technique to decompose speech into two decoupled spaces.
With the disentangled features, dynamic 2D emotional facial landmarks can be deduced.
Then we propose the Target-Adaptive Face Synthesis technique to generate the final high-quality video portraits.
arXiv Detail & Related papers (2021-04-15T13:37:13Z)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with a personalized head pose.
Our method can generate high-quality talking face videos with more distinctive head movements than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.