Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos
- URL: http://arxiv.org/abs/2112.00585v1
- Date: Wed, 1 Dec 2021 15:55:04 GMT
- Authors: Foivos Paraperas Papantoniou, Panagiotis P. Filntisis, Petros Maragos, Anastasios Roussos
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel deep learning method for photo-realistic
manipulation of the emotional state of actors in "in-the-wild" videos. The
proposed method is based on a parametric 3D face representation of the actor in
the input scene that offers a reliable disentanglement of the facial identity
from the head pose and facial expressions. It then uses a novel deep domain
translation framework that alters the facial expressions in a consistent and
plausible manner, taking into account their dynamics. Finally, the altered
facial expressions are used to photo-realistically manipulate the facial region
in the input scene using a specially designed neural face renderer. To the
best of our knowledge, our method is the first capable of controlling the
actor's facial expressions even when the only input is the semantic labels of
the target emotions, while at the same time preserving the speech-related lip
movements. We conduct extensive qualitative and quantitative evaluations and
comparisons, which demonstrate the effectiveness of our approach and the
highly promising results that we obtain. Our method opens a plethora of new
possibilities for useful applications of neural rendering technologies, ranging
from movie post-production and video games to photo-realistic affective
avatars.
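The abstract rests on a parametric 3D face representation that separates the actor's identity from head pose and expression, so that an emotion edit only has to change the expression part. The listing does not specify the exact face model, so the sketch below assumes a generic linear 3D morphable model (mean shape plus identity and expression bases, followed by a rigid head pose). All names (`LinearFaceModel`, `FaceParams`, `with_expression`) are hypothetical and the toy bases are made up for illustration; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence

Vec = List[float]


def add_scaled(acc: Vec, coeffs: Sequence[float], basis: Sequence[Sequence[float]]) -> None:
    """In place: acc += sum_k coeffs[k] * basis[k]."""
    for c, component in zip(coeffs, basis):
        for i, v in enumerate(component):
            acc[i] += c * v


@dataclass
class FaceParams:
    identity: Vec      # alpha: fixed per actor, shared by all frames
    expression: Vec    # beta: per-frame expression, the part an emotion edit changes
    yaw: float = 0.0   # head pose: rotation about the vertical (y) axis, radians
    translation: Vec = field(default_factory=lambda: [0.0, 0.0, 0.0])


@dataclass
class LinearFaceModel:
    mean: Vec              # flattened mean shape [x0, y0, z0, x1, y1, z1, ...]
    id_basis: List[Vec]    # identity components, same layout as `mean`
    exp_basis: List[Vec]   # expression components, same layout as `mean`

    def vertices(self, p: FaceParams) -> List[Vec]:
        """Posed 3D vertices: rigid pose applied to mean + identity + expression offsets."""
        shape = [float(v) for v in self.mean]
        add_scaled(shape, p.identity, self.id_basis)
        add_scaled(shape, p.expression, self.exp_basis)
        c, s = math.cos(p.yaw), math.sin(p.yaw)
        tx, ty, tz = p.translation
        posed = []
        for i in range(0, len(shape), 3):
            x, y, z = shape[i:i + 3]
            # Rotate about the y axis, then translate.
            posed.append([c * x + s * z + tx, y + ty, -s * x + c * z + tz])
        return posed


def with_expression(p: FaceParams, new_expression: Sequence[float]) -> FaceParams:
    """Swap only the expression coefficients; identity and pose are copied unchanged."""
    return FaceParams(list(p.identity), list(new_expression), p.yaw, list(p.translation))


# Toy model: 2 vertices, 1 identity component, 1 expression component.
model = LinearFaceModel(
    mean=[0, 0, 0, 1, 0, 0],
    id_basis=[[0, 1, 0, 0, 1, 0]],   # raises both vertices
    exp_basis=[[0, 0, 0, 0, 0, 1]],  # pushes only the second vertex forward
)
neutral = FaceParams(identity=[0.5], expression=[0.0])
edited = with_expression(neutral, [2.0])
print(model.vertices(neutral))  # [[0.0, 0.5, 0.0], [1.0, 0.5, 0.0]]
print(model.vertices(edited))   # [[0.0, 0.5, 0.0], [1.0, 0.5, 2.0]]
```

In this toy setup, any per-frame expression edit (for example, one produced by a translation network) leaves identity and head pose untouched by construction. Preserving speech-related lip motion is a separate requirement that this sketch does not model.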
Related papers
- CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation
Speech-driven 3D facial animation technology has been developed for years, but its practical applications still fall short of expectations.
The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions.
This paper proposes a method called CSTalk that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions.
Date: 2024-04-29
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate the natural variation among plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
Date: 2023-04-18
- Imitator: Personalized Speech-driven 3D Facial Animation
State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
Date: 2022-12-30
- PERI: Part Aware Emotion Recognition In The Wild
This paper focuses on emotion recognition using visual features.
We create part aware spatial (PAS) images by extracting key regions from the input image using a mask generated from both body pose and facial landmarks.
We report results on the publicly available in-the-wild EMOTIC dataset.
Date: 2022-10-18
- Continuously Controllable Facial Expression Editing in Talking Face Videos
Speech-related and emotion-related expressions are often highly coupled, so traditional image-to-image translation methods do not work well for this task.
We propose a high-quality facial expression editing method for talking face videos.
Date: 2022-09-17
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers
We propose a novel facial expression recognition (FER) model, called Poker Face Vision Transformer (PF-ViT), to separate and recognize the disturbance-agnostic emotion from a static facial image.
PF-ViT generates its corresponding poker face without the need for paired images.
Date: 2022-07-22
- Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
Date: 2021-12-21
- Deep Semantic Manipulation of Facial Videos
This paper proposes the first method to perform photorealistic manipulation of facial expressions in videos.
Our method supports semantic video manipulation based on neural rendering and 3D-based facial expression modelling.
The proposed method is based on a disentangled representation and estimation of the 3D facial shape and activity.
Date: 2021-11-15
- Facial Expression Editing with Continuous Emotion Labels
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
Date: 2020-06-22
- Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity
This paper proposes a novel method for human emotion recognition from a single RGB image.
We construct a large-scale dataset of facial videos, rich in facial dynamics, identities, expressions, appearance and 3D pose variations.
Our proposed framework runs at 50 frames per second and is capable of robustly estimating parameters of 3D expression variation.
Date: 2020-05-12
- MakeItTalk: Speaker-Aware Talking-Head Animation
We present a method that generates expressive talking heads from a single facial image with audio as the only input.
Using predicted facial landmarks as an intermediate representation, our method is able to synthesize photorealistic videos of entire talking heads with a full range of motion.
Date: 2020-04-27
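The PERI entry builds part-aware spatial (PAS) images by masking the input with regions derived from body-pose keypoints and facial landmarks. The summary does not say how the mask is shaped, so the sketch below assumes a hypothetical binary mask made of fixed-radius discs around each keypoint on a grayscale image stored as nested lists; PERI's actual mask construction may differ, and the names `keypoint_mask` and `part_aware_image` are illustrative only.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]      # grayscale image, indexed image[row][col]
Point = Tuple[float, float]    # (x, y) = (column, row) in pixels


def keypoint_mask(height: int, width: int, points: Sequence[Point], radius: float) -> List[List[int]]:
    """Binary mask: 1 where a pixel lies within `radius` of any keypoint, else 0."""
    r2 = radius * radius
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            if any((col - x) ** 2 + (row - y) ** 2 <= r2 for x, y in points):
                mask[row][col] = 1
    return mask


def part_aware_image(image: Image, body_keypoints: Sequence[Point],
                     face_landmarks: Sequence[Point], radius: float) -> Image:
    """Keep only pixels near body-pose keypoints or facial landmarks; zero the rest."""
    height, width = len(image), len(image[0])
    points = list(body_keypoints) + list(face_landmarks)
    mask = keypoint_mask(height, width, points, radius)
    return [[pixel * m for pixel, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


image = [[1.0] * 5 for _ in range(5)]
pas = part_aware_image(image, body_keypoints=[(0, 0)], face_landmarks=[(4, 4)], radius=1.0)
print(sum(map(sum, pas)))  # 6.0
```

Merging both keypoint sets into one mask mirrors the summary's "mask generated from both body pose and facial landmarks"; a soft (for example, Gaussian-weighted) mask would be a natural variant if hard edges hurt the downstream classifier.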