Deep Semantic Manipulation of Facial Videos
- URL: http://arxiv.org/abs/2111.07902v1
- Date: Mon, 15 Nov 2021 16:55:16 GMT
- Title: Deep Semantic Manipulation of Facial Videos
- Authors: Girish Kumar Solanki, Anastasios Roussos
- Abstract summary: This paper proposes the first method to perform photorealistic manipulation of facial expressions in videos.
Our method supports semantic video manipulation based on neural rendering and 3D-based facial expression modelling.
The proposed method is based on a disentangled representation and estimation of the 3D facial shape and activity.
- Score: 5.048861360606916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Editing and manipulating facial features in videos is an interesting and
important field of research with a plethora of applications, ranging from movie
post-production and visual effects to realistic avatars for video games and
virtual assistants. To the best of our knowledge, this paper proposes the first
method to perform photorealistic manipulation of facial expressions in videos.
Our method supports semantic video manipulation based on neural rendering and
3D-based facial expression modelling. We focus on interactive manipulation of
the videos by altering and controlling the facial expressions, achieving
promising photorealistic results. The proposed method is based on a
disentangled representation and estimation of the 3D facial shape and activity,
providing the user with intuitive and easy-to-use control of the facial
expressions in the input video. We also introduce a user-friendly, interactive
AI tool that processes human-readable semantic labels about the desired emotion
manipulations in specific parts of the input video and synthesizes
photorealistic manipulated videos. We achieve that by mapping the emotion
labels to valence-arousal (VA) values, which in turn are mapped to disentangled
3D facial expressions through an especially designed and trained expression
decoder network. The paper presents detailed qualitative and quantitative
experiments, which demonstrate the effectiveness of our system and the
promising results it achieves. Additional results and videos can be found in
the supplementary material (https://github.com/Girish-03/DeepSemManipulation).
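The label-to-VA-to-expression pipeline described in the abstract can be pictured with a short sketch. The following is a minimal illustration, not the authors' released code: the label-to-VA lookup table, the decoder architecture, and the 50-dimensional expression vector are assumptions made purely to show the data flow.

```python
# Illustrative sketch (not the authors' code): map a human-readable emotion label
# to valence-arousal (VA) values, then to disentangled 3D expression parameters
# via a small decoder network, as the abstract describes. The label-to-VA table,
# the decoder layers, and the 50-dim expression vector are assumptions.
import torch
import torch.nn as nn

# Assumed lookup from semantic emotion labels to (valence, arousal) in [-1, 1].
EMOTION_TO_VA = {
    "happy":   ( 0.8,  0.5),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.7),
    "neutral": ( 0.0,  0.0),
}

class ExpressionDecoder(nn.Module):
    """Hypothetical decoder: 2-D VA input -> disentangled 3D expression params."""
    def __init__(self, expr_dim: int = 50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, expr_dim),
        )

    def forward(self, va: torch.Tensor) -> torch.Tensor:
        return self.net(va)

decoder = ExpressionDecoder()
va = torch.tensor([EMOTION_TO_VA["happy"]])   # shape (1, 2): valence-arousal pair
expr_params = decoder(va)                     # shape (1, 50): expression vector
# expr_params would then drive the 3D face model for the selected frames,
# and a neural renderer would synthesize the photorealistic output video.
```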
Related papers
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
- CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [13.27632316528572]
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations.
The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions.
This paper proposes a method called CSTalk that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions.
arXiv Detail & Related papers (2024-04-29T11:19:15Z)
- Music Recommendation Based on Facial Emotion Recognition [0.0]
This paper presents a comprehensive approach to enhancing the user experience through the integration of emotion recognition, music recommendation, and explainable AI using Grad-CAM.
The proposed methodology utilizes a ResNet50 model trained on the Facial Expression Recognition dataset, consisting of real images of individuals expressing various emotions.
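As a rough sketch of how such a backbone might be set up (an assumption, not the paper's released code): a torchvision ResNet50 with its classifier head replaced for a small set of emotion classes; the class count and dummy batch below are placeholders.

```python
# Minimal sketch (assumption, not the paper's code): adapting a torchvision
# ResNet50 to classify facial-expression images into a handful of emotion
# classes, which a recommendation stage could then map to music choices.
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 7  # e.g. angry, disgust, fear, happy, neutral, sad, surprise (assumed)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)  # replace classifier head

# One illustrative training step on a dummy batch of 224x224 face crops.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)           # stand-in for FER face crops
labels = torch.randint(0, NUM_EMOTIONS, (8,))  # stand-in emotion labels
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```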
arXiv Detail & Related papers (2024-04-06T15:14:25Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting [28.012212656892746]
We introduce a neural rendering pipeline for transferring the facial expressions, head pose, and body movements of one person in a source video to another in a target video.
Our method can be used for Sign Language Anonymization, Sign Language Production (synthesis module), as well as for reenacting other types of full body activities.
arXiv Detail & Related papers (2022-09-03T18:04:50Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
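The general idea of per-frame latent editing can be sketched as follows; the encoder, generator, and semantic direction are placeholders rather than the authors' actual interfaces.

```python
# Generic sketch of latent-space video editing in the spirit of Video2StyleGAN
# (interfaces are placeholders, not the authors' API): encode each frame to a
# W+ latent code, shift it along a semantic direction, and decode again.
import torch

def edit_video(frames, encoder, generator, direction, strength=2.0):
    """frames: (T, 3, H, W) tensor; direction: (num_layers, 512) W+ direction."""
    edited = []
    for frame in frames:
        w_plus = encoder(frame.unsqueeze(0))      # (1, num_layers, 512) latent code
        w_edited = w_plus + strength * direction  # move along the semantic axis
        edited.append(generator(w_edited))        # re-render the edited frame
    return torch.cat(edited, dim=0)
```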
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- Copy Motion From One to Another: Fake Motion Video Generation [53.676020148034034]
A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image.
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
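As a loose illustration of the Gromov-Wasserstein idea (comparing the internal distance structure of the pose domain with that of the image domain), the snippet below uses the POT library on dummy data; the paper's actual loss is differentiable and integrated into GAN training, which this numpy example does not attempt.

```python
# Illustration only (not the paper's exact loss): a Gromov-Wasserstein
# discrepancy between the intra-domain distance structure of pose keypoints
# and of generated-image features, computed with the POT library.
# Feature choices and dimensions below are assumptions.
import numpy as np
import ot  # Python Optimal Transport (pip install pot)

pose_points = np.random.rand(17, 2)      # e.g. 17 2-D body keypoints (assumed)
image_feats = np.random.rand(17, 64)     # matching foreground features (assumed)

C1 = ot.dist(pose_points, pose_points)   # pairwise distances within the pose
C2 = ot.dist(image_feats, image_feats)   # pairwise distances within the image
C1 /= C1.max()
C2 /= C2.max()

p = ot.unif(len(pose_points))            # uniform weights over keypoints
q = ot.unif(len(image_feats))

gw_loss = ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss')
print(f"GW discrepancy between pose and image structure: {gw_loss:.4f}")
```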
arXiv Detail & Related papers (2022-05-03T08:45:22Z)
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos [31.746152261362777]
We introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos.
The proposed method is based on a parametric 3D face representation of the actor in the input scene that offers a reliable disentanglement of the facial identity from the head pose and facial expressions.
It then uses a novel deep domain translation framework that alters the facial expressions in a consistent and plausible manner, taking into account their dynamics.
arXiv Detail & Related papers (2021-12-01T15:55:04Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly-accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)
- Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity [6.974241731162878]
This paper proposes a novel method for human emotion recognition from a single RGB image.
We construct a large-scale dataset of facial videos, rich in facial dynamics, identities, expressions, appearance and 3D pose variations.
Our proposed framework runs at 50 frames per second and is capable of robustly estimating parameters of 3D expression variation.
arXiv Detail & Related papers (2020-05-12T01:32:55Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.