Continuously Controllable Facial Expression Editing in Talking Face
Videos
- URL: http://arxiv.org/abs/2209.08289v2
- Date: Tue, 28 Nov 2023 15:31:04 GMT
- Title: Continuously Controllable Facial Expression Editing in Talking Face
Videos
- Authors: Zhiyao Sun, Yu-Hui Wen, Tian Lv, Yanan Sun, Ziyang Zhang, Yaoyuan
Wang, Yong-Jin Liu
- Abstract summary: Speech-related expressions and emotion-related expressions are often highly coupled.
Traditional image-to-image translation methods cannot work well in our application.
We propose a high-quality facial expression editing method for talking face videos.
- Score: 34.83353695337335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, audio-driven talking face video generation has attracted
considerable attention. However, little research addresses the emotional
editing of these talking face videos with continuously controllable
expressions, even though such control is in strong demand in industry. The
challenge is that speech-related expressions and emotion-related expressions
are often highly coupled. Meanwhile, traditional image-to-image translation
methods do not work well in our application because expressions are coupled
with other attributes such as head pose: translating the expression of the
character in each frame may simultaneously change the head pose owing to bias
in the training data distribution. In this paper, we propose a high-quality facial
expression editing method for talking face videos, allowing the user to control
the target emotion in the edited video continuously. We present a new
perspective for this task as a special case of motion information editing,
where we use a 3DMM to capture major facial movements and an associated texture
map modeled by a StyleGAN to capture appearance details. Both representations
(3DMM and texture map) contain emotional information and can be continuously
modified by neural networks and easily smoothed by averaging in
coefficient/latent spaces, making our method simple yet effective. We also
introduce a mouth shape preservation loss to control the trade-off between lip
synchronization and the degree of exaggeration of the edited expression.
Extensive experiments and a user study show that our method achieves
state-of-the-art performance across various evaluation criteria.
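The following is a minimal sketch (not the authors' released code) of the core idea that continuous emotion control reduces to simple arithmetic in the 3DMM-coefficient or StyleGAN-latent space: the edited code is blended linearly with the original one to set the emotion intensity, and per-frame codes are smoothed by averaging over a sliding window. The edited codes are assumed to come from a hypothetical emotion-editing network, and the array shapes are illustrative only.

```python
# Minimal sketch of continuous emotion control in coefficient/latent space.
# The editing network and the code dimensions are hypothetical stand-ins for
# the paper's 3DMM-coefficient and StyleGAN-latent editing modules.
import numpy as np

def continuous_edit(neutral_code: np.ndarray,
                    edited_code: np.ndarray,
                    intensity: float) -> np.ndarray:
    """Blend the original and fully edited codes; intensity in [0, 1]."""
    return (1.0 - intensity) * neutral_code + intensity * edited_code

def temporal_smooth(codes, window: int = 5):
    """Smooth per-frame codes by averaging over a sliding window."""
    smoothed = []
    for t in range(len(codes)):
        lo = max(0, t - window // 2)
        hi = min(len(codes), t + window // 2 + 1)
        smoothed.append(np.mean(codes[lo:hi], axis=0))
    return smoothed

# Usage with illustrative shapes: per-frame 3DMM expression coefficients.
frames = [np.random.randn(64) for _ in range(100)]        # original video codes
edited = [f + 0.1 * np.random.randn(64) for f in frames]  # hypothetical network output
half_intensity = [continuous_edit(f, e, intensity=0.5) for f, e in zip(frames, edited)]
half_intensity = temporal_smooth(half_intensity, window=5)
```

Averaging in these spaces is what keeps the edited video temporally stable; the intensity parameter is what exposes the continuous control described above.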
Related papers
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [5.954758598327494]
EMOdiffhead is a novel method for emotional talking head video generation.
It enables fine-grained control of emotion categories and intensities.
It achieves state-of-the-art performance compared to other emotion portrait animation methods.
arXiv Detail & Related papers (2024-09-11T13:23:22Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [12.044308738509402]
We propose a two-stage audio-driven talking face generation framework that employs 3D facial landmarks as intermediate variables.
This framework achieves collaborative alignment of expression, gaze, and pose with emotions through self-supervised learning.
Our model significantly advances the state-of-the-art performance in both visual quality and emotional alignment.
arXiv Detail & Related papers (2024-06-12T06:00:00Z)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method that controls facial expression deformation based on the driving audio.
Our experiments show that our method surpasses state-of-the-art performance on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation [75.90730434449874]
We introduce DREAM-Talk, a two-stage diffusion-based audio-driven framework, tailored for generating diverse expressions and accurate lip-sync concurrently.
Given the strong correlation between lip motion and audio, we then refine the dynamics with enhanced lip-sync accuracy using audio features and emotion style.
Both quantitatively and qualitatively, DREAM-Talk outperforms state-of-the-art methods in terms of expressiveness, lip-sync accuracy and perceptual quality.
arXiv Detail & Related papers (2023-12-21T05:03:18Z)
- Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video [91.92782707888618]
We present a decomposition-composition framework named Speech to Lip (Speech2Lip) that disentangles speech-sensitive and speech-insensitive motion/appearance.
We show that our model can be trained on a video just a few minutes long and achieves state-of-the-art performance in both visual quality and speech-visual synchronization.
arXiv Detail & Related papers (2023-09-09T14:52:39Z)
- GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, based on parametric 3D facial representations, that stably decouples expression.
We achieve higher-quality and more accurate facial expression transfer than state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos [31.746152261362777]
We introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos.
The proposed method is based on a parametric 3D face representation of the actor in the input scene that offers a reliable disentanglement of the facial identity from the head pose and facial expressions.
It then uses a novel deep domain translation framework that alters the facial expressions in a consistent and plausible manner, taking into account their dynamics.
arXiv Detail & Related papers (2021-12-01T15:55:04Z)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z)
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [28.157431757281692]
We propose a text-based talking-head video generation framework that synthesizes high-fidelity facial expressions and head motions.
Our framework consists of a speaker-independent stage and a speaker-specific stage.
Our algorithm achieves high-quality photo-realistic talking-head videos including various facial expressions and head motions according to speech rhythms.
arXiv Detail & Related papers (2021-04-16T09:44:12Z)