Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting
- URL: http://arxiv.org/abs/2209.01470v2
- Date: Tue, 30 May 2023 17:07:26 GMT
- Title: Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting
- Authors: Christina O. Tze, Panagiotis P. Filntisis, Athanasia-Lida Dimou,
Anastasios Roussos, Petros Maragos
- Abstract summary: We introduce a neural rendering pipeline for transferring the facial expressions, head pose, and body movements of one person in a source video to another in a target video.
Our method can be used for Sign Language Anonymization, Sign Language Production (synthesis module), as well as for reenacting other types of full body activities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a neural rendering pipeline for transferring the
facial expressions, head pose, and body movements of one person in a source
video to another in a target video. We apply our method to the challenging case
of Sign Language videos: given a source video of a sign language user, we can
faithfully transfer the performed manual (e.g., handshape, palm orientation,
movement, location) and non-manual (e.g., eye gaze, facial expressions, mouth
patterns, head, and body movements) signs to a target video in a
photo-realistic manner. Our method can be used for Sign Language Anonymization,
Sign Language Production (synthesis module), as well as for reenacting other
types of full body activities (dancing, acting performance, exercising, etc.).
We conduct detailed qualitative and quantitative evaluations and comparisons,
which demonstrate the particularly promising and realistic results that we
obtain and the advantages of our method over existing approaches.
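The abstract describes the pipeline only at a high level: manual and non-manual cues are extracted from the source signer, adapted to the target, and used to drive a photorealistic renderer trained on the target identity. Below is a minimal, hedged sketch of that keypoint-driven reenactment idea; all names (FrameConditioning, retarget, reenact, the renderer callable) are hypothetical placeholders and not the authors' code.

```python
# Hypothetical sketch of a keypoint-driven reenactment pipeline in the spirit
# of the paper: extract per-frame conditioning from the source, retarget it to
# the target's proportions, and render photorealistic frames of the target.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class FrameConditioning:
    """Per-frame driving signal extracted from the source video."""
    body_keypoints: np.ndarray   # (J, 2) 2D body joints
    hand_keypoints: np.ndarray   # (2, 21, 2) left/right hand joints
    face_landmarks: np.ndarray   # (L, 2) facial landmarks (gaze, mouth, expressions)


def retarget(cond: FrameConditioning, source_scale: float,
             target_scale: float, target_root: np.ndarray) -> FrameConditioning:
    """Map source keypoints into the target signer's proportions and position.

    A single global scale-and-translate for brevity; a real system would
    retarget per body part and handle depth and occlusion.
    """
    def _map(points: np.ndarray) -> np.ndarray:
        centered = points - points.mean(axis=-2, keepdims=True)
        return centered * (target_scale / source_scale) + target_root

    return FrameConditioning(
        body_keypoints=_map(cond.body_keypoints),
        hand_keypoints=_map(cond.hand_keypoints),
        face_landmarks=_map(cond.face_landmarks),
    )


def reenact(source_frames: List[FrameConditioning],
            renderer: Callable[[FrameConditioning], np.ndarray],
            source_scale: float, target_scale: float,
            target_root: np.ndarray) -> List[np.ndarray]:
    """Drive a target-specific neural renderer with retargeted source conditioning."""
    frames = []
    for cond in source_frames:
        frames.append(renderer(retarget(cond, source_scale, target_scale, target_root)))
    return frames
```

In practice, the renderer placeholder would be an image-to-image translation network (e.g., a GAN conditioned on keypoint maps) trained on footage of the target person.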
Related papers
- Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs [67.27840327499625]
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters.
Our approach learns from sparse face landmarks and upper-body joints, estimated directly from video data, to generate plausible emotive character motions.
arXiv Detail & Related papers (2024-06-26T04:53:11Z)
- Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis [6.189190729240752]
This research explores the positive application of deepfake technology for upper body generation, specifically sign language for the Deaf and Hard of Hearing (DHoH) community.
We construct a reliable deepfake dataset, evaluating its technical and visual credibility using computer vision and natural language processing models.
The dataset, consisting of over 1200 videos featuring both seen and unseen individuals, is also used to detect deepfake videos targeting vulnerable individuals.
arXiv Detail & Related papers (2024-04-01T19:22:43Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio, without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, that aims to animate a static face image with the help of languages.
We propose a recurrent motion generator that extracts a sequence of semantic and motion representations from the language and feeds them, along with visual information, to a pre-trained StyleGAN to generate high-quality frames.
arXiv Detail & Related papers (2022-08-11T02:57:30Z)
- Copy Motion From One to Another: Fake Motion Video Generation [53.676020148034034]
A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image (a simplified sketch of such a loss appears after this list).
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
arXiv Detail & Related papers (2022-05-03T08:45:22Z)
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos [31.746152261362777]
We introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos.
The proposed method is based on a parametric 3D face representation of the actor in the input scene that offers a reliable disentanglement of the facial identity from the head pose and facial expressions.
It then uses a novel deep domain translation framework that alters the facial expressions in a consistent and plausible manner, taking into account their dynamics.
arXiv Detail & Related papers (2021-12-01T15:55:04Z)
- Deep Semantic Manipulation of Facial Videos [5.048861360606916]
This paper proposes the first method to perform photorealistic manipulation of facial expressions in videos.
Our method supports semantic video manipulation based on neural rendering and 3D-based facial expression modelling.
The proposed method is based on a disentangled representation and estimation of the 3D facial shape and activity.
arXiv Detail & Related papers (2021-11-15T16:55:16Z)
- Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video [43.45785951443149]
To be truly understandable by Deaf communities, an automatic Sign Language Production system must generate a photo-realistic signer.
We propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language.
A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence.
arXiv Detail & Related papers (2020-11-19T14:31:06Z)
- ReenactNet: Real-time Full Head Reenactment [50.32988828989691]
We propose a head-to-head system capable of fully transferring the human head 3D pose, facial expressions and eye gaze from a source to a target actor.
Our system produces high-fidelity, temporally-smooth and photo-realistic synthetic videos faithfully transferring the human time-varying head attributes from the source to the target actor.
arXiv Detail & Related papers (2020-05-22T00:51:38Z)
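The Gromov-Wasserstein loss mentioned in "Copy Motion From One to Another" compares the internal geometry of two point sets rather than their absolute coordinates, which is what makes it suited to relating a pose to a foreground image. The following is a simplified sketch with a fixed uniform coupling instead of an optimized transport plan; the function names and tensor shapes are illustrative assumptions, not that paper's implementation.

```python
# Simplified Gromov-Wasserstein-style discrepancy between two point sets,
# e.g. pose keypoints vs. foreground feature locations. Uses a fixed uniform
# coupling, so this is an illustration of the idea rather than the actual loss.
import torch


def pairwise_distances(points: torch.Tensor) -> torch.Tensor:
    """Euclidean distance matrix for a point set of shape (N, d) -> (N, N)."""
    return torch.cdist(points, points)


def gromov_wasserstein_discrepancy(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """GW objective sum_{i,k,j,l} (Dx[i,k] - Dy[j,l])^2 * T[i,j] * T[k,l]
    evaluated with a fixed uniform coupling T[i,j] = 1 / (N * M)."""
    dx = pairwise_distances(x)                  # (N, N)
    dy = pairwise_distances(y)                  # (M, M)
    n, m = dx.shape[0], dy.shape[0]
    t = torch.full((n, m), 1.0 / (n * m), device=x.device)

    p = t.sum(dim=1)                            # (N,) row marginal
    q = t.sum(dim=0)                            # (M,) column marginal
    # Expand the square and contract each term to avoid building a 4D tensor.
    term_x = torch.einsum('ik,i,k->', dx ** 2, p, p)
    term_y = torch.einsum('jl,j,l->', dy ** 2, q, q)
    cross = ((t.T @ dx @ t) * dy).sum()
    return term_x + term_y - 2.0 * cross


# Usage with hypothetical shapes: 18 body keypoints vs. 32 feature locations.
pose = torch.rand(18, 2)
feats = torch.rand(32, 2)
loss = gromov_wasserstein_discrepancy(pose, feats)
```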