Reconstructing Signing Avatars From Video Using Linguistic Priors
- URL: http://arxiv.org/abs/2304.10482v1
- Date: Thu, 20 Apr 2023 17:29:50 GMT
- Title: Reconstructing Signing Avatars From Video Using Linguistic Priors
- Authors: Maria-Paola Forte and Peter Kulits and Chun-Hao Huang and Vasileios
Choutas and Dimitrios Tzionas and Katherine J. Kuchenbecker and Michael J.
Black
- Abstract summary: Sign language (SL) is the primary method of communication for the 70 million Deaf people around the world.
Replacing video dictionaries of isolated signs with 3D avatars can aid learning and enable AR/VR applications.
SGNify captures fine-grained hand pose, facial expression, and body movement fully automatically from in-the-wild monocular SL videos.
- Score: 54.5282429129769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign language (SL) is the primary method of communication for the 70 million
Deaf people around the world. Video dictionaries of isolated signs are a core
SL learning tool. Replacing these with 3D avatars can aid learning and enable
AR/VR applications, improving access to technology and online media. However,
little work has attempted to estimate expressive 3D avatars from SL video;
occlusion, noise, and motion blur make this task difficult. We address this by
introducing novel linguistic priors that are universally applicable to SL and
provide constraints on 3D hand pose that help resolve ambiguities within
isolated signs. Our method, SGNify, captures fine-grained hand pose, facial
expression, and body movement fully automatically from in-the-wild monocular SL
videos. We evaluate SGNify quantitatively by using a commercial motion-capture
system to compute 3D avatars synchronized with monocular video. SGNify
outperforms state-of-the-art 3D body-pose- and shape-estimation methods on SL
videos. A perceptual study shows that SGNify's 3D reconstructions are
significantly more comprehensible and natural than those of previous methods
and are on par with the source videos. Code and data are available at
$\href{http://sgnify.is.tue.mpg.de}{\text{sgnify.is.tue.mpg.de}}$.
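The abstract does not give the priors' exact form, but a linguistic constraint such as handshape symmetry in two-handed signs could in principle enter a pose-fitting objective as a soft penalty. The sketch below is a minimal illustration under that assumption; the function name, parameterization, and weighting are hypothetical, not SGNify's actual formulation:

```python
import numpy as np

def symmetry_prior(theta_left, theta_right, weight=1.0):
    """Hypothetical penalty encouraging both hands to share one
    handshape over a sign, as a linguistic symmetry prior for
    two-handed signs might. theta_left, theta_right: (T, P) arrays
    of per-frame hand-pose parameters for each hand."""
    theta_left = np.asarray(theta_left, dtype=float)
    theta_right = np.asarray(theta_right, dtype=float)
    # squared deviation between the two hands' pose parameters,
    # averaged over frames and parameters
    return weight * float(np.mean((theta_left - theta_right) ** 2))
```

Such a term would be zero when both hands adopt the same handshape and grow as they diverge, nudging an ambiguous monocular fit toward linguistically plausible poses.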
Related papers
- Neural Sign Actors: A diffusion model for 3D sign language production from text [51.81647203840081]
Sign Languages (SL) serve as the primary mode of communication for the Deaf and Hard of Hearing communities.
This work makes an important step towards realistic neural sign avatars, bridging the communication gap between Deaf and hearing communities.
arXiv Detail & Related papers (2023-12-05T12:04:34Z)
- SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark [20.11364909443987]
SignAvatars is the first large-scale, multi-prompt 3D sign language (SL) motion dataset designed to bridge the communication gap for Deaf and hard-of-hearing individuals.
The dataset comprises 70,000 videos from 153 signers, totaling 8.34 million frames, covering both isolated signs and continuous, co-articulated signs.
arXiv Detail & Related papers (2023-10-31T13:15:49Z)
- PLA: Language-Driven Open-Vocabulary 3D Scene Understanding [57.47315482494805]
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.
Recent breakthrough of 2D open-vocabulary perception is driven by Internet-scale paired image-text data with rich vocabulary concepts.
We propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning multi-view images from 3D.
arXiv Detail & Related papers (2022-11-29T15:52:22Z)
- Prompt-guided Scene Generation for 3D Zero-Shot Learning [8.658191774247944]
We propose a prompt-guided 3D scene generation and supervision method that augments 3D data to train the network more effectively.
First, we merge the point clouds of two 3D models in ways described by a prompt; the prompt acts as the annotation describing each generated 3D scene.
We achieve state-of-the-art ZSL and generalized ZSL performance on synthetic (ModelNet40, ModelNet10) and real-scanned (ScanObjectNN) 3D object datasets.
arXiv Detail & Related papers (2022-09-29T11:24:33Z)
- Learning Speech-driven 3D Conversational Gestures from Video [106.15628979352738]
We propose the first approach to automatically and jointly synthesize synchronized 3D conversational body and hand gestures.
Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures.
We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people.
arXiv Detail & Related papers (2021-02-13T01:05:39Z)
- Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild [22.881898195409885]
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video.
The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction.
arXiv Detail & Related papers (2020-12-23T18:50:42Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
- Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z)
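Body2Hands' formulation above, predicting 3D hand shape over time from body motion alone, can be viewed as a per-frame regression from body features to hand-pose parameters. The sketch below is a deliberately simplified linear stand-in for the learned model; all names and array shapes are assumptions, not the paper's architecture:

```python
import numpy as np

def predict_hands_from_body(body_motion, W, b):
    """Minimal stand-in for a body-to-hands prior: map each frame's
    body-motion features to hand-pose parameters with a linear model.
    body_motion: (T, D_body); W: (D_body, D_hand); b: (D_hand,).
    Returns a (T, D_hand) array of predicted hand poses."""
    body_motion = np.asarray(body_motion, dtype=float)
    # one prediction per frame; the real model would use a deep
    # temporal network rather than a single linear map
    return body_motion @ W + b
```

A learned version of this mapping is what lets the method produce plausible hand gestures given only the speaker's arm motion as input.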
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.