Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction
- URL: http://arxiv.org/abs/2012.05698v1
- Date: Tue, 24 Nov 2020 23:50:26 GMT
- Title: Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction
- Authors: Agelos Kratimenos, Georgios Pavlakos, Petros Maragos
- Abstract summary: Independent Sign Language Recognition is a complex visual recognition problem that combines several challenging tasks of Computer Vision.
No work has adequately combined all three information channels to efficiently recognize Sign Language.
We employ SMPL-X, a contemporary parametric model that enables joint extraction of 3D body shape, face and hands information from a single image.
- Score: 46.70761714133466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Independent Sign Language Recognition is a complex visual recognition problem
that combines several challenging tasks of Computer Vision due to the necessity
to exploit and fuse information from hand gestures, body features and facial
expressions. While many state-of-the-art works have managed to deeply elaborate
on these features independently, to the best of our knowledge, no work has
adequately combined all three information channels to efficiently recognize
Sign Language. In this work, we employ SMPL-X, a contemporary parametric model
that enables joint extraction of 3D body shape, face and hands information from
a single image. We use this holistic 3D reconstruction for SLR, demonstrating
that it leads to higher accuracy than recognition from raw RGB images and their
optical flow fed into the state-of-the-art I3D-type network for 3D action
recognition and from 2D Openpose skeletons fed into a Recurrent Neural Network.
Finally, a set of experiments on the body, face and hand features showed that
neglecting any of these significantly reduces the classification accuracy,
proving the importance of jointly modeling body shape, facial expression and
hand pose for Sign Language Recognition.
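The pipeline described above can be sketched in a few lines: per-frame SMPL-X parameters (body pose, hand pose, expression) form a compact feature vector, and the ablation corresponds to dropping one of the three channels before classification. The following is a minimal numpy sketch under stated assumptions, not the paper's actual method; the dimensions, the `extract_features` helper, and the average-pool-plus-linear classifier are illustrative inventions (the paper feeds the reconstruction into a learned recognition network).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions only (assumptions, not the paper's exact setup):
# roughly 21 body joints and 15 joints per hand at 3 rotation parameters
# each, plus 10 facial expression coefficients, per frame.
BODY, HANDS, FACE = 21 * 3, 2 * 15 * 3, 10
FRAME_DIM = BODY + HANDS + FACE
NUM_SIGNS = 5  # hypothetical number of sign classes


def extract_features(frames, use_body=True, use_hands=True, use_face=True):
    """Select SMPL-X channels per frame, mirroring the paper's ablation:
    disabling a flag corresponds to neglecting that information channel."""
    parts = []
    if use_body:
        parts.append(frames[:, :BODY])
    if use_hands:
        parts.append(frames[:, BODY:BODY + HANDS])
    if use_face:
        parts.append(frames[:, BODY + HANDS:])
    return np.concatenate(parts, axis=1)


def classify(frames, weights, bias):
    """Toy classifier: temporal average pooling over frames + linear layer."""
    pooled = extract_features(frames).mean(axis=0)
    logits = pooled @ weights + bias
    return int(np.argmax(logits))


# A fake 30-frame sequence of SMPL-X parameters and random classifier weights.
sequence = rng.standard_normal((30, FRAME_DIM))
W = rng.standard_normal((FRAME_DIM, NUM_SIGNS))
b = np.zeros(NUM_SIGNS)
pred = classify(sequence, W, b)
print(pred)
```

In the paper's actual experiments, the classifier is a trained network and the comparison baselines are an I3D-type network on RGB/optical flow and an RNN on 2D Openpose skeletons; the sketch only shows how the three SMPL-X channels compose into one feature vector.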
Related papers
- ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z) - Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject.
We show that our model captures the distinct functionalities of each region of human vision system.
Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z) - Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction [8.63068449082585]
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition.
Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D.
We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
arXiv Detail & Related papers (2024-04-30T10:41:23Z) - 3D Facial Expressions through Analysis-by-Neural-Synthesis [30.2749903946587]
SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics) faithfully reconstructs expressive 3D faces from images.
We identify two key limitations in existing methods: shortcomings in their self-supervised training formulation, and a lack of expression diversity in the training images.
Our qualitative, quantitative and particularly our perceptual evaluations demonstrate that SMIRK achieves new state-of-the-art performance on accurate expression reconstruction.
arXiv Detail & Related papers (2024-04-05T14:00:07Z) - DrFER: Learning Disentangled Representations for 3D Facial Expression Recognition [28.318304721838096]
We introduce the innovative DrFER method, which brings the concept of disentangled representation learning to the field of 3D FER.
DrFER employs a dual-branch framework to effectively disentangle expression information from identity information.
This adaptation enhances the capability of the framework in recognizing facial expressions, even in cases involving varying head poses.
arXiv Detail & Related papers (2024-03-13T08:00:07Z) - Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z) - Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z) - Learning 3D Face Reconstruction with a Pose Guidance Network [49.13404714366933]
We present a self-supervised learning approach to learning monocular 3D face reconstruction with a pose guidance network (PGN)
First, we unveil the bottleneck of pose estimation in prior parametric 3D face learning methods, and propose to utilize 3D face landmarks for estimating pose parameters.
With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images.
arXiv Detail & Related papers (2020-10-09T06:11:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.