3D Hand Pose and Shape Estimation from RGB Images for Improved
Keypoint-Based Hand-Gesture Recognition
- URL: http://arxiv.org/abs/2109.13879v1
- Date: Tue, 28 Sep 2021 17:07:43 GMT
- Title: 3D Hand Pose and Shape Estimation from RGB Images for Improved
Keypoint-Based Hand-Gesture Recognition
- Authors: Danilo Avola, Luigi Cinque, Alessio Fagioli, Gian Luca Foresti,
Adriano Fragomeni, Daniele Pannone
- Abstract summary: This paper presents a keypoint-based end-to-end framework for 3D hand pose and shape estimation.
It is successfully applied to the hand-gesture recognition task as a case study.
- Score: 25.379923604213626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the 3D hand pose from a 2D image is a well-studied problem and a
requirement for several real-life applications such as virtual reality,
augmented reality, and hand-gesture recognition. Currently, good estimations
can be computed starting from single RGB images, especially when forcing the
system to also consider, through a multi-task learning approach, the hand shape
when the pose is determined. However, when addressing the aforementioned
real-life tasks, performance can drop considerably depending on the hand
representation, suggesting that stable descriptions are required to
achieve satisfactory results. Consequently, in this paper we present a
keypoint-based end-to-end framework for 3D hand pose and shape estimation, and
successfully apply it to the hand-gesture recognition task as a case study.
Specifically, after a pre-processing step where the images are normalized, the
proposed pipeline comprises a multi-task semantic feature extractor generating
2D heatmaps and hand silhouettes from RGB images; a viewpoint encoder
predicting hand and camera view parameters; a stable hand estimator producing
the 3D hand pose and shape; and a loss function designed to jointly guide all
of the components during the learning phase. To assess the proposed framework,
tests were performed on a 3D pose and shape estimation benchmark dataset,
obtaining state-of-the-art performance. Moreover, the devised system was
also evaluated on two hand-gesture recognition benchmark datasets, where the
framework significantly outperforms other keypoint-based approaches, indicating
that the presented method is an effective solution able to generate stable 3D
estimates for the hand pose and shape.
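The abstract describes 2D heatmaps, camera view parameters, and a 3D hand estimate being tied together by a joint loss. The sketch below illustrates one common way such quantities can relate to each other; it is not the authors' implementation. The weak-perspective camera model, the argmax keypoint read-out, and all function names (`heatmap_to_keypoint`, `project_weak_perspective`, `mean_keypoint_error`) are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Heatmap = Sequence[Sequence[float]]   # rows (y) of columns (x)
Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def heatmap_to_keypoint(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (x, y) pixel with the highest response in one heatmap."""
    best_x, best_y, best_v = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_v:
                best_x, best_y, best_v = x, y, v
    return best_x, best_y


def project_weak_perspective(
    joints_3d: Sequence[Point3D], scale: float, tx: float, ty: float
) -> List[Point2D]:
    """Project 3D joints to 2D, assuming a weak-perspective camera.

    Depth is dropped; each joint maps to (scale * X + tx, scale * Y + ty).
    """
    return [(scale * x + tx, scale * y + ty) for x, y, _z in joints_3d]


def mean_keypoint_error(pred: Sequence[Point2D], target: Sequence[Point2D]) -> float:
    """Mean Euclidean distance between corresponding 2D points."""
    dists = [math.hypot(px - qx, py - qy) for (px, py), (qx, qy) in zip(pred, target)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Two toy 3x3 heatmaps, one per keypoint (hypothetical data).
    heatmaps = [
        [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
        [[0, 2, 0], [0, 0, 0], [0, 0, 0]],
    ]
    keypoints_2d = [heatmap_to_keypoint(h) for h in heatmaps]

    # Hypothetical 3D joints and camera parameters.
    joints_3d = [(1.0, 0.5, 3.0), (0.5, 0.0, 2.0)]
    projected = project_weak_perspective(joints_3d, scale=2.0, tx=0.0, ty=0.0)

    # A reprojection term like this could be one component of a joint loss.
    print(mean_keypoint_error(projected, keypoints_2d))
```

In a trainable system the hard argmax would typically be replaced by a differentiable read-out, and the reprojection error would be only one of several loss terms; the sketch only shows how the pieces connect.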
Related papers
- SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition [5.359837526794863]
Hand pose represents key information for action recognition in the egocentric perspective.
We propose to improve egocentric 3D hand pose estimation based on RGB frames only by using pseudo-depth images.
arXiv Detail & Related papers (2024-08-19T14:30:29Z)
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires computationally intensive depth estimation networks or wearing an uncomfortable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
- CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting [38.678165053219644]
We make one of the first attempts to propose a novel 3D hand pose estimator from monocular images, dubbed as CLIP-Hand3D.
We maximize semantic consistency for a pair of pose-text features following a CLIP-based contrastive learning paradigm.
Experiments on several public hand benchmarks show that the proposed model attains a significantly faster inference speed.
arXiv Detail & Related papers (2023-09-28T03:40:37Z)
- Denoising Diffusion for 3D Hand Pose Estimation from Images [38.20064386142944]
This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
We present a novel end-to-end framework for 3D hand regression that employs diffusion models that have shown excellent ability to capture the distribution of data for generative purposes.
The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
arXiv Detail & Related papers (2023-08-18T12:57:22Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of these limbs by taking advantage of the local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z)
- MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
A solution is training on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
arXiv Detail & Related papers (2020-10-02T18:27:34Z)
- SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation [48.456638103309544]
3D hand pose estimation based on RGB images has been studied for a long time.
We propose a novel method that generates a synthetic dataset that mimics natural human hand movements.
We show that utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimations.
arXiv Detail & Related papers (2020-07-10T05:11:14Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.