TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation
- URL: http://arxiv.org/abs/2206.07117v1
- Date: Tue, 14 Jun 2022 19:08:42 GMT
- Title: TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation
- Authors: Mohammad Rezaei, Razieh Rastgoo, and Vassilis Athitsos
- Abstract summary: TriHorn-Net is a novel model that uses specific innovations to improve hand pose estimation accuracy on depth images.
The first innovation is the decomposition of 3D hand pose estimation into the estimation of 2D joint locations in the depth image space (UV) and the estimation of their corresponding depths.
The second innovation is PixDropout, which is, to the best of our knowledge, the first appearance-based data augmentation method for hand depth images.
- Score: 8.946655323517092
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 3D hand pose estimation methods have made significant progress recently.
However, estimation accuracy is often far from sufficient for specific
real-world applications, and thus there is significant room for improvement.
This paper proposes TriHorn-Net, a novel model that uses specific innovations
to improve hand pose estimation accuracy on depth images. The first innovation
is the decomposition of the 3D hand pose estimation into the estimation of 2D
joint locations in the depth image space (UV), and the estimation of their
corresponding depths aided by two complementary attention maps. This
decomposition prevents depth estimation, which is a more difficult task, from
interfering with the UV estimations at both the prediction and feature levels.
The second innovation is PixDropout, which is, to the best of our knowledge,
the first appearance-based data augmentation method for hand depth images.
Experimental results demonstrate that the proposed model outperforms the
state-of-the-art methods on three public benchmark datasets.
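To make the UV/depth decomposition concrete, below is a minimal PyTorch sketch of such a two-branch head. The module name, feature sizes, and the simple averaging fusion of the two attention maps are illustrative assumptions, not the paper's exact design; the detach on the UV heatmap mirrors the stated goal of keeping the harder depth task from interfering with UV estimation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedHead(nn.Module):
    """Hypothetical two-branch head: UV heatmaps plus attention-pooled depth."""

    def __init__(self, feat_ch: int = 256, num_joints: int = 21):
        super().__init__()
        self.num_joints = num_joints
        # UV branch: one 2D heatmap per joint in depth-image (UV) space.
        self.uv_branch = nn.Conv2d(feat_ch, num_joints, kernel_size=1)
        # Depth branch: a learned attention map per joint, plus a single
        # depth-feature map whose values are pooled per joint.
        self.att_branch = nn.Conv2d(feat_ch, num_joints, kernel_size=1)
        self.depth_feat = nn.Conv2d(feat_ch, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        B, _, H, W = feats.shape
        J = self.num_joints

        # UV estimation: soft-argmax over normalized per-joint heatmaps.
        heat = F.softmax(self.uv_branch(feats).view(B, J, -1), dim=-1).view(B, J, H, W)
        xs = torch.linspace(0, 1, W, device=feats.device)
        ys = torch.linspace(0, 1, H, device=feats.device)
        u = (heat.sum(dim=2) * xs).sum(dim=-1)  # (B, J) expected x in [0, 1]
        v = (heat.sum(dim=3) * ys).sum(dim=-1)  # (B, J) expected y in [0, 1]

        # Depth estimation: fuse the (detached) UV heatmap with a learned,
        # complementary attention map, then pool the depth-feature map.
        att = F.softmax(self.att_branch(feats).view(B, J, -1), dim=-1).view(B, J, H, W)
        fused = 0.5 * (heat.detach() + att)  # detach: depth loss cannot disturb UV
        z = (fused * self.depth_feat(feats)).flatten(2).sum(-1)  # (B, J)

        return torch.stack([u, v], dim=-1), z

# Usage: uv, z = DecomposedHead()(torch.randn(2, 256, 32, 32))
```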
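PixDropout is described here only as an appearance-based augmentation for hand depth images, so the following is a hypothetical reading of the idea: randomly replace a fraction of pixels in the input depth image with a background value during training. The drop rate and fill value are assumptions.

```python
import torch

def pix_dropout(depth: torch.Tensor, drop_prob: float = 0.1,
                bg_value: float = 0.0) -> torch.Tensor:
    """Randomly replace pixels of a (B, 1, H, W) depth image with bg_value."""
    mask = torch.rand_like(depth) < drop_prob
    out = depth.clone()
    out[mask] = bg_value
    return out

# Applied only at training time, e.g.: batch = pix_dropout(batch, drop_prob=0.1)
```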
Related papers
- SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition [5.359837526794863]
Hand pose represents key information for action recognition in the egocentric perspective.
We propose to improve egocentric 3D hand pose estimation from RGB frames alone by using pseudo-depth images.
arXiv Detail & Related papers (2024-08-19T14:30:29Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises an accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
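As a rough illustration of the iterative denoising this entry describes, here is a standard DDPM-style reverse loop over a flattened joint vector; HandDiff's actual conditioning on image-point clouds is abstracted into a placeholder `eps_model`, and the noise schedule is an assumption.

```python
import torch

@torch.no_grad()
def sample_pose(eps_model, cond, steps: int = 50, dim: int = 21 * 3):
    """Generic DDPM reverse process: refine a pose vector from pure noise."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    pose = torch.randn(1, dim)                 # start from Gaussian noise
    for t in reversed(range(steps)):
        eps = eps_model(pose, cond, t)         # predicted noise, conditioned on input
        coef = (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt()
        pose = (pose - coef * eps) / alphas[t].sqrt()
        if t > 0:                              # add noise except at the last step
            pose = pose + betas[t].sqrt() * torch.randn_like(pose)
    return pose
```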
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
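To unpack "depth up to an unknown scale and shift": the prediction d_pred relates to metric depth roughly as d = s * d_pred + t. As a generic illustration (not the paper's method, which predicts the shift and focal length from point cloud data rather than from ground truth), the affine ambiguity can be resolved in closed form when reference depths are available:

```python
import numpy as np

def align_scale_shift(d_pred: np.ndarray, d_ref: np.ndarray):
    """Solve min over (s, t) of || s * d_pred + t - d_ref ||^2 by least squares."""
    A = np.stack([d_pred, np.ones_like(d_pred)], axis=1)  # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, d_ref, rcond=None)
    return s, t

d_pred = np.random.rand(1000)
d_ref = 2.5 * d_pred + 0.3          # synthetic reference depths
s, t = align_scale_shift(d_pred, d_ref)  # recovers approximately (2.5, 0.3)
```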
arXiv Detail & Related papers (2022-08-28T16:20:14Z) - PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of human limbs by taking advantage of local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z) - On the role of depth predictions for 3D human pose estimation [0.04199844472131921]
We build a system that takes 2D joint locations as input along with their estimated depth values and predicts their 3D positions in camera coordinates.
The predictions are produced by a neural network that accepts a low-dimensional input and can be integrated into a real-time system.
Our system can be combined with an off-the-shelf 2D pose detector and a depth map predictor to perform 3D pose estimation in the wild.
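In the noiseless case, the lifting this entry describes reduces to pinhole back-projection; the paper learns a network on such inputs, but the underlying geometry is as follows (the intrinsics fx, fy, cx, cy and the example values are assumptions):

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pixel coordinates (u, v) + metric depth z -> 3D point in camera coordinates."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# e.g. a joint at pixel (400, 260) seen 2.1 m away, with assumed intrinsics:
pts = backproject(np.array([400.0]), np.array([260.0]), np.array([2.1]),
                  fx=1145.0, fy=1143.0, cx=512.5, cy=515.5)
```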
arXiv Detail & Related papers (2021-03-03T16:51:38Z) - DGGAN: Depth-image Guided Generative Adversarial Networks for
Disentangling RGB and Depth Images in 3D Hand Pose Estimation [33.23818997206978]
Estimating 3D hand poses from RGB images is essential to a wide range of potential applications, but is challenging owing to substantial ambiguity in the inference of depth information from RGB images.
We propose a conditional generative adversarial network (GAN) model, called Depth-image Guided GAN (DGGAN), to generate realistic depth maps conditioned on the input RGB image.
Experimental results on multiple benchmark datasets show that the synthesized depth maps produced by DGGAN are quite effective in regularizing the pose estimation model.
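For readers unfamiliar with conditional GANs, a generic pix2pix-style training step for RGB-to-depth synthesis looks roughly as follows; the generator/discriminator modules and the L1 weight are placeholders, not DGGAN's actual architecture.

```python
import torch
import torch.nn.functional as F

def cgan_step(G, D, opt_g, opt_d, rgb, real_depth, l1_weight: float = 100.0):
    """One training step of a conditional GAN for depth-map synthesis."""
    fake_depth = G(rgb)

    # Discriminator: real (rgb, depth) pairs vs. generated pairs.
    d_real = D(rgb, real_depth)
    d_fake = D(rgb, fake_depth.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool D while staying close to the real depth map.
    d_fake = D(rgb, fake_depth)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + l1_weight * F.l1_loss(fake_depth, real_depth))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()
```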
arXiv Detail & Related papers (2020-12-06T07:23:21Z) - Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose
Estimation [18.103595280706593]
We leverage recent advances in reliable 2D pose estimation with CNNs to estimate the 3D pose of people from depth images.
Our approach achieves very competitive results both in accuracy and speed on two public datasets.
arXiv Detail & Related papers (2020-11-10T10:08:13Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is the lack of training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose
Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on the NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z) - Silhouette-Net: 3D Hand Pose Estimation from Silhouettes [16.266199156878056]
Existing approaches mainly consider different input modalities and settings, such as monocular RGB, multi-view RGB, depth, or point cloud.
We present a new architecture that automatically learns guidance from implicit depth perception and resolves the ambiguity of hand pose through end-to-end training.
arXiv Detail & Related papers (2019-12-28T10:29:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.