HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
- URL: http://arxiv.org/abs/2404.03159v1
- Date: Thu, 4 Apr 2024 02:15:16 GMT
- Title: HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
- Authors: Wencan Cheng, Hao Tang, Luc Van Gool, Jong Hwan Ko
- Abstract summary: Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
- Score: 60.47544798202017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting keypoint locations from input hand frames, known as 3D hand pose estimation, is a critical task in various human-computer interaction applications. Essentially, the 3D hand pose estimation can be regarded as a 3D point subset generative problem conditioned on input frames. Thanks to the recent significant progress on diffusion-based generative models, hand pose estimation can also benefit from the diffusion model to estimate keypoint locations with high quality. However, directly deploying the existing diffusion models to solve hand pose estimation is non-trivial, since they cannot achieve the complex permutation mapping and precise localization. Based on this motivation, this paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds. In order to recover keypoint permutation and accurate location, we further introduce joint-wise condition and local detail condition. Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets. Codes and pre-trained models are publicly available at https://github.com/cwc1260/HandDiff.
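The abstract frames pose estimation as iteratively denoising joint coordinates. As a rough illustration only, the following sketch runs a generic DDPM-style reverse process over a 21x3 joint array. It is not HandDiff's implementation: the schedule values, the function names (`make_schedule`, `oracle_noise_predictor`, `sample_pose`), and the oracle predictor, which cheats by knowing the target pose, are all assumptions made for this example. In a real model, a trained network conditioned on image-point cloud features would take the predictor's place.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (values are illustrative assumptions)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product of alphas
    return betas, alphas, alpha_bars


def oracle_noise_predictor(x_t, t, target_pose, alpha_bars):
    """Hypothetical stand-in for a learned, conditioned denoiser.

    It assumes x_t = sqrt(a_bar) * target + sqrt(1 - a_bar) * eps and
    solves for eps, which is only possible because it sees the target.
    """
    a_bar = alpha_bars[t]
    return (x_t - np.sqrt(a_bar) * target_pose) / np.sqrt(1.0 - a_bar)


def sample_pose(target_pose, num_steps=50, seed=0):
    """Run a DDPM-style reverse process from Gaussian noise to a pose."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(target_pose.shape)  # start from pure noise
    for t in range(num_steps - 1, -1, -1):
        eps = oracle_noise_predictor(x, t, target_pose, alpha_bars)
        # Posterior mean of the previous (less noisy) sample.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


# Toy target: 21 joints with (x, y, z) coordinates.
target = np.linspace(-1.0, 1.0, 63).reshape(21, 3)
pose = sample_pose(target)
print(pose.shape)  # (21, 3)
```

With the oracle predictor, the final step recovers the target pose, which makes the loop easy to sanity-check before swapping in a learned denoiser.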
Related papers
- HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields [96.04424738803667]
HOISDF is a guided hand-object pose estimation network.
It exploits hand and object SDFs to provide a global, implicit representation over the complete reconstruction volume.
We show that HOISDF achieves state-of-the-art results on hand-object pose estimation benchmarks.
Date: 2024-02-26T22:48:37Z
- Denoising Diffusion for 3D Hand Pose Estimation from Images [38.20064386142944]
This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
We present a novel end-to-end framework for 3D hand regression that employs diffusion models that have shown excellent ability to capture the distribution of data for generative purposes.
The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
Date: 2023-08-18T12:57:22Z
- Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
Date: 2023-03-09T02:24:30Z
- DiffPose: Toward More Reliable 3D Pose Estimation [11.6015323757147]
We propose a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process.
Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP.
Date: 2022-11-30T12:22:22Z
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
Date: 2022-07-22T13:04:06Z
- 3D Hand Pose and Shape Estimation from RGB Images for Improved Keypoint-Based Hand-Gesture Recognition [25.379923604213626]
This paper presents a keypoint-based end-to-end framework for the 3D hand and pose estimation.
It is successfully applied to the hand-gesture recognition task as a study case.
Date: 2021-09-28T17:07:43Z
- HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton [4.1954750695245835]
This paper proposes HandFoldingNet, an accurate and efficient hand pose estimator.
The proposed model utilizes a folding-based decoder that folds a given 2D hand skeleton into the corresponding joint coordinates.
Experimental results show that the proposed model outperforms the existing methods on three hand pose benchmark datasets.
Date: 2021-08-12T05:52:44Z
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
Date: 2020-04-09T07:55:01Z
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
Date: 2020-03-30T19:28:13Z