Denoising Diffusion for 3D Hand Pose Estimation from Images
- URL: http://arxiv.org/abs/2308.09523v1
- Date: Fri, 18 Aug 2023 12:57:22 GMT
- Title: Denoising Diffusion for 3D Hand Pose Estimation from Images
- Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden
- Abstract summary: This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
We present a novel end-to-end framework for 3D hand regression that employs diffusion models, which have shown an excellent ability to capture the distribution of data for generative purposes.
The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
- Score: 38.20064386142944
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hand pose estimation from a single image has many applications. However,
approaches to full 3D body pose estimation are typically trained on day-to-day
activities or actions. As such, detailed hand-to-hand interactions are poorly
represented, especially during motion. We see this in the failure cases of
techniques such as OpenPose or MediaPipe. Yet accurate hand pose
estimation is crucial for many applications where global body motion
matters less than the pose of the hands.
This paper addresses the problem of 3D hand pose estimation from monocular
images or sequences. We present a novel end-to-end framework for 3D hand
regression that employs diffusion models that have shown excellent ability to
capture the distribution of data for generative purposes. Moreover, we enforce
kinematic constraints to ensure realistic poses are generated by incorporating
an explicit forward kinematic layer as part of the network. The proposed model
provides state-of-the-art performance when lifting a 2D single-hand image to
3D. When sequence data is available, we add a Transformer module over
a temporal window of consecutive frames to refine the results, overcoming
jittering and further increasing accuracy.
The method is evaluated quantitatively and qualitatively, showing
state-of-the-art robustness, generalization, and accuracy on several different
datasets.
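As a rough illustration of what a forward kinematic layer computes, the sketch below maps joint angles and fixed bone lengths to joint positions for a single planar finger chain. It is not the paper's implementation: the function name `finger_forward_kinematics`, the 2D simplification, and the example bone lengths are all assumptions made for illustration. The point it shows is that the bone lengths are fixed inputs, so every output pose respects them whatever angles are predicted.

```python
import math


def finger_forward_kinematics(bone_lengths, joint_angles, root=(0.0, 0.0)):
    """Return 2D joint positions of a planar finger chain (hypothetical helper).

    Each joint angle is a flexion relative to the previous bone, so the
    angles accumulate along the chain. Bone lengths are fixed inputs,
    which is why they are preserved for any choice of angles.
    """
    if len(bone_lengths) != len(joint_angles):
        raise ValueError("need exactly one joint angle per bone")

    x, y = root
    heading = 0.0  # accumulated orientation of the current bone, in radians
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# Illustrative values only: three bones (in cm), each joint flexed by 30 degrees.
example_lengths = [4.0, 2.5, 2.0]
example_angles = [math.radians(30.0)] * 3
example_pose = finger_forward_kinematics(example_lengths, example_angles)
```

In a learned model the same arithmetic would be written with differentiable tensor operations and 3D rotations, so that a loss on joint positions can propagate back to the predicted angles.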
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
- D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement [3.514184876338779]
A Diffusion-based 3D Pose Refiner is proposed to refine the output of any existing 3D pose estimator.
We leverage the architecture of current diffusion models to convert the distribution of noisy 3D poses into ground truth 3D poses.
Experimental results demonstrate the proposed architecture can significantly improve the performance of current sequence-to-sequence 3D pose estimators.
arXiv Detail & Related papers (2024-01-08T14:21:02Z)
- Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z)
- SeqHAND:RGB-Sequence-Based 3D Hand Pose and Shape Estimation [48.456638103309544]
3D hand pose estimation based on RGB images has been studied for a long time.
We propose a novel method that generates a synthetic dataset that mimics natural human hand movements.
We show that utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimations.
arXiv Detail & Related papers (2020-07-10T05:11:14Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.