Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data
- URL: http://arxiv.org/abs/2003.09572v3
- Date: Fri, 11 Mar 2022 13:39:43 GMT
- Title: Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data
- Authors: Yuxiao Zhou and Marc Habermann and Weipeng Xu and Ikhsanul Habibie and
Christian Theobalt and Feng Xu
- Abstract summary: We present a novel method for monocular hand shape and pose estimation at an unprecedented runtime of 100 fps.
This is enabled by a new learning-based architecture designed so that it can make use of all the sources of available hand training data.
It features a 3D hand joint detection module and an inverse kinematics module which not only regresses 3D joint positions but also maps them to joint rotations in a single feed-forward pass.
- Score: 77.34069717612493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel method for monocular hand shape and pose estimation at
an unprecedented runtime of 100 fps and at state-of-the-art accuracy.
This is enabled by a new learning-based architecture designed such that it can
make use of all the sources of available hand training data: image data with
either 2D or 3D annotations, as well as stand-alone 3D animations without
corresponding image data. It features a 3D hand joint detection module and an
inverse kinematics module which not only regresses 3D joint positions but also
maps them to joint rotations in a single feed-forward pass. This output makes
the method more directly usable for applications in computer vision and
graphics compared to only regressing 3D joint positions. We demonstrate that
our architectural design leads to a significant quantitative and qualitative
improvement over the state of the art on several challenging benchmarks. Our
model is publicly available for future research.
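The two-stage design described in the abstract (a joint detection module followed by an inverse kinematics module that maps positions to rotations in one feed-forward pass) can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the 21-joint skeleton, the dummy detector, and the single linear layer standing in for the learned IK network are all assumptions.

```python
import numpy as np

NUM_JOINTS = 21          # common hand-skeleton size (assumption)
POSE_DIM = NUM_JOINTS * 3

rng = np.random.default_rng(0)

def detect_joints(image: np.ndarray) -> np.ndarray:
    """Stand-in for the 3D joint detection module: returns (21, 3) positions.

    A real detector would be a CNN over the input frame; here we just emit
    a random dummy pose so the pipeline is runnable.
    """
    return rng.standard_normal((NUM_JOINTS, 3))

class IKModule:
    """Toy feed-forward regressor from joint positions to per-joint rotations."""
    def __init__(self) -> None:
        # Random-weight linear layer standing in for the learned IK network.
        self.W = rng.standard_normal((POSE_DIM, POSE_DIM)) * 0.01
        self.b = np.zeros(POSE_DIM)

    def __call__(self, joints: np.ndarray) -> np.ndarray:
        x = joints.reshape(-1)                # flatten (21, 3) -> (63,)
        theta = np.tanh(self.W @ x + self.b)  # one feed-forward pass
        return theta.reshape(NUM_JOINTS, 3)   # per-joint axis-angle rotations

image = np.zeros((256, 256, 3))   # dummy input frame
joints = detect_joints(image)     # stage 1: 3D joint positions
rotations = IKModule()(joints)    # stage 2: joint rotations, no iterative IK solve
print(joints.shape, rotations.shape)
```

The point of the sketch is the data flow: rotations come out of a single forward pass over the detected positions, rather than from an iterative optimization-based IK solver, which is what makes the 100 fps runtime plausible.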
Related papers
- The More You See in 2D, the More You Perceive in 3D
  SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
  We show that as the number of input images increases, the performance of our approach improves.
  arXiv Detail & Related papers (2024-04-04T17:59:40Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
  3DiffTection is a state-of-the-art method for 3D object detection from single images.
  We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
  We further train the model on target data with detection supervision.
  arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild
  POTR-3D is a sequence-to-sequence 2D-to-3D lifting model for 3D multi-person pose estimation (3DMPPE).
  It robustly generalizes to diverse unseen views, recovers poses under heavy occlusion, and generates more natural and smoother outputs.
  arXiv Detail & Related papers (2023-09-15T06:17:22Z)
- Denoising Diffusion for 3D Hand Pose Estimation from Images
  This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
  We present a novel end-to-end framework for 3D hand regression that employs diffusion models, which have shown an excellent ability to capture the data distribution for generative purposes.
  The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
  arXiv Detail & Related papers (2023-08-18T12:57:22Z)
- AutoDecoding Latent 3D Diffusion Models
  We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
  The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
  We then identify the appropriate intermediate volumetric latent space and introduce robust normalization and de-normalization operations.
  arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Consistent 3D Hand Reconstruction in Video via Self-Supervised Learning
  We present a method for reconstructing accurate and consistent 3D hands from a monocular video.
  Detected 2D hand keypoints and the image texture provide important cues about the geometry and texture of the 3D hand.
  We propose S2HAND, a self-supervised 3D hand reconstruction model.
  arXiv Detail & Related papers (2022-01-24T09:44:11Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
  We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
  Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
  arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation
  We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
  We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera viewpoints.
  Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
  arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.