Tracking and Reconstructing Hand Object Interactions from Point Cloud
Sequences in the Wild
- URL: http://arxiv.org/abs/2209.12009v1
- Date: Sat, 24 Sep 2022 13:40:09 GMT
- Title: Tracking and Reconstructing Hand Object Interactions from Point Cloud
Sequences in the Wild
- Authors: Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia
Weng, Li Yi, Shuran Song, He Wang
- Abstract summary: We propose a point cloud based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion.
Our pipeline then reconstructs the full hand by converting the predicted hand joints into the template-based parametric hand model MANO.
For object tracking, we devise a simple yet effective module that estimates the object SDF from the first frame and performs optimization-based tracking.
- Score: 35.55753131098285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we tackle the challenging task of jointly tracking hand
and object poses and reconstructing their shapes from depth point cloud sequences
in the wild, given the initial poses at frame 0. We propose, for the first time,
a point-cloud-based hand joint tracking network, HandTrackNet, to estimate the
inter-frame hand joint motion. HandTrackNet introduces a novel hand pose
canonicalization module that eases the tracking task, yielding accurate and robust
hand joint tracking. Our pipeline then reconstructs the full hand by
converting the predicted hand joints into the template-based parametric hand
model MANO. For object tracking, we devise a simple yet effective module that
estimates the object SDF from the first frame and performs optimization-based
tracking. Finally, a joint optimization step performs joint hand and object
reasoning, which alleviates the occlusion-induced ambiguity and further refines
the hand pose. During training, the whole pipeline sees only purely synthetic
data, synthesized with sufficient variation and rendered with depth-sensor
simulation to ease generalization. The pipeline is thus robust to the sim-to-real
generalization gap and transfers directly to real in-the-wild data. We evaluate
our method on two real hand-object interaction datasets, namely HO3D and DexYCB,
without any finetuning. Our experiments demonstrate that the proposed method
significantly outperforms the previous state-of-the-art depth-based hand and
object pose estimation and tracking methods, while running at 9 FPS.
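The abstract releases no code, but the canonicalization idea is straightforward to sketch: use the previous frame's predicted joints to move the current point cloud into a roughly centered, oriented hand frame, so the network only has to regress a small inter-frame motion. The joint layout and frame construction below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def canonicalize_hand(points, prev_joints):
    """Illustrative sketch of hand pose canonicalization (hypothetical
    21-joint layout: 0 = wrist, 5 = index MCP, 9 = middle MCP).

    points:      (N, 3) current-frame hand point cloud (camera frame)
    prev_joints: (21, 3) hand joints predicted at the previous frame
    """
    # Translation: center on the previous joints' centroid.
    center = prev_joints.mean(axis=0)

    # Orientation: build an orthonormal palm frame from two coarse axes.
    a = prev_joints[9] - prev_joints[0]            # wrist -> middle MCP
    b = prev_joints[5] - prev_joints[0]            # wrist -> index MCP
    z = a / np.linalg.norm(a)
    y = np.cross(z, b)
    y /= np.linalg.norm(y)
    x = np.cross(y, z)
    R = np.stack([x, y, z], axis=1)                # hand frame -> camera frame

    # Express the observation in the canonical hand frame.
    canon = (points - center) @ R                  # row-vector form of R.T @ p
    return canon, R, center
```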
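Converting tracked joints into a full hand is typically done by optimizing MANO's pose and shape parameters until the model's forward-kinematics joints match the predictions. A minimal sketch, assuming the differentiable ManoLayer from the manopth package (which needs the MANO model files on disk) and a deliberately simplified loss; it is not the paper's exact fitting procedure:

```python
import torch
from manopth.manolayer import ManoLayer  # requires local MANO model files

def fit_mano_to_joints(target_joints, n_iters=200, lr=1e-2):
    """Illustrative sketch: recover MANO parameters whose forward
    kinematics reproduce the tracked joints.

    target_joints: (1, 21, 3) predicted hand joints, in metres
    """
    mano = ManoLayer(mano_root="mano/models", use_pca=True, ncomps=30)
    pose = torch.zeros(1, 3 + 30, requires_grad=True)   # global rot + PCA pose
    shape = torch.zeros(1, 10, requires_grad=True)      # MANO betas
    trans = torch.zeros(1, 1, 3, requires_grad=True)    # wrist translation

    opt = torch.optim.Adam([pose, shape, trans], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        verts, joints = mano(pose, shape)               # manopth outputs mm
        loss = ((joints / 1000.0 + trans - target_joints) ** 2).mean()
        loss = loss + 1e-3 * (shape ** 2).mean()        # keep betas plausible
        loss.backward()
        opt.step()
    return pose.detach(), shape.detach(), trans.detach()
```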
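Optimization-based SDF tracking can likewise be sketched as a small rigid-pose search: find the incremental rotation and translation under which the observed object points evaluate to (near) zero under the frame-0 SDF. The axis-angle parameterization and loss below are illustrative assumptions:

```python
import torch

def track_object_pose(sdf, points, R0, t0, n_iters=50, lr=1e-2):
    """Illustrative sketch of optimization-based SDF tracking.

    sdf:    callable, (M, 3) object-frame points -> (M,) signed distances
    points: (M, 3) observed object points in the camera frame
    R0, t0: (3, 3) and (3,) previous-frame object pose (camera <- object)
    """
    w = (1e-3 * torch.randn(3)).requires_grad_()   # axis-angle increment
    v = torch.zeros(3, requires_grad=True)         # translation increment
    opt = torch.optim.Adam([w, v], lr=lr)

    for _ in range(n_iters):
        opt.zero_grad()
        # Rodrigues' formula: dR = I + sin(t)K + (1 - cos(t))K^2.
        theta = w.norm() + 1e-9
        k = w / theta
        zero = torch.zeros(())
        K = torch.stack([torch.stack([zero, -k[2], k[1]]),
                         torch.stack([k[2], zero, -k[0]]),
                         torch.stack([-k[1], k[0], zero])])
        dR = torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)
        R, t = dR @ R0, t0 + v
        # Observed points should lie on the object's zero level set.
        loss = sdf((points - t) @ R).abs().mean()
        loss.backward()
        opt.step()
    return R.detach(), t.detach()
```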
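For the final joint hand-object reasoning, one common ingredient (an assumption here, not a claim about the paper's exact objective) is an interpenetration term that pushes hand vertices out of the tracked object using the same SDF:

```python
import torch

def interpenetration_loss(hand_verts, sdf, R, t, margin=0.0):
    """Illustrative penalty: hand vertices inside the tracked object
    (negative SDF) are pushed back to its surface.

    hand_verts: (V, 3) MANO hand vertices in the camera frame
    sdf:        callable, (V, 3) object-frame points -> (V,) distances
    R, t:       current object pose (camera <- object)
    """
    verts_obj = (hand_verts - t) @ R    # move the hand into the object frame
    d = sdf(verts_obj)
    return torch.relu(margin - d).pow(2).mean()
```

In the joint step, a term like this would be summed with the joint-fitting and SDF data terms and minimized over the hand and object parameters together.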
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise interaction in which the object's axis of articulation must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches typically overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion [36.9457697304841]
ManiDext is a unified hierarchical diffusion-based framework for generating hand manipulation and grasp poses.
Our key insight is that accurately modeling the contact correspondences between objects and hands during interactions is crucial.
Our framework first generates contact maps and correspondence embeddings on the object's surface.
Based on these fine-grained correspondences, we introduce a novel approach that integrates the iterative refinement process into the diffusion process.
arXiv Detail & Related papers (2024-09-14T04:28:44Z)
- DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image [98.29284902879652]
We present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image.
It features disentangling the regression of local deformation fields and global mesh locations into two network branches.
It achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility.
arXiv Detail & Related papers (2024-06-26T00:08:29Z)
- Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling [13.284947022380404]
We propose a two-stage framework that can obtain accurate and smooth full-body motions with three tracking signals of head and hands only.
Our framework explicitly models the joint-level features in the first stage and utilizes them as temporal tokens in alternating spatial and temporal transformer blocks to capture joint-level correlations in the second stage.
With extensive experiments on the AMASS motion dataset and real-captured data, we show our proposed method can achieve more accurate and smooth motion compared to existing approaches.
arXiv Detail & Related papers (2023-08-17T08:27:55Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties; notably, it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method depends neither on visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)