Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video
- URL: http://arxiv.org/abs/2308.04074v3
- Date: Fri, 5 Jan 2024 02:03:52 GMT
- Title: Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video
- Authors: Weichao Zhao, Hezhen Hu, Wengang Zhou, Li Li, Houqiang Li
- Abstract summary: Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors.
Previous works only leverage information from a single RGB image, without modeling the physically plausible relation between the two hands.
In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction.
- Score: 104.69686024776396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g., self- and mutual occlusion and similar textures. Previous works only leverage information from a single RGB image without modeling the physically plausible relation between the two hands, which leads to inferior reconstruction results. In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction. On the one hand, we leverage temporal context to complement the insufficient information provided by a single frame, and design a novel temporal framework with a temporal constraint for interacting hand motion smoothness. On the other hand, we further propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments are performed to validate the effectiveness of our proposed framework, which achieves new state-of-the-art performance on public benchmarks.
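The abstract names two concrete components: a temporal constraint that encourages smooth interacting-hand motion, and an interpenetration detection module that discourages physical collisions. The PyTorch sketch below is a minimal illustration of what such terms often look like, not the paper's implementation; the tensor shapes, the vertex-distance penetration proxy, and the `margin` value are all assumptions.

```python
import torch

def temporal_smoothness_loss(joints: torch.Tensor) -> torch.Tensor:
    """joints: (T, J, 3) per-frame 3D joint predictions for one hand.
    Penalizing second-order differences (acceleration) is a common way
    to encourage smooth motion over a clip."""
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]  # (T-2, J, 3)
    return accel.pow(2).sum(dim=-1).mean()

def interpenetration_penalty(verts_left: torch.Tensor,
                             verts_right: torch.Tensor,
                             margin: float = 0.005) -> torch.Tensor:
    """verts_*: (V, 3) hand-mesh vertices in meters. A crude proxy for
    collision detection: any left/right vertex pair closer than `margin`
    is treated as penetrating and pushed apart. A real module would use
    mesh-aware tests (e.g., whether a vertex lies inside the other mesh)."""
    dists = torch.cdist(verts_left, verts_right)        # (V, V) pairwise distances
    nearest = dists.min(dim=1).values                   # (V,) closest opposite vertex
    return torch.relu(margin - nearest).pow(2).mean()   # zero when hands are apart
```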
Related papers
- Enhanced Spatio-Temporal Context for Temporally Consistent Robust 3D Human Motion Recovery from Monocular Videos [5.258814754543826]
We propose a novel method for temporally consistent motion estimation from a monocular video.
Instead of using generic ResNet-like features, our method uses a body-aware feature representation and an independent per-frame pose initialization.
Our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods.
arXiv Detail & Related papers (2023-11-20T10:53:59Z)
- Physical Interaction: Reconstructing Hand-object Interactions with Physics [17.90852804328213]
The paper proposes a physics-based method to better solve the ambiguities in the reconstruction.
It first proposes a force-based dynamic model of the in-hand object, which recovers the unobserved contacts and also solves for plausible contact forces.
Experiments show that the proposed technique reconstructs hand-object interactions that are both physically plausible and more accurate.
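As a rough illustration of what a force-based dynamic model can mean in practice, the NumPy sketch below solves a least-squares Newton-Euler system for contact forces at given contact points; the contact locations, mass, and the zero-net-torque assumption are hypothetical, not taken from the paper.

```python
import numpy as np

def skew(v: np.ndarray) -> np.ndarray:
    """Matrix S such that S @ f == np.cross(v, f)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def solve_contact_forces(contacts, com, mass, accel,
                         gravity=np.array([0.0, 0.0, -9.81])):
    """contacts: list of 3D contact points on the object surface.
    Returns one 3D force per contact (minimum-norm solution) so that
    forces sum to m*(a - g) and net torque about the COM is ~zero."""
    k = len(contacts)
    A = np.zeros((6, 3 * k))
    for i, p in enumerate(contacts):
        A[0:3, 3 * i:3 * i + 3] = np.eye(3)      # force-balance rows
        A[3:6, 3 * i:3 * i + 3] = skew(p - com)  # torque-balance rows
    b = np.concatenate([mass * (accel - gravity), np.zeros(3)])
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f.reshape(k, 3)

# Example: a 100 g object held still by two opposing fingertip contacts.
forces = solve_contact_forces(
    contacts=[np.array([0.02, 0.0, 0.0]), np.array([-0.02, 0.0, 0.0])],
    com=np.zeros(3), mass=0.1, accel=np.zeros(3))
```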
arXiv Detail & Related papers (2022-09-22T07:41:31Z)
- Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos [50.74218823358754]
We develop a transformer-based framework to exploit temporal information for robust estimation.
We build a network hierarchy with two cascaded transformer encoders, where the first exploits short-term temporal cues for hand pose estimation and the second aggregates longer-term context for action recognition.
Our approach achieves competitive results on two first-person hand action benchmarks, namely FPHA and H2O.
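A minimal sketch of such a two-stage cascade, assuming per-frame image features as input; the dimensions, window size, and head counts below are illustrative guesses rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CascadedTemporalEncoders(nn.Module):
    def __init__(self, feat_dim=256, num_joints=21, num_actions=36, window=8):
        super().__init__()
        self.window = window
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        # TransformerEncoder deep-copies the layer, so the two stages
        # have independent weights.
        self.short_term = nn.TransformerEncoder(layer, num_layers=2)
        self.long_term = nn.TransformerEncoder(layer, num_layers=2)
        self.pose_head = nn.Linear(feat_dim, num_joints * 3)
        self.action_head = nn.Linear(feat_dim, num_actions)

    def forward(self, frame_feats):  # (B, T, feat_dim), assumes T % window == 0
        b, t, c = frame_feats.shape
        # Stage 1: attend within short windows -> per-frame pose tokens.
        short = frame_feats.reshape(b * t // self.window, self.window, c)
        short = self.short_term(short).reshape(b, t, c)
        poses = self.pose_head(short)                # (B, T, num_joints*3)
        # Stage 2: attend across the full clip -> one action prediction.
        clip = self.long_term(short)
        action = self.action_head(clip.mean(dim=1))  # (B, num_actions)
        return poses, action
```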
arXiv Detail & Related papers (2022-09-20T05:52:54Z)
- Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through the tightly coupled multi-temporal representation.
We propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z)
- RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream [27.600873320989276]
High-quality 4D reconstruction of human performance with complex interactions to various objects is essential in real-world scenarios.
Recent advances still fail to provide reliable performance reconstruction.
We propose RobustFusion, a robust volumetric performance reconstruction system for human-object interaction scenarios.
arXiv Detail & Related papers (2021-04-30T08:41:45Z)
- SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation [48.456638103309544]
3D hand pose estimation based on RGB images has been studied for a long time.
We propose a novel method that generates a synthetic dataset that mimics natural human hand movements.
We show that utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimation.
arXiv Detail & Related papers (2020-07-10T05:11:14Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
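A toy sketch of the two ideas in this summary, depth-augmented input and reciprocal cross-branch fusion; the single-convolution "depth estimator" and the channel sizes are placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.depth_net = nn.Conv2d(3, 1, 3, padding=1)  # stand-in depth estimator
        self.hand_enc = nn.Conv2d(4, ch, 3, padding=1)  # RGB + depth -> hand feats
        self.obj_enc = nn.Conv2d(4, ch, 3, padding=1)   # RGB + depth -> object feats
        self.hand_from_obj = nn.Conv2d(2 * ch, ch, 1)   # inject object feats into hand branch
        self.obj_from_hand = nn.Conv2d(2 * ch, ch, 1)   # inject hand feats into object branch

    def forward(self, rgb):                    # (B, 3, H, W)
        depth = self.depth_net(rgb)            # auxiliary depth map (B, 1, H, W)
        x = torch.cat([rgb, depth], dim=1)     # depth-augmented input
        h, o = self.hand_enc(x), self.obj_enc(x)
        # Reciprocal exchange: each branch sees the other's features.
        h2 = self.hand_from_obj(torch.cat([h, o], dim=1))
        o2 = self.obj_from_hand(torch.cat([o, h], dim=1))
        return h2, o2
```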
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
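To make the photometric-consistency idea from the paper above concrete, here is a hypothetical PyTorch loss: the same mesh vertex, projected into two nearby frames under the predicted poses, should observe similar colors. The projection inputs and the visibility mask are assumed to come from elsewhere in the pipeline.

```python
import torch
import torch.nn.functional as F

def sample_colors(image, uv):
    """image: (B, 3, H, W); uv: (B, V, 2) pixel coords normalized to [-1, 1].
    Returns (B, V, 3) bilinearly sampled colors."""
    grid = uv.unsqueeze(2)                                 # (B, V, 1, 2)
    out = F.grid_sample(image, grid, align_corners=False)  # (B, 3, V, 1)
    return out.squeeze(-1).transpose(1, 2)

def photometric_loss(img_t, img_t1, uv_t, uv_t1, visible):
    """uv_t/uv_t1: projections of the same vertices in frames t and t+1.
    visible: (B, V) float mask for vertices unoccluded in both frames."""
    c_t = sample_colors(img_t, uv_t)
    c_t1 = sample_colors(img_t1, uv_t1)
    diff = (c_t - c_t1).abs().sum(dim=-1)                  # (B, V) per-vertex L1
    return (diff * visible).sum() / visible.sum().clamp(min=1)
```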
This list is automatically generated from the titles and abstracts of the papers on this site.