Related papers: Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

URL: http://arxiv.org/abs/2412.12861v2
Date: Wed, 18 Dec 2024 21:29:43 GMT
Title: Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Authors: Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal,
Abstract summary: Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild.<n>We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery.<n>This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
Score: 49.82535393220003
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We propose Dyn-HaMR, to the best of our knowledge, the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Reconstructing accurate 3D hand meshes from monocular videos is a crucial task for understanding human behaviour, with significant applications in augmented and virtual reality (AR/VR). However, existing methods for monocular hand reconstruction typically rely on a weak perspective camera model, which simulates hand motion within a limited camera frustum. As a result, these approaches struggle to recover the full 3D global trajectory and often produce noisy or incorrect depth estimations, particularly when the video is captured by dynamic or moving cameras, which is common in egocentric scenarios. Our Dyn-HaMR consists of a multi-stage, multi-objective optimization pipeline, that factors in (i) simultaneous localization and mapping (SLAM) to robustly estimate relative camera motion, (ii) an interacting-hand prior for generative infilling and to refine the interaction dynamics, ensuring plausible recovery under (self-)occlusions, and (iii) hierarchical initialization through a combination of state-of-the-art hand tracking methods. Through extensive evaluations on both in-the-wild and indoor datasets, we show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery. This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras. Our project page is at https://dyn-hamr.github.io/.

Related papers

Diffusion-based 3D Hand Motion Recovery with Intuitive Physics [29.784542628690794]
We present a novel 3D hand motion recovery framework that enhances image-based reconstructions.<n>Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences.<n>We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints.
arXiv Detail & Related papers (2025-08-03T16:44:24Z)
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction [78.27956235915622]
Traditional SLAM systems struggle with highly dynamic scenes commonly found in casual videos. This work leverages a 3D point tracker to separate the camera-induced motion from the observed motion of dynamic objects. Our framework combines the core of traditional SLAM -- bundle adjustment -- with a robust learning-based 3D tracker front-end.
arXiv Detail & Related papers (2025-04-20T07:29:42Z)
HumanMM: Global Human Motion Recovery from Multi-shot Videos [24.273414172013933]
We present a novel framework designed to reconstruct long-sequence 3D human motion in the world coordinates from in-the-wild videos with multiple shot transitions. Such long-sequence in-the-wild motions are highly valuable to applications such as motion generation and motion understanding.
arXiv Detail & Related papers (2025-03-10T17:57:03Z)
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos [26.766489527823662]
HaWoR is a high-fidelity method for hand motion reconstruction in world coordinates from egocentric videos. To achieve precise camera trajectory estimation, we propose an adaptive egocentric SLAM framework. We demonstrate that HaWoR achieves state-of-the-art performance on both hand motion reconstruction and world-frame camera trajectory estimation.
arXiv Detail & Related papers (2025-01-06T12:29:33Z)
Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera [3.6948631725065355]
We present DiffOpt, a novel 3D global HMR method using Diffusion Optimization. Our key insight is that recent advances in human motion generation, such as the motion diffusion model (MDM), contain a strong prior of coherent human motion. We validate DiffOpt with video sequences from the Electromagnetic Database of Global 3D Human Pose and Shape in the Wild.
arXiv Detail & Related papers (2024-11-15T21:09:40Z)
UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos [25.41337525728398]
We introduce UniHOI, a model that unifies the estimation of all variables necessary for dense 4D reconstruction. UniHOI is the first approach to offer fast, dense, and general monocular egocentric HOI scene reconstruction in the presence of motion.
arXiv Detail & Related papers (2024-11-14T02:57:11Z)
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos. Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion. Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions. Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios. We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions. In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.