Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
- URL: http://arxiv.org/abs/2508.01835v1
- Date: Sun, 03 Aug 2025 16:44:24 GMT
- Title: Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
- Authors: Yufei Zhang, Zijun Cui, Jeffrey O. Kephart, Qiang Ji
- Abstract summary: We present a novel 3D hand motion recovery framework that enhances image-based reconstructions. Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences. We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints.
- Score: 29.784542628690794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While 3D hand reconstruction from monocular images has made significant progress, generating accurate and temporally coherent motion estimates from videos remains challenging, particularly during hand-object interactions. In this paper, we present a novel 3D hand motion recovery framework that enhances image-based reconstructions through a diffusion-based and physics-augmented motion refinement model. Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences through an iterative denoising process. Instead of relying on scarce annotated video data, we train our model only using motion capture data without images. We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints. We effectively integrate these physical insights into our diffusion model to improve its performance. Extensive experiments demonstrate that our approach significantly improves various frame-wise reconstruction methods, achieving state-of-the-art (SOTA) performance on existing benchmarks.
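The abstract's iterative denoising refinement can be pictured as DDPM-style ancestral sampling conditioned on the initial per-frame estimates. The sketch below is illustrative only: the linear noise schedule, the toy denoiser, and the tensor shapes (16 frames of a 45-dimensional hand pose) are assumptions, not the authors' architecture, and the physics constraints are omitted.

```python
import numpy as np

# Toy linear beta schedule over T diffusion steps (an assumption,
# not the schedule used in the paper).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def toy_denoiser(x_t, cond, t):
    """Stand-in for the learned noise predictor eps_theta(x_t, cond, t).
    A real model would be a neural network conditioned on the initial
    motion estimate; this toy version simply pulls x_t toward cond."""
    return (x_t - cond) / np.sqrt(1.0 - alpha_bars[t])

def refine_motion(initial_motion, rng):
    """DDPM-style ancestral sampling conditioned on the initial estimate."""
    x = rng.standard_normal(initial_motion.shape)  # start from pure noise
    for t in reversed(range(T)):
        eps = toy_denoiser(x, initial_motion, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:  # inject noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
initial = rng.standard_normal((16, 45))  # 16 frames x 45 pose dims (assumed)
refined = refine_motion(initial, rng)
print(refined.shape)  # (16, 45)
```

Because the denoiser is conditioned on the initial estimate at every step, the sampled sequence is a refinement of that estimate rather than an unconditional motion sample.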
Related papers
- SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios [48.09735396455107]
Hand-Object Interaction (HOI) generation has significant application potential. Current 3D HOI motion generation approaches heavily rely on predefined 3D object models and lab-captured motion data. We propose a novel framework that combines visual priors and dynamic constraints within a synchronized diffusion process to generate the HOI video and motion simultaneously.
arXiv Detail & Related papers (2025-06-03T05:04:29Z)
- MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM [14.522189177415724]
MAGIC is a training-free framework for single-image physical property inference and dynamic generation. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven feedback loop. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence.
arXiv Detail & Related papers (2025-05-22T09:40:34Z)
- REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning [95.07708090428814]
We present REWIND, a one-step diffusion model for real-time, high-fidelity human motion estimation from egocentric image inputs. We introduce cascaded body-hand denoising diffusion, which effectively models the correlation between egocentric body and hand motions. We also propose a novel identity conditioning method based on a small set of pose exemplars of the target identity, which further enhances motion estimation quality.
arXiv Detail & Related papers (2025-04-07T11:44:11Z)
- CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images [19.08403715388913]
3D Gaussian Splatting has gained significant attention due to its high-quality novel view rendering. A critical issue is the camera motion blur caused by movement during exposure, which hinders accurate 3D scene reconstruction. We propose CoMoGaussian, a Continuous Motion-Aware Gaussian Splatting method that reconstructs precise 3D scenes from motion-blurred images.
arXiv Detail & Related papers (2025-03-07T11:18:43Z)
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera [49.82535393220003]
Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery. This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
arXiv Detail & Related papers (2024-12-17T12:43:10Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction.
Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow.
For the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z)
- Motion-DVAE: Unsupervised learning for fast human motion denoising [18.432026846779372]
We introduce Motion-DVAE, a motion prior to capture the short-term dependencies of human motion.
Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches.
arXiv Detail & Related papers (2023-06-09T12:18:48Z)
- HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape.
We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
arXiv Detail & Related papers (2021-05-10T21:04:55Z)
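The HuMoR entry above describes a conditional variational autoencoder over the per-step change in pose. A minimal rollout of that idea can be sketched as follows; the linear prior and decoder stand-ins, the pose and latent dimensions, and the step count are all illustrative assumptions, not HuMoR's actual networks.

```python
import numpy as np

POSE_DIM, LATENT_DIM = 69, 48  # assumed dimensions, for illustration only

rng = np.random.default_rng(1)
# Stand-ins for the learned conditional-prior and decoder networks.
W_prior = rng.standard_normal((POSE_DIM, 2 * LATENT_DIM)) * 0.01
W_dec = rng.standard_normal((POSE_DIM + LATENT_DIM, POSE_DIM)) * 0.01

def sample_transition(pose, rng):
    """One rollout step: sample z from the conditional prior p(z | pose),
    then decode the pose *change* and apply it to the current pose."""
    stats = pose @ W_prior
    mu, log_var = stats[:LATENT_DIM], stats[LATENT_DIM:]
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(LATENT_DIM)
    delta = np.concatenate([pose, z]) @ W_dec  # decoder predicts the change
    return pose + delta

pose = np.zeros(POSE_DIM)
for _ in range(10):  # autoregressive rollout of 10 steps
    pose = sample_transition(pose, rng)
print(pose.shape)  # (69,)
```

Modeling the change in pose rather than the pose itself is what makes such a prior usable as a per-step motion model during test-time optimization.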
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.