FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
- URL: http://arxiv.org/abs/2505.13437v1
- Date: Mon, 19 May 2025 17:58:11 GMT
- Title: FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
- Authors: Dian Shao, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang,
- Abstract summary: FinePhys is a Fine-grained human action generation framework that incorporates Physics to obtain effective skeletal guidance.<n>FinePhys first estimates 2D poses in an online manner and then performs 2D-to-3D lifting via dimension in-context learning.<n>To mitigate the instability and limited interpretability of purely data-driven 3D poses, we introduce a physics-based motion re-estimation module.
- Score: 6.0892117964531955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant advances in video generation, synthesizing physically plausible human actions remains a persistent challenge, particularly in modeling fine-grained semantics and complex temporal dynamics. For instance, generating gymnastics routines such as "switch leap with 0.5 turn" poses substantial difficulties for current methods, often yielding unsatisfactory results. To bridge this gap, we propose FinePhys, a Fine-grained human action generation framework that incorporates Physics to obtain effective skeletal guidance. Specifically, FinePhys first estimates 2D poses in an online manner and then performs 2D-to-3D dimension lifting via in-context learning. To mitigate the instability and limited interpretability of purely data-driven 3D poses, we further introduce a physics-based motion re-estimation module governed by Euler-Lagrange equations, calculating joint accelerations via bidirectional temporal updating. The physically predicted 3D poses are then fused with data-driven ones, offering multi-scale 2D heatmap guidance for the diffusion process. Evaluated on three fine-grained action subsets from FineGym (FX-JUMP, FX-TURN, and FX-SALTO), FinePhys significantly outperforms competitive baselines. Comprehensive qualitative results further demonstrate FinePhys's ability to generate more natural and plausible fine-grained human actions.
Related papers
- Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions [88.01918532202716]
We introduce a novel approach that embeds SMPL-X into a tangible entity capable of dynamic physical interactions with its surroundings.<n>Our approach maintains kinematic control over inherent SMPL-X poses while ensuring physically plausible interactions with scenes and objects.<n>Unlike reinforcement learning-based methods, which demand extensive and complex training, our half-physics method is learning-free and generalizes to any body shape and motion.
arXiv Detail & Related papers (2025-07-31T17:58:33Z) - PhysX-3D: Physical-Grounded 3D Asset Generation [48.78065667043986]
Existing 3D generation primarily emphasizes geometries and textures while neglecting physical-grounded modeling.<n>We present PhysXNet - the first physics-grounded 3D dataset systematically annotated across five foundational dimensions.<n>We also propose textbfPhysXGen, a feed-forward framework for physics-grounded image-to-3D asset generation.
arXiv Detail & Related papers (2025-07-16T17:59:35Z) - PhysiInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation [35.563978243352764]
We introduce physical mapping, integrated throughout the human interaction generation pipeline.<n>Specifically, motion imitation within a physics-based simulation environment is used to project target motions into a physically valid space.<n>Experiments show our method achieves impressive results in generated human motion quality, with a 3%-89% improvement in physical fidelity.
arXiv Detail & Related papers (2025-06-09T06:04:49Z) - Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos [6.093379844890164]
We propose a novel method to selectively incorporate the physics models with the kinematics observations in an online setting.<n>A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and simulated motion.<n>The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics.
arXiv Detail & Related papers (2024-10-10T10:24:59Z) - DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors.<n>We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z) - PhyRecon: Physically Plausible Neural Scene Reconstruction [81.73129450090684]
We introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations.
Central to this design is an efficient transformation between SDF-based implicit representations and explicit surface points.
Our results also exhibit superior physical stability in physical simulators, with at least a 40% improvement across all datasets.
arXiv Detail & Related papers (2024-04-25T15:06:58Z) - MultiPhys: Multi-Person Physics-aware 3D Motion Estimation [28.91813849219037]
We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos.
Our focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees of engagement.
We devise a pipeline in which the motion estimated by a kinematic-based method is fed into a physics simulator in an autoregressive manner.
arXiv Detail & Related papers (2024-04-18T08:29:29Z) - Building Flexible Machine Learning Models for Scientific Computing at Scale [35.41293100957156]
We present OmniArch, the first prototype aiming at solving multi-scale and multi-physics scientific computing problems with physical alignment.
As far as we know, we first conduct 1D-2D-3D united pre-training on the PDEBench, and it sets not only new performance benchmarks for 1D, 2D, and 3D PDEs but also demonstrates exceptional adaptability to new physics via in-context and zero-shot learning approaches.
arXiv Detail & Related papers (2024-02-25T07:19:01Z) - Skeleton2Humanoid: Animating Simulated Characters for
Physically-plausible Motion In-betweening [59.88594294676711]
Modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions.
We propose a system Skeleton2Humanoid'' which performs physics-oriented motion correction at test time.
Experiments on the challenging LaFAN1 dataset show our system can outperform prior methods significantly in terms of both physical plausibility and accuracy.
arXiv Detail & Related papers (2022-10-09T16:15:34Z) - Differentiable Dynamics for Articulated 3d Human Motion Reconstruction [29.683633237503116]
We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video.
We validate the model by demonstrating that it can accurately reconstruct physically plausible 3d human motion from monocular video.
arXiv Detail & Related papers (2022-05-24T17:58:37Z) - Neural Monocular 3D Human Motion Capture with Physical Awareness [76.55971509794598]
We present a new trainable system for physically plausible markerless 3D human motion capture.
Unlike most neural methods for human motion capture, our approach is aware of physical and environmental constraints.
It produces smooth and physically principled 3D motions in an interactive frame rate in a wide variety of challenging scenes.
arXiv Detail & Related papers (2021-05-03T17:57:07Z) - Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z) - Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
Key ability for artificial systems is to understand physical interactions between objects, and predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention and several methods were proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.