Related papers: Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos

URL: http://arxiv.org/abs/2410.07795v2
Date: Mon, 28 Oct 2024 09:36:25 GMT
Title: Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos
Authors: Cuong Le, Viktor Johansson, Manon Kok, Bastian Wandt
Abstract summary: We propose a novel method to selectively incorporate the physics models with the kinematics observations in an online setting. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and simulated motion. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics.
Score: 6.093379844890164
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Human motion capture from monocular videos has made significant progress in recent years. However, modern approaches often produce temporal artifacts, e.g. in the form of jittery motion, and struggle to achieve smooth and physically plausible motions. Explicitly integrating physics, in the form of internal forces and exterior torques, helps alleviate these artifacts. Current state-of-the-art approaches make use of an automatic PD controller to predict torques and reaction forces in order to re-simulate the input kinematics, i.e. the joint angles of a predefined skeleton. However, due to imperfect physical models, these methods often require simplifying assumptions and extensive preprocessing of the input kinematics to achieve good performance. To this end, we propose a novel method to selectively incorporate the physics models with the kinematics observations in an online setting, inspired by a neural Kalman-filtering approach. We develop a control loop as a meta-PD controller to predict internal joint torques and external reaction forces, followed by a physics-based motion simulation. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and simulated motion, resulting in an optimal-state dynamics prediction. We show that this filtering step is crucial to provide online supervision that helps balance the shortcomings of the respective input motions, and is thus important not only for capturing accurate global motion trajectories but also for producing physically plausible human poses. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics, compared to the state of the art. The code is available at https://github.com/cuongle1206/OSDCap

Related papers

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos. VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics. We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with vision and language informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z)
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation [88.83749146867665]
Existing approaches learn a policy to predict a distant next-best end-effector pose. They then compute the corresponding joint rotation angles for motion using inverse kinematics. We propose Kinematics enhanced Spatial-TemporAl gRaph diffuser.
arXiv Detail & Related papers (2025-03-13T17:48:35Z)
A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and video mask to repair flawed motions. We also propose a physics-based motion transfer module (PTM), which employs a pretrain and adapt approach for motion imitation. Our approach is designed as a plug-and-play module to physically refine the video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z)
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models [50.38647583839384]
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor. Our key insight is that large video generation models can act as both neural renderers and implicit physics simulators, having learned interactive dynamics from large-scale video data.
arXiv Detail & Related papers (2024-12-16T13:57:02Z)
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems [49.11170948406405]
State-of-the-art automatic parameter estimation from video relies on training supervised deep networks on large datasets. We propose a method to estimate the physical parameters of any known, continuous governing equation from single videos.
arXiv Detail & Related papers (2024-10-02T09:44:54Z)
Physics-Guided Human Motion Capture with Pose Probability Modeling [35.159506668475565]
Existing solutions always adopt kinematic results as reference motions, and the physics is treated as a post-processing module. We employ physics as denoising guidance in the reverse diffusion process to reconstruct human motion from a modeled pose probability distribution. With several iterations, the physics-based tracking and kinematic denoising promote each other to generate a physically plausible human motion.
arXiv Detail & Related papers (2023-08-19T05:28:03Z)
Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening [59.88594294676711]
Modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions. We propose a system, "Skeleton2Humanoid", which performs physics-oriented motion correction at test time. Experiments on the challenging LaFAN1 dataset show our system can outperform prior methods significantly in terms of both physical plausibility and accuracy.
arXiv Detail & Related papers (2022-10-09T16:15:34Z)
D&D: Learning Human Dynamics from Dynamic Camera [55.60512353465175]
We present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos with a moving camera. Our approach is entirely neural-based and runs without offline optimization or simulation in physics engines.
arXiv Detail & Related papers (2022-09-19T06:51:02Z)
Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video [31.96672354594643]
We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts. We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark.
arXiv Detail & Related papers (2022-05-24T18:02:49Z)
Differentiable Dynamics for Articulated 3d Human Motion Reconstruction [29.683633237503116]
We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video. We validate the model by demonstrating that it can accurately reconstruct physically plausible 3d human motion from monocular video.
arXiv Detail & Related papers (2022-05-24T17:58:37Z)
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture [12.631678059354593]
We exploit a high-precision, non-differentiable physics simulator to incorporate dynamical constraints in motion capture. Our key idea is to use real physical supervision to train a target pose distribution prior for sampling-based motion control. Results show that we can obtain physically plausible human motion with complex terrain interactions, human shape variations, and diverse behaviors.
arXiv Detail & Related papers (2022-03-26T12:48:41Z)
Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task. We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction. Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction. Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors. We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.