Related papers: PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

URL: http://arxiv.org/abs/2404.04430v1
Date: Fri, 5 Apr 2024 22:07:25 GMT
Title: PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Authors: Yufei Zhang, Jeffrey O. Kephart, Zijun Cui, Qiang Ji,
Abstract summary: We introduce Physics-aware Pretrained Transformer (PhysPT), which improves kinematics-based motion estimates and infers motion forces. PhysPT exploits a Transformer encoder-decoder backbone to effectively learn human dynamics in a self-supervised manner.
Score: 29.784542628690794
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While current methods have shown promising progress on estimating 3D human motion from monocular videos, their motion estimates are often physically unrealistic because they mainly consider kinematics. In this paper, we introduce Physics-aware Pretrained Transformer (PhysPT), which improves kinematics-based motion estimates and infers motion forces. PhysPT exploits a Transformer encoder-decoder backbone to effectively learn human dynamics in a self-supervised manner. Moreover, it incorporates physics principles governing human motion. Specifically, we build a physics-based body representation and contact force model. We leverage them to impose novel physics-inspired training losses (i.e., force loss, contact loss, and Euler-Lagrange loss), enabling PhysPT to capture physical properties of the human body and the forces it experiences. Experiments demonstrate that, once trained, PhysPT can be directly applied to kinematics-based estimates to significantly enhance their physical plausibility and generate favourable motion forces. Furthermore, we show that these physically meaningful quantities translate into improved accuracy of an important downstream task: human action recognition.

Related papers

Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions [88.01918532202716]
We introduce a novel approach that embeds SMPL-X into a tangible entity capable of dynamic physical interactions with its surroundings.<n>Our approach maintains kinematic control over inherent SMPL-X poses while ensuring physically plausible interactions with scenes and objects.<n>Unlike reinforcement learning-based methods, which demand extensive and complex training, our half-physics method is learning-free and generalizes to any body shape and motion.
arXiv Detail & Related papers (2025-07-31T17:58:33Z)
Physics-informed Ground Reaction Dynamics from Human Motion Capture [4.4795626402834055]
We propose a novel method for estimating human ground reaction dynamics directly from motion capture data.<n>We introduce a highly accurate and robust method for computing ground reaction forces from motion capture data using Euler's integration scheme and PD algorithm.<n>The proposed approach was tested on the GroundLink dataset.
arXiv Detail & Related papers (2025-07-02T04:02:16Z)
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos. VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics. We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with vision and language informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z)
Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos [6.093379844890164]
We propose a novel method to selectively incorporate the physics models with the kinematics observations in an online setting. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and simulated motion. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics.
arXiv Detail & Related papers (2024-10-10T10:24:59Z)
Physics-Guided Human Motion Capture with Pose Probability Modeling [35.159506668475565]
Existing solutions always adopt kinematic results as reference motions, and the physics is treated as a post-processing module. We employ physics as denoising guidance in the reverse diffusion process to reconstruct human motion from a modeled pose probability distribution. With several iterations, the physics-based tracking and kinematic denoising promote each other to generate a physically plausible human motion.
arXiv Detail & Related papers (2023-08-19T05:28:03Z)
Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening [59.88594294676711]
Modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions. We propose a system Skeleton2Humanoid'' which performs physics-oriented motion correction at test time. Experiments on the challenging LaFAN1 dataset show our system can outperform prior methods significantly in terms of both physical plausibility and accuracy.
arXiv Detail & Related papers (2022-10-09T16:15:34Z)
D&D: Learning Human Dynamics from Dynamic Camera [55.60512353465175]
We present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from the in-the-wild videos with a moving camera. Our approach is entirely neural-based and runs without offline optimization or simulation in physics engines.
arXiv Detail & Related papers (2022-09-19T06:51:02Z)
Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video [31.96672354594643]
We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts. We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark.
arXiv Detail & Related papers (2022-05-24T18:02:49Z)
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language. This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z)
PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time [89.68248627276955]
Marker-less 3D motion capture from a single colour camera has seen significant progress. However, it is a very challenging and severely ill-posed problem. We present PhysCap, the first algorithm for physically plausible, real-time and marker-less human 3D motion capture.
arXiv Detail & Related papers (2020-08-20T10:46:32Z)
Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors. We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects [79.351446087227]
We address the problem of inferring contact points and the physical forces from videos of humans interacting with objects. Specifically, we use a simulator to predict effects and enforce that estimated forces must lead to the same effect as depicted in the video.
arXiv Detail & Related papers (2020-03-26T17:20:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.