PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
- URL: http://arxiv.org/abs/2404.04430v1
- Date: Fri, 5 Apr 2024 22:07:25 GMT
- Title: PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
- Authors: Yufei Zhang, Jeffrey O. Kephart, Zijun Cui, Qiang Ji,
- Abstract summary: We introduce Physics-aware Pretrained Transformer (PhysPT), which improves kinematics-based motion estimates and infers motion forces.
PhysPT exploits a Transformer encoder-decoder backbone to effectively learn human dynamics in a self-supervised manner.
- Score: 29.784542628690794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While current methods have shown promising progress on estimating 3D human motion from monocular videos, their motion estimates are often physically unrealistic because they mainly consider kinematics. In this paper, we introduce Physics-aware Pretrained Transformer (PhysPT), which improves kinematics-based motion estimates and infers motion forces. PhysPT exploits a Transformer encoder-decoder backbone to effectively learn human dynamics in a self-supervised manner. Moreover, it incorporates physics principles governing human motion. Specifically, we build a physics-based body representation and contact force model. We leverage them to impose novel physics-inspired training losses (i.e., force loss, contact loss, and Euler-Lagrange loss), enabling PhysPT to capture physical properties of the human body and the forces it experiences. Experiments demonstrate that, once trained, PhysPT can be directly applied to kinematics-based estimates to significantly enhance their physical plausibility and generate favourable motion forces. Furthermore, we show that these physically meaningful quantities translate into improved accuracy of an important downstream task: human action recognition.
Related papers
- DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors [77.34056839349076]
We propose DreamPhysics, which estimates physical properties of 3D Gaussian Splatting with video diffusion priors.
Based on a material point method simulator with proper physical parameters, our method can generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z) - Physics-Guided Human Motion Capture with Pose Probability Modeling [35.159506668475565]
Existing solutions always adopt kinematic results as reference motions, and the physics is treated as a post-processing module.
We employ physics as denoising guidance in the reverse diffusion process to reconstruct human motion from a modeled pose probability distribution.
With several iterations, the physics-based tracking and kinematic denoising promote each other to generate a physically plausible human motion.
arXiv Detail & Related papers (2023-08-19T05:28:03Z) - D&D: Learning Human Dynamics from Dynamic Camera [55.60512353465175]
We present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from the in-the-wild videos with a moving camera.
Our approach is entirely neural-based and runs without offline optimization or simulation in physics engines.
arXiv Detail & Related papers (2022-09-19T06:51:02Z) - Trajectory Optimization for Physics-Based Reconstruction of 3d Human
Pose from Monocular Video [31.96672354594643]
We focus on the task of estimating a physically plausible articulated human motion from monocular video.
Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts.
We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark.
arXiv Detail & Related papers (2022-05-24T18:02:49Z) - Dynamic Visual Reasoning by Learning Differentiable Physics Models from
Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z) - PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time [89.68248627276955]
Marker-less 3D motion capture from a single colour camera has seen significant progress.
However, it is a very challenging and severely ill-posed problem.
We present PhysCap, the first algorithm for physically plausible, real-time and marker-less human 3D motion capture.
arXiv Detail & Related papers (2020-08-20T10:46:32Z) - Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z) - Use the Force, Luke! Learning to Predict Physical Forces by Simulating
Effects [79.351446087227]
We address the problem of inferring contact points and the physical forces from videos of humans interacting with objects.
Specifically, we use a simulator to predict effects and enforce that estimated forces must lead to the same effect as depicted in the video.
arXiv Detail & Related papers (2020-03-26T17:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.