PhysDiff: Physics-Guided Human Motion Diffusion Model
- URL: http://arxiv.org/abs/2212.02500v3
- Date: Fri, 18 Aug 2023 19:59:48 GMT
- Title: PhysDiff: Physics-Guided Human Motion Diffusion Model
- Authors: Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz
- Abstract summary: Existing motion diffusion models largely disregard the laws of physics in the diffusion process.
PhysDiff incorporates physical constraints into the diffusion process.
Our approach achieves state-of-the-art motion quality and improves physical plausibility drastically.
- Score: 101.1823574561535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Denoising diffusion models hold great promise for generating diverse and
realistic human motions. However, existing motion diffusion models largely
disregard the laws of physics in the diffusion process and often generate
physically-implausible motions with pronounced artifacts such as floating, foot
sliding, and ground penetration. This seriously impacts the quality of
generated motions and limits their real-world application. To address this
issue, we present a novel physics-guided motion diffusion model (PhysDiff),
which incorporates physical constraints into the diffusion process.
Specifically, we propose a physics-based motion projection module that uses
motion imitation in a physics simulator to project the denoised motion of a
diffusion step to a physically-plausible motion. The projected motion is
further used in the next diffusion step to guide the denoising diffusion
process. Intuitively, the use of physics in our model iteratively pulls the
motion toward a physically-plausible space, which cannot be achieved by simple
post-processing. Experiments on large-scale human motion datasets show that our
approach achieves state-of-the-art motion quality and improves physical
plausibility drastically (>78% for all datasets).
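The guidance loop described above lends itself to a compact sketch. The following is a minimal illustration under assumed interfaces (a denoiser, a noise scheduler, and a `project_to_physics` stand-in for motion imitation in a physics simulator); it is not the authors' implementation.

```python
import torch

def physics_guided_sampling(denoiser, project_to_physics, scheduler,
                            cond, motion_shape, physics_steps):
    """Sketch of a physics-guided diffusion sampling loop (assumed interfaces).

    denoiser(x_t, t, cond)     -> estimate of the clean motion x0
    project_to_physics(x0)     -> physically plausible motion, e.g. obtained by
                                  motion imitation in a physics simulator
    scheduler.add_noise(x0, t) -> re-noise a clean motion back to step t
    physics_steps              -> set of steps at which projection is applied
    """
    x_t = torch.randn(motion_shape)               # start from Gaussian noise
    for t in reversed(range(scheduler.num_steps)):
        x0_hat = denoiser(x_t, t, cond)           # denoised motion estimate
        if t in physics_steps:
            # Project the estimate onto the set of physically plausible motions.
            x0_hat = project_to_physics(x0_hat)
        if t > 0:
            # The (projected) motion guides the next, less noisy diffusion step.
            x_t = scheduler.add_noise(x0_hat, t - 1)
        else:
            x_t = x0_hat
    return x_t
```

Which diffusion steps receive the projection is a natural knob: projecting only at selected steps limits simulator cost while still pulling samples toward the physically plausible space.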
Related papers
- Morph: A Motion-free Physics Optimization Framework for Human Motion Generation [25.51726849102517]
Our framework achieves state-of-the-art motion generation quality while improving physical plausibility drastically.
Experiments on text-to-motion and music-to-dance generation tasks demonstrate that our framework achieves state-of-the-art motion generation quality.
arXiv Detail & Related papers (2024-11-22T14:09:56Z)
- ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model [9.525806425270428]
We present ReinDiffuse, which combines reinforcement learning with a motion diffusion model to generate physically credible human motions.
Our method adapts the Motion Diffusion Model to output a parameterized distribution of actions, making it compatible with reinforcement learning paradigms.
Our approach outperforms existing state-of-the-art models on two major datasets, HumanML3D and KIT-ML.
arXiv Detail & Related papers (2024-10-09T16:24:11Z)
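As a rough illustration of the ReinDiffuse entry above, the sketch below treats the model's output as a Gaussian distribution over motions and applies a REINFORCE-style update against a physical-plausibility reward. The names, shapes, and the reward function are hypothetical placeholders, not the paper's method.

```python
import torch

def policy_gradient_step(mean, log_std, plausibility_reward, optimizer):
    """One REINFORCE update on a Gaussian motion distribution (sketch).

    mean, log_std       -> (batch, frames, feats) parameters predicted by the
                           motion model; must require grad
    plausibility_reward -> callable returning a (batch,) physical-plausibility
                           score, e.g. penalizing foot sliding and penetration
    """
    dist = torch.distributions.Normal(mean, log_std.exp())
    motion = dist.sample()                        # sample candidate motions
    reward = plausibility_reward(motion)          # higher is more plausible
    log_prob = dist.log_prob(motion).flatten(1).sum(dim=1)
    loss = -(reward.detach() * log_prob).mean()   # raise log-prob of good samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```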
- Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z)
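To make the attention-transfer idea in the Monkey See, Monkey Do entry concrete, here is a coarse sketch that caches self-attention outputs from the leader's denoising pass and injects them into the follower's pass via forward hooks. The module naming and the choice to swap whole attention outputs (rather than finer query/key or attention-map elements) are simplifying assumptions, not the paper's implementation.

```python
def transfer_with_attention_injection(denoiser, leader, follower, t, cond):
    """Zero-shot motion transfer by swapping self-attention outputs (sketch).

    Caches self-attention outputs from the leader's denoising pass and injects
    them during the follower's pass, so the follower keeps its own details while
    adopting the leader's motion pattern. All interfaces are hypothetical.
    """
    cached = {}

    def save(name):
        def hook(module, inputs, output):
            cached[name] = output.detach()
        return hook

    def load(name):
        def hook(module, inputs, output):
            return cached[name]        # returning a value replaces the output
        return hook

    # Assumes the denoiser's self-attention modules contain "self_attn" in
    # their names; this naming is an assumption, not a known convention.
    attn_layers = [(n, m) for n, m in denoiser.named_modules() if "self_attn" in n]

    handles = [m.register_forward_hook(save(n)) for n, m in attn_layers]
    denoiser(leader, t, cond)          # pass 1: record leader attention
    for h in handles:
        h.remove()

    handles = [m.register_forward_hook(load(n)) for n, m in attn_layers]
    out = denoiser(follower, t, cond)  # pass 2: inject into follower
    for h in handles:
        h.remove()
    return out
```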
- DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors.
We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z)
- MotionCraft: Physics-based Zero-Shot Video Generation [22.33113030344355]
MotionCraft is a new zero-shot video generator that crafts physics-based, realistic videos.
We show that MotionCraft can warp the noise latent space of an image diffusion model, such as Stable Diffusion, by applying optical flow.
We compare our method with the state-of-the-art Text2Video-Zero, reporting qualitative and quantitative improvements.
arXiv Detail & Related papers (2024-05-22T11:44:57Z)
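The MotionCraft entry above hinges on warping a diffusion model's noise latents with optical flow. A minimal sketch of such a warp, assuming latents of shape (B, C, H, W), a backward-flow convention, and pixel-unit flow at latent resolution (none of which are taken from the paper), might look like this:

```python
import torch
import torch.nn.functional as F

def warp_latent_with_flow(latent, flow):
    """Warp a diffusion noise latent with an optical-flow field (sketch).

    latent -> (B, C, H, W) noise latent of an image diffusion model
    flow   -> (B, 2, H, W) backward flow in pixels at latent resolution
    """
    b, _, h, w = latent.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=latent.device),
                            torch.linspace(-1, 1, w, device=latent.device),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel displacements to normalized offsets and shift the grid.
    offset = torch.stack((flow[:, 0] * 2.0 / max(w - 1, 1),
                          flow[:, 1] * 2.0 / max(h - 1, 1)), dim=-1)
    grid = base + offset
    # Backward warp: sample the latent at the displaced locations.
    return F.grid_sample(latent, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

Repeating such a warp frame by frame before denoising is one plausible way to propagate motion through the latent space in a zero-shot manner.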
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performance on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Physics-Guided Human Motion Capture with Pose Probability Modeling [35.159506668475565]
Existing solutions always adopt kinematic results as reference motions, and physics is treated as a post-processing module.
We employ physics as denoising guidance in the reverse diffusion process to reconstruct human motion from a modeled pose probability distribution.
With several iterations, the physics-based tracking and kinematic denoising promote each other to generate a physically plausible human motion.
arXiv Detail & Related papers (2023-08-19T05:28:03Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across a wide range of human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.