Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction
- URL: http://arxiv.org/abs/2210.05976v1
- Date: Wed, 12 Oct 2022 07:38:33 GMT
- Title: Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction
- Authors: Dong Wei, Huaijiang Sun, Bin Li, Jianfeng Lu, Weiqing Li, Xiaoning Sun, Shengxiang Hu
- Abstract summary: MotionDiff is a diffusion probabilistic model that treats the kinematics of human joints as heated particles.
MotionDiff consists of two parts: a spatial-temporal transformer-based diffusion network to generate diverse yet plausible motions, and a graph convolutional network to further refine the outputs.
- Score: 22.354538952573158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic human motion prediction aims to forecast multiple plausible future
motions given a single pose sequence from the past. Most previous works focus
on designing elaborate losses to improve the accuracy, while the diversity is
typically characterized by randomly sampling a set of latent variables from the
latent prior, which is then decoded into possible motions. This joint training
of sampling and decoding, however, suffers from posterior collapse as the
learned latent variables tend to be ignored by a strong decoder, leading to
limited diversity. Alternatively, inspired by the diffusion process in
nonequilibrium thermodynamics, we propose MotionDiff, a diffusion probabilistic
model to treat the kinematics of human joints as heated particles, which will
diffuse from original states to a noise distribution. This process offers a
natural way to obtain the "whitened" latents without any trainable parameters,
and human motion prediction can be regarded as the reverse diffusion process
that converts the noise distribution into realistic future motions conditioned
on the observed sequence. Specifically, MotionDiff consists of two parts: a
spatial-temporal transformer-based diffusion network to generate diverse yet
plausible motions, and a graph convolutional network to further refine the
outputs. Experimental results on two datasets demonstrate that our model yields
competitive performance in terms of both accuracy and diversity.
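The forward process described in the abstract is the standard closed-form DDPM noising applied to joint coordinates. Below is a minimal sketch; the linear beta schedule and the (batch, frames, joints, 3) tensor layout are assumptions for illustration, not details from the paper.

```python
import torch

# Assumed linear noise schedule; the paper does not specify one here.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative \bar{alpha}_t

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in closed form: no trainable parameters.

    x0: future motion, shape (batch, frames, joints, 3) (assumed layout)
    t:  diffusion step per batch element, shape (batch,)
    """
    eps = torch.randn_like(x0)                  # the "whitened" latent
    a = alpha_bars[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

# Example: noise two motions (25 frames, 17 joints) to step t = 500.
x0 = torch.randn(2, 25, 17, 3)
xt = q_sample(x0, torch.tensor([500, 500]))
```

Because q(x_t | x_0) is available in closed form, the "whitened" latents come for free, which is the contrast the abstract draws with jointly trained latent priors.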
Related papers
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
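A minimal sketch of the independent per-token noise idea; the shapes and the uniform sampling of levels are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def per_token_noise(tokens: torch.Tensor, alpha_bars: torch.Tensor):
    """Noise every token of a sequence to its own independent level.

    tokens: (batch, seq_len, dim); alpha_bars: (T,) cumulative schedule.
    Returns the noised sequence and the per-token levels to condition on.
    """
    B, L, _ = tokens.shape
    t = torch.randint(0, alpha_bars.numel(), (B, L))  # one level per token
    a = alpha_bars[t].unsqueeze(-1)                   # (B, L, 1)
    eps = torch.randn_like(tokens)
    return a.sqrt() * tokens + (1.0 - a).sqrt() * eps, t
```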
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
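A sketch of that single token stream; the module sizes are assumptions, and the long skip connections are omitted for brevity.

```python
import torch
import torch.nn as nn

class TokenConditionedDenoiser(nn.Module):
    """Treat the timestep, the condition, and the noisy motion frames all as
    tokens in one Transformer stream, with no cross-attention or adaptive
    normalization modules (an assumed layout, not the paper's exact one)."""
    def __init__(self, dim: int = 128, heads: int = 4, layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.t_embed = nn.Embedding(1000, dim)

    def forward(self, noisy, cond, t):
        # noisy: (B, F, dim) motion tokens; cond: (B, C, dim) condition tokens
        tok_t = self.t_embed(t).unsqueeze(1)          # (B, 1, dim)
        x = torch.cat([tok_t, cond, noisy], dim=1)    # one token stream
        return self.encoder(x)[:, -noisy.shape[1]:]   # denoised motion tokens
```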
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
- R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction [8.104557130048407]
In image-based motion prediction, diffusion models predict contextually appropriate motion by gradually denoising random noise based on the image context.
In R2-Diff, a motion retrieved from a dataset based on image similarity is fed into a diffusion model instead of random noise.
R2-Diff accurately predicts appropriate motions and achieves high task success rates compared to recent state-of-the-art models in robot manipulation.
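One plausible reading of the retrieve-then-refine loop; the retriever and the one-step denoiser interface are assumed for illustration.

```python
import torch

def retrieve_then_refine(image_feat, dataset_feats, dataset_motions,
                         denoise_step, noise_to: int, alpha_bars):
    """Start denoising from a retrieved motion rather than pure noise."""
    # 1. Retrieve the nearest dataset motion by image-feature similarity.
    sims = dataset_feats @ image_feat                 # (N,)
    motion = dataset_motions[sims.argmax()].clone()
    # 2. Partially noise it to step `noise_to`, then denoise back to 0.
    a = alpha_bars[noise_to]
    x = a.sqrt() * motion + (1.0 - a).sqrt() * torch.randn_like(motion)
    for t in range(noise_to, -1, -1):
        x = denoise_step(x, t)                        # one reverse step
    return x
```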
arXiv Detail & Related papers (2023-06-15T20:27:06Z)
- Uncertainty-Aware Pedestrian Trajectory Prediction via Distributional Diffusion [26.715578412088327]
We present a model-agnostic uncertainty-aware pedestrian trajectory prediction framework.
Unlike previous studies, we translate the predictive uncertainty into explicit distributions, allowing the model to generate plausible future trajectories.
Our framework is compatible with different neural net architectures.
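One way to make the distributions explicit is a Gaussian output head that can be sampled repeatedly; this is an assumed instantiation, not the paper's parameterization.

```python
import torch
import torch.nn as nn

class GaussianTrajectoryHead(nn.Module):
    """Map an encoder feature to an explicit Gaussian over future (x, y)
    positions, so uncertainty can be sampled rather than left implicit."""
    def __init__(self, dim: int, horizon: int):
        super().__init__()
        self.mu = nn.Linear(dim, horizon * 2)
        self.log_sigma = nn.Linear(dim, horizon * 2)

    def sample(self, feat: torch.Tensor, n: int) -> torch.Tensor:
        mu = self.mu(feat).unsqueeze(0)               # (1, B, horizon*2)
        sigma = self.log_sigma(feat).exp().unsqueeze(0)
        eps = torch.randn(n, *mu.shape[1:])
        return mu + sigma * eps                       # n plausible futures
```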
arXiv Detail & Related papers (2023-03-15T04:58:43Z)
- Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation.
Ours is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
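A minimal sketch of conditioning a DDPM denoiser on a categorical action label; fusing by addition and the embedding sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumed sizes: 60 action classes, 1000 diffusion steps, 128-d embeddings.
action_embed = nn.Embedding(60, 128)
step_embed = nn.Embedding(1000, 128)

def condition_vector(action: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Fuse the action label and the diffusion step into one conditioning
    vector to be fed to the denoiser at every reverse step."""
    return action_embed(action) + step_embed(t)       # (B, 128)
```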
arXiv Detail & Related papers (2023-01-10T13:15:42Z)
- Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors [64.24948495708337]
We introduce a new method that brings predicted samples to the training data manifold using a pretrained unconditional diffusion model.
We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks.
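A hedged sketch of that restoration prior, read as: re-noise the conditional prediction slightly, then let the pretrained unconditional model denoise it back (the step count and interfaces are assumptions).

```python
import torch

def manifold_correct(x_pred, uncond_denoise_step, t_small: int, alpha_bars):
    """Nudge a conditional prediction toward the training-data manifold
    using a pretrained *unconditional* diffusion model."""
    a = alpha_bars[t_small]
    x = a.sqrt() * x_pred + (1.0 - a).sqrt() * torch.randn_like(x_pred)
    for t in range(t_small, -1, -1):                  # short reverse chain
        x = uncond_denoise_step(x, t)
    return x
```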
arXiv Detail & Related papers (2022-12-14T17:26:35Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
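A sketch of the latent-diffusion pipeline the summary implies, with the decoder and denoiser interfaces assumed.

```python
import torch

def generate_motion(cond, latent_denoise_step, vae_decode, T: int, shape):
    """Denoise in a learned latent space, then decode to motion frames."""
    z = torch.randn(shape)                     # start from latent noise
    for t in range(T - 1, -1, -1):
        z = latent_denoise_step(z, t, cond)    # reverse step in latent space
    return vae_decode(z)                       # latent -> motion sequence
```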
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
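In standard DDPM notation, the two stages are:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big).
```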
arXiv Detail & Related papers (2022-09-10T22:00:30Z)
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID).
We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories.
Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
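Diversity in this setting comes from running the reverse process from independent noise seeds under a single state embedding; a minimal sketch with an assumed denoiser interface.

```python
import torch

def sample_diverse_futures(state_embedding, denoise_step, K: int,
                           horizon: int, T: int) -> torch.Tensor:
    """Draw K plausible trajectories from one observed history."""
    futures = []
    for _ in range(K):
        x = torch.randn(1, horizon, 2)         # (x, y) positions over time
        for t in range(T - 1, -1, -1):
            x = denoise_step(x, t, state_embedding)
        futures.append(x)
    return torch.cat(futures, dim=0)           # (K, horizon, 2)
```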
arXiv Detail & Related papers (2022-03-25T16:59:08Z)
- Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks [9.06677862854201]
We propose a novel approach to predict future human motions from a single image, with mixture density networks (MDN) modeling.
Contrary to most existing deep human motion prediction approaches, the multimodal nature of MDN enables the generation of diverse future motion hypotheses.
Our trained model directly takes an image as input and generates multiple plausible motions that satisfy the given condition.
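A minimal sketch of a mixture density head over future motion; the component count and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """One image feature maps to K Gaussian components over future motion;
    sampling picks a component, then a motion within it."""
    def __init__(self, dim: int, K: int, out_dim: int):
        super().__init__()
        self.K, self.out_dim = K, out_dim
        self.pi = nn.Linear(dim, K)                   # mixture weights
        self.mu = nn.Linear(dim, K * out_dim)         # component means
        self.log_sigma = nn.Linear(dim, K * out_dim)  # component scales

    def sample(self, feat: torch.Tensor) -> torch.Tensor:
        pi = torch.softmax(self.pi(feat), dim=-1)     # (B, K)
        k = torch.multinomial(pi, 1).squeeze(-1)      # one mode per item
        mu = self.mu(feat).view(-1, self.K, self.out_dim)
        sigma = self.log_sigma(feat).exp().view(-1, self.K, self.out_dim)
        idx = torch.arange(feat.shape[0])
        eps = torch.randn(feat.shape[0], self.out_dim)
        return mu[idx, k] + sigma[idx, k] * eps       # one diverse hypothesis
```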
arXiv Detail & Related papers (2021-09-13T08:49:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.