Related papers: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion

URL: http://arxiv.org/abs/2305.12554v2
Date: Tue, 19 Dec 2023 23:52:51 GMT
Title: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
Authors: Jiarui Sun, Girish Chowdhary
Abstract summary: We propose DiffMotion as an end-to-end diffusion-based Human Motion Prediction framework. Our results on benchmark datasets show that DiffMotion significantly outperforms previous methods in terms of both accuracy and fidelity.
Score: 8.10696589962658
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic Human Motion Prediction (HMP) aims to predict multiple possible upcoming pose sequences based on past human motion trajectories. Although previous approaches have shown impressive performance, they face several issues, including complex training processes and a tendency to generate predictions that are often inconsistent with the provided history and sometimes even entirely unreasonable. To overcome these issues, we propose DiffMotion, an end-to-end diffusion-based stochastic HMP framework. DiffMotion's motion predictor is composed of two modules: (1) a Transformer-based network for initial motion reconstruction from corrupted motion, and (2) a Graph Convolutional Network (GCN) to refine the generated motion considering past observations. Our method, facilitated by this novel Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining an appropriate level of diversity. Our results on benchmark datasets show that DiffMotion significantly outperforms previous methods in terms of both accuracy and fidelity, while demonstrating superior robustness.

Related papers

Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction [0.9776703963093367]
Next-frame prediction in videos is crucial for applications such as autonomous driving, object tracking, and motion prediction. Transformer-based next-frame prediction models face notable issues. We propose a Semantic Concentration Multi-Head Self-Attention architecture, which effectively mitigates semantic dilution.
arXiv Detail & Related papers (2025-01-28T07:12:29Z)
Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing. Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML). We develop a discretized physics-guided network (PN) and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR).
arXiv Detail & Related papers (2024-07-18T15:44:23Z)
AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction. Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations. We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z)
TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction. Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers. In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences. Current methods often assume that the observed sequences are complete while ignoring the potential for missing values. This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
arXiv Detail & Related papers (2023-03-28T14:27:27Z)
An Energy-Based Prior for Generative Saliency [62.79775297611203]
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution. With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction. Experimental results show that our generative saliency model with an energy-based prior can achieve not only accurate saliency predictions but also reliable uncertainty maps consistent with human perception.
arXiv Detail & Related papers (2022-04-19T10:51:00Z)
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID). We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories. Experiments on human trajectory prediction benchmarks, including the Stanford Drone and ETH/UCY datasets, demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-25T16:59:08Z)
Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks [9.06677862854201]
We propose a novel approach to predict future human motions from a single image, with mixture density networks (MDN) modeling. Contrary to most existing deep human motion prediction approaches, the multimodal nature of MDN enables the generation of diverse future motion hypotheses. Our trained model directly takes an image as input and generates multiple plausible motions that satisfy the given condition.
arXiv Detail & Related papers (2021-09-13T08:49:33Z)
Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction. Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
Multitask Non-Autoregressive Model for Human Motion Prediction [33.98939145212708]
A Non-auToregressive Model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module. Our approach is evaluated on the Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
arXiv Detail & Related papers (2020-07-13T15:00:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.