Related papers: CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion

URL: http://arxiv.org/abs/2305.12554v3
Date: Mon, 19 Aug 2024 16:54:21 GMT
Title: CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
Authors: Jiarui Sun, Girish Chowdhary
Abstract summary: We propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired by the insight that a smooth future pose initialization improves prediction performance. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, predicts accurate, realistic, and consistent motions.
Score: 6.862357145175449
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future human pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in the latent space, which does not preserve motion's spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired by the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such initialization, CoMusion's motion predictor starts with a Transformer-based network for initial reconstruction of corrupted motion. Then, a graph convolutional network (GCN) is employed to refine the prediction considering past observations in the discrete cosine transform (DCT) space. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining appropriate diversity. Experimental results on benchmark datasets demonstrate that CoMusion surpasses prior methods across metrics, while demonstrating superior generation quality. Our code is released at https://github.com/jsun57/CoMusion/ .
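
The abstract mentions two ingredients that are easy to illustrate in isolation: corrupting a motion sequence with diffusion noise, and representing a pose sequence in DCT space along the time axis. The following is a minimal NumPy sketch of those two generic operations only, not the CoMusion implementation. The sequence shape, the linear beta schedule (a common default, used here in place of the paper's proposed variance scheduler), the cutoff of 20 coefficients, and the helper names `dct_matrix` and `forward_diffuse` are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size n x n (each row is one frequency)."""
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # time index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)    # rescale the DC row so basis @ basis.T == I
    return basis

def forward_diffuse(x0, step, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[step]) * x0 + np.sqrt(1.0 - alpha_bar[step]) * noise

rng = np.random.default_rng(0)

# Hypothetical toy sequence: 100 frames, 16 joints x 3 coordinates.
frames, dims = 100, 48
motion = np.cumsum(0.01 * rng.standard_normal((frames, dims)), axis=0)

# Assumed linear beta schedule; NOT the variance scheduler proposed in the paper.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

# Corrupt the clean motion at the final diffusion step.
noisy = forward_diffuse(motion, 999, alpha_bar, rng)

# Temporal DCT: project each coordinate's trajectory onto cosine bases.
D = dct_matrix(frames)
coeffs = D @ motion            # shape (frames, dims), rows are frequencies
recon = D.T @ coeffs           # inverse transform recovers the sequence

# Keeping only low frequencies gives a temporally smooth approximation.
keep = 20                      # assumed cutoff, chosen arbitrarily
smooth = D[:keep].T @ coeffs[:keep]
```

Truncating to low-frequency DCT coefficients is one simple way to obtain a smooth sequence; how CoMusion actually produces and refines its smooth initialization is described in the paper and its released code.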

Related papers

Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction [0.9776703963093367]
Next-frame prediction in videos is crucial for applications such as autonomous driving, object tracking, and motion prediction. Transformer-based next-frame prediction models face notable issues. We propose a Semantic Concentration Multi-Head Self-Attention architecture, which effectively mitigates semantic dilution.
arXiv Detail & Related papers (2025-01-28T07:12:29Z)
Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing. Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML). We develop a discretized physics-guided network (PN) and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR).
arXiv Detail & Related papers (2024-07-18T15:44:23Z)
AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction. Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations. We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z)
TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction. Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers. In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences. Current methods often assume that the observed sequences are complete while ignoring the potential for missing values. This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
arXiv Detail & Related papers (2023-03-28T14:27:27Z)
An Energy-Based Prior for Generative Saliency [62.79775297611203]
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution. With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction. Experimental results show that our generative saliency model with an energy-based prior can achieve not only accurate saliency predictions but also reliable uncertainty maps consistent with human perception.
arXiv Detail & Related papers (2022-04-19T10:51:00Z)
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID). We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories. Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-25T16:59:08Z)
Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks [9.06677862854201]
We propose a novel approach to predict future human motions from a single image, with mixture density networks (MDN) modeling. Contrary to most existing deep human motion prediction approaches, the multimodal nature of MDN enables the generation of diverse future motion hypotheses. Our trained model directly takes an image as input and generates multiple plausible motions that satisfy the given condition.
arXiv Detail & Related papers (2021-09-13T08:49:33Z)
Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction. Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
Multitask Non-Autoregressive Model for Human Motion Prediction [33.98939145212708]
A Non-auToregressive Model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module. Our approach is evaluated on the Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
arXiv Detail & Related papers (2020-07-13T15:00:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.