A Spatio-temporal Continuous Network for Stochastic 3D Human Motion Prediction
- URL: http://arxiv.org/abs/2508.01585v1
- Date: Sun, 03 Aug 2025 04:53:39 GMT
- Title: A Spatio-temporal Continuous Network for Stochastic 3D Human Motion Prediction
- Authors: Hua Yu, Yaqing Hou, Xu Gui, Shanshan Feng, Dongsheng Zhou, Qiang Zhang,
- Abstract summary: We propose a novel method called STCN for stochastic and continuous human motion prediction, which consists of two stages. In the first stage, we propose a spatio-temporal continuous network to generate smoother human motion sequences. In the second stage, STCN endeavors to acquire the Gaussian mixture distribution (GMM) of observed motion sequences.
- Score: 15.033378809142299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Human Motion Prediction (HMP) has received increasing attention due to its wide applications. Despite the rapid progress in generative fields, existing methods often face challenges in learning continuous temporal dynamics and predicting stochastic motion sequences. They tend to overlook the flexibility inherent in complex human motions and are prone to mode collapse. To alleviate these issues, we propose a novel method called STCN, for stochastic and continuous human motion prediction, which consists of two stages. Specifically, in the first stage, we propose a spatio-temporal continuous network to generate smoother human motion sequences. In addition, the anchor set is innovatively introduced into the stochastic HMP task to prevent mode collapse, which refers to the potential human motion patterns. In the second stage, STCN endeavors to acquire the Gaussian mixture distribution (GMM) of observed motion sequences with the aid of the anchor set. It also focuses on the probability associated with each anchor, and employs the strategy of sampling multiple sequences from each anchor to alleviate intra-class differences in human motions. Experimental results on two widely-used datasets (Human3.6M and HumanEva-I) demonstrate that our model obtains competitive performance on both diversity and accuracy.
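The abstract's second stage, sampling multiple sequences per anchor of a Gaussian mixture over motion patterns, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the anchor means, variances, and mixture logits are hypothetical placeholders standing in for learned parameters, and motion latents are plain vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K anchors (candidate motion patterns) in a
# D-dimensional motion latent space; all parameters are placeholders
# for quantities the model would learn.
K, D = 5, 16
anchors = rng.normal(size=(K, D))   # per-anchor Gaussian means
log_sigma = np.zeros(K)             # per-anchor isotropic log std-devs
logits = rng.normal(size=K)         # unnormalized anchor probabilities

def sample_motions(n_per_anchor=3):
    """Draw several latents from each anchor's Gaussian component,
    mirroring the per-anchor multi-sample strategy described above."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()            # softmax over anchors
    samples = []
    for k in range(K):
        eps = rng.normal(size=(n_per_anchor, D))
        samples.append(anchors[k] + np.exp(log_sigma[k]) * eps)
    return probs, np.concatenate(samples, axis=0)

probs, latents = sample_motions()
print(latents.shape)  # (15, 16): 3 samples from each of 5 anchors
```

Sampling from every anchor (rather than only the most probable one) is what guards against mode collapse, while multiple draws per anchor cover intra-class variation.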
Related papers
- ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model [52.02220087880269]
We propose an extension of the ManiGaussian framework that improves bimanual manipulation by digesting multi-task scene dynamics through a hierarchical world model. Our method significantly outperforms current state-of-the-art bimanual manipulation techniques, with a 20.2% improvement across 10 simulated tasks and a 60% average success rate on 9 challenging real-world tasks.
arXiv Detail & Related papers (2025-06-24T17:59:06Z)
- GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals. Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z)
- Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis [31.082402451716973]
Human motion synthesis aims to generate plausible human motion sequences. Recent score-based generative models (SGMs) have demonstrated impressive results on this task. We propose a Deterministic-to-Stochastic Diverse Latent Feature Mapping (DSDFM) method for human motion synthesis.
arXiv Detail & Related papers (2025-05-02T04:48:28Z)
- DivDiff: A Conditional Diffusion Model for Diverse Human Motion Prediction [9.447439259813112]
We propose a conditional diffusion-based generative model, called DivDiff, to predict more diverse and realistic human motions.
Specifically, the DivDiff employs DDPM as our backbone and incorporates Discrete Cosine Transform (DCT) and transformer mechanisms.
We design a diversified reinforcement sampling function (DRSF) to enforce human skeletal constraints on the predicted human motions.
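The DCT encoding mentioned above compresses each joint trajectory into a few low-frequency coefficients before the diffusion backbone operates on them. A minimal sketch of that step, using a hand-built orthonormal DCT-II basis on a toy motion (all dimensions and data are illustrative, not DivDiff's actual configuration):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * t + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)  # scale the DC row so rows are orthonormal
    return m

# Toy motion: T frames of a D-dimensional pose (a smooth random walk).
T, D = 32, 6
motion = np.cumsum(np.random.default_rng(2).normal(size=(T, D)), axis=0)

C = dct_matrix(T)
L = 8                       # keep only the L lowest frequencies
coeffs = (C @ motion)[:L]   # truncated DCT encoding of each trajectory
recon = C[:L].T @ coeffs    # approximate reconstruction (inverse DCT)

print(recon.shape)
```

Because natural motion is smooth, most of its energy sits in the low frequencies, so the truncated coefficients are a compact, nearly lossless representation of each trajectory.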
arXiv Detail & Related papers (2024-08-16T04:51:32Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors [21.915057426589744]
We propose a simple yet effective approach that disentangles randomly sampled codes with a deterministic learnable component named anchors to promote sample precision and diversity.
In principle, our spatial-temporal anchor-based sampling (STARS) can be applied to different motion predictors.
arXiv Detail & Related papers (2023-02-09T18:58:07Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction [26.306489700180627]
We present BeLFusion, a model that leverages latent diffusion models in human motion prediction (HMP) to sample from a latent space where behavior is disentangled from pose and motion.
Thanks to our behavior coupler's ability to transfer sampled behavior to ongoing motion, BeLFusion's predictions display a variety of behaviors that are significantly more realistic than the state of the art.
arXiv Detail & Related papers (2022-11-25T18:59:03Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and substantially improves long-term prediction.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks [9.06677862854201]
We propose a novel approach to predict future human motions from a single image, with mixture density networks (MDN) modeling.
Contrary to most existing deep human motion prediction approaches, the multimodal nature of MDN enables the generation of diverse future motion hypotheses.
Our trained model directly takes an image as input and generates multiple plausible motions that satisfy the given condition.
arXiv Detail & Related papers (2021-09-13T08:49:33Z)
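The mixture density network idea in the last entry, predicting mixture parameters and then drawing several future-motion hypotheses, can be sketched as follows. The mixture parameters here are random placeholders standing in for what an MDN head would output for one input image; this is a hedged illustration, not that paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical MDN outputs for one input image: mixture weights pi,
# means mu, and std-devs sigma over a pose space (placeholder values).
M, D = 4, 10                     # mixture components, pose dimension
pi = rng.dirichlet(np.ones(M))   # mixture weights (sum to 1)
mu = rng.normal(size=(M, D))
sigma = np.abs(rng.normal(size=(M, D))) + 0.1

def sample_hypotheses(n=6):
    """Draw n diverse future-motion hypotheses from the mixture."""
    comps = rng.choice(M, size=n, p=pi)   # pick a component per sample
    return mu[comps] + sigma[comps] * rng.normal(size=(n, D))

hyps = sample_hypotheses()
print(hyps.shape)  # (6, 10)
```

Because each hypothesis may come from a different mixture component, the samples cover distinct motion modes rather than perturbations of a single prediction, which is the multimodality the abstract highlights.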
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.