Related papers: SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction

URL: http://arxiv.org/abs/2512.00355v1
Date: Sat, 29 Nov 2025 06:49:38 GMT
Title: SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction
Authors: Junqiao Fan, Pengfei Liu, Haocong Rao,
Abstract summary: This work focuses on how to ensure spatial-temporal coherence within a single-stage diffusion model for human motion prediction (HMP). On Human3.6M and HumanEva, these coherence mechanisms deliver state-of-the-art results among single-stage probabilistic HMP methods, with lower latency and memory usage than multi-stage diffusion baselines.
Score: 26.646112368625207
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With intelligent room-side sensing and service robots widely deployed, human motion prediction (HMP) is essential for safe, proactive assistance. However, many existing HMP methods either produce a single, deterministic forecast that ignores uncertainty or rely on probabilistic models that sacrifice kinematic plausibility. Diffusion models improve the accuracy-diversity trade-off but often depend on multi-stage pipelines that are costly for edge deployment. This work focuses on how to ensure spatial-temporal coherence within a single-stage diffusion model for HMP. We introduce SMamDiff, a Spatial Mamba-based Diffusion model with two novel designs: (i) a residual-DCT motion encoding that subtracts the last observed pose before applying a temporal DCT, reducing the dominance of the DC component ($f=0$) and highlighting informative higher-frequency cues so that the model learns how joints move rather than where they are; and (ii) a stickman-drawing spatial-Mamba module that processes joints in an ordered, joint-by-joint manner, so that later joints condition on earlier ones, inducing long-range, cross-joint dependencies. On Human3.6M and HumanEva, these coherence mechanisms deliver state-of-the-art results among single-stage probabilistic HMP methods, with lower latency and memory usage than multi-stage diffusion baselines.

Related papers

Synergizing Transport-Based Generative Models and Latent Geometry for Stochastic Closure Modeling [1.665466637453776]
We show that flow matching in a lower-dimensional latent space is suited for fast sampling of closure models. We control the latent space distortion and thus ensure the physical fidelity of the sampled closure term.
arXiv Detail & Related papers (2026-02-19T05:24:00Z)
Function-Space Decoupled Diffusion for Forward and Inverse Modeling in Carbon Capture and Storage [65.51149575007149]
We present Fun-DDPS, a generative framework that combines function-space diffusion models with differentiable neural operator surrogates for both forward and inverse modeling. Fun-DDPS produces physically consistent realizations free from the high-frequency artifacts observed in joint-state baselines.
arXiv Detail & Related papers (2026-02-12T18:58:12Z)
DD-MDN: Human Trajectory Forecasting with Diffusion-Based Dual Mixture Density Networks and Uncertainty Self-Calibration [7.946001746395269]
Human Trajectory Forecasting (HTF) predicts future human movements from past trajectories and environmental context. We propose DD-MDN, an end-to-end probabilistic HTF model that combines high positional accuracy, calibrated uncertainty, and robustness to short observations. Experiments on the ETH/UCY, SDD, inD, and IMPTC datasets demonstrate state-of-the-art accuracy, robustness at short observation intervals, and reliable uncertainty modeling.
arXiv Detail & Related papers (2026-02-11T08:59:33Z)
Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
Bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual-process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z)
Navigating the Exploration-Exploitation Tradeoff in Inference-Time Scaling of Diffusion Models [11.813933389519358]
Inference-time scaling has achieved remarkable success in language models, yet its adaptation to diffusion models remains underexplored. We propose two strategies: Schedule and Adaptive Temperature. Our methods significantly enhance sample quality without increasing the total number of Noise Evaluations.
arXiv Detail & Related papers (2025-08-17T13:35:38Z)
Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models [57.45019514036948]
Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics. This work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces.
arXiv Detail & Related papers (2024-12-23T21:27:19Z)
Bayesian-Optimized One-Step Diffusion Model with Knowledge Distillation for Real-Time 3D Human Motion Prediction [2.402745776249116]
We propose training a one-step multi-layer perceptron-based (MLP-based) diffusion model for motion prediction using knowledge distillation and Bayesian optimization. Our model can significantly improve the inference speed, achieving real-time prediction without noticeable degradation in performance.
arXiv Detail & Related papers (2024-09-19T04:36:40Z)
Adversarial Schrödinger Bridge Matching [66.39774923893103]
Iterative Markovian Fitting (IMF) procedure alternates between Markovian and reciprocal projections of continuous-time processes. We propose a novel Discrete-time IMF (D-IMF) procedure in which learning of processes is replaced by learning just a few transition probabilities in discrete time. We show that our D-IMF procedure can provide the same quality of unpaired domain translation as the IMF, using only several generation steps instead of hundreds.
arXiv Detail & Related papers (2024-05-23T11:29:33Z)
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference. OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters. Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
Generative Fractional Diffusion Models [53.36835573822926]
We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID.
arXiv Detail & Related papers (2023-10-26T17:53:24Z)
BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction [26.306489700180627]
We present BeLFusion, a model that leverages latent diffusion models in human motion prediction (HMP) to sample from a latent space where behavior is disentangled from pose and motion. Thanks to our behavior coupler's ability to transfer sampled behavior to ongoing motion, BeLFusion's predictions display a variety of behaviors that are significantly more realistic than the state of the art.
arXiv Detail & Related papers (2022-11-25T18:59:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.