Related papers: KineST: A Kinematics-guided Spatiotemporal State Space Model for Human Motion Tracking from Sparse Signals

URL: http://arxiv.org/abs/2512.16791v1
Date: Thu, 18 Dec 2025 17:25:47 GMT
Title: KineST: A Kinematics-guided Spatiotemporal State Space Model for Human Motion Tracking from Sparse Signals
Authors: Shuting Zhao, Zeyu Xiao, Xinrong Chen
Abstract summary: Full-body motion tracking plays an essential role in AR/VR applications, bridging physical and virtual interactions. It is challenging to reconstruct realistic and diverse full-body poses based on sparse signals obtained by head-mounted displays. Existing methods for pose reconstruction often incur high computational costs or rely on separately modeling spatial and temporal dependencies. We propose KineST, a novel kinematics-guided state space model, which effectively extracts spatiotemporal dependencies while integrating local and global pose perception.
Score: 11.14439818111551
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Full-body motion tracking plays an essential role in AR/VR applications, bridging physical and virtual interactions. However, it is challenging to reconstruct realistic and diverse full-body poses based on sparse signals obtained by head-mounted displays, which are the main devices in AR/VR scenarios. Existing methods for pose reconstruction often incur high computational costs or rely on separately modeling spatial and temporal dependencies, making it difficult to balance accuracy, temporal coherence, and efficiency. To address this problem, we propose KineST, a novel kinematics-guided state space model, which effectively extracts spatiotemporal dependencies while integrating local and global pose perception. The innovation comes from two core ideas. Firstly, in order to better capture intricate joint relationships, the scanning strategy within the State Space Duality framework is reformulated into kinematics-guided bidirectional scanning, which embeds kinematic priors. Secondly, a mixed spatiotemporal representation learning approach is employed to tightly couple spatial and temporal contexts, balancing accuracy and smoothness. Additionally, a geometric angular velocity loss is introduced to impose physically meaningful constraints on rotational variations for further improving motion stability. Extensive experiments demonstrate that KineST has superior performance in both accuracy and temporal consistency within a lightweight framework. Project page: https://kaka-1314.github.io/KineST/
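The abstract does not give the formula for its geometric angular velocity loss, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes joint rotations are stored as 3x3 rotation matrices of shape (T, J, 3, 3) and penalizes the geodesic angle between the predicted and ground-truth frame-to-frame rotations. The function names, the tensor layout, and the `rot_z` helper are all hypothetical choices made for illustration.

```python
import numpy as np

def frame_to_frame_rotation(R):
    """Relative rotation R[t]^T @ R[t+1] for a sequence of shape (T, J, 3, 3)."""
    return np.swapaxes(R[:-1], -1, -2) @ R[1:]

def geodesic_angle(R_a, R_b):
    """Angle in radians of the rotation R_a^T @ R_b, for batched 3x3 matrices."""
    R_rel = np.swapaxes(R_a, -1, -2) @ R_b
    trace = np.trace(R_rel, axis1=-2, axis2=-1)
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cos_angle = np.clip((trace - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_angle)

def angular_velocity_loss(R_pred, R_true, dt=1.0):
    """Mean geodesic gap between predicted and true per-frame rotation changes.

    Assumed formulation (not taken from the paper): compare the rotation each
    joint undergoes between consecutive frames, then divide by the frame
    interval dt to express the gap as an angular speed.
    """
    delta_pred = frame_to_frame_rotation(R_pred)
    delta_true = frame_to_frame_rotation(R_true)
    return float(np.mean(geodesic_angle(delta_pred, delta_true)) / dt)

def rot_z(theta):
    """Hypothetical demo helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

if __name__ == "__main__":
    T, J = 5, 2
    # Ground truth: every joint spins about z by 0.1 rad per frame.
    spin = np.stack([np.stack([rot_z(0.1 * t)] * J) for t in range(T)])
    # Prediction: every joint stays at the identity rotation.
    static = np.tile(np.eye(3), (T, J, 1, 1))
    print(angular_velocity_loss(spin, spin))
    print(angular_velocity_loss(static, spin))
```

Working with relative rotations rather than differences of raw rotation parameters keeps the penalty independent of the chosen rotation parameterization, which is one reasonable way to interpret "physically meaningful constraints on rotational variations".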

Related papers

Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events [71.2439653098351]
Continuous space-time video super-resolution (C-STVSR) has garnered increasing interest for its capability to reconstruct high-resolution and high-frame-rate videos at arbitrary temporal scales. We present EvEnhancer, a novel approach that marries unique properties of high temporal and high dynamic range encapsulated in event streams. Our method achieves state-of-the-art performance on both synthetic and real-world datasets, while maintaining generalizability at OOD scales.
arXiv Detail & Related papers (2025-10-04T15:23:07Z)
Bidirectional Feature-aligned Motion Transformation for Efficient Dynamic Point Cloud Compression [97.66080040613726]
We propose a Bidirectional Feature-aligned Motion Transformation (Bi-FMT) framework that implicitly models motion in the feature space. Bi-FMT aligns features across both past and future frames to produce temporally consistent latent representations. We show Bi-FMT surpasses D-DPCC and AdaDPCC in both compression efficiency and runtime.
arXiv Detail & Related papers (2025-09-18T03:51:06Z)
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3) [51.510040541600176]
We propose a novel approach to modeling the rotation of moving objects in computer vision. Our approach is agnostic to energy and momentum conservation while being robust to input noise. By learning to approximate object dynamics from noisy states during training, our model attains robust extrapolation capabilities in simulation and various real-world settings.
arXiv Detail & Related papers (2025-08-11T09:03:10Z)
STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints [12.307413108334657]
We propose a novel sequence-to-sequence model for spatial-temporal motion Retargeting (STaR). STaR consists of two modules: (1) a spatial module that incorporates dense shape representation and a novel limb penetration constraint to ensure geometric plausibility while preserving motion semantics, and (2) a temporal module that utilizes a temporal transformer and a temporal consistency constraint to predict the entire motion sequence at once while enforcing multi-level trajectory smoothness.
arXiv Detail & Related papers (2025-04-09T00:37:08Z)
ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation [88.83749146867665]
Existing approaches learn a policy to predict a distant next-best end-effector pose. They then compute the corresponding joint rotation angles for motion using inverse kinematics. We propose Kinematics enhanced Spatial-TemporAl gRaph diffuser.
arXiv Detail & Related papers (2025-03-13T17:48:35Z)
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking [58.719310295870024]
This paper presents an event-based framework for tracking any point. To resolve ambiguities caused by event sparsity, a motion-guidance module incorporates kinematic vectors into the local matching process. The method improves the $Survival_{50}$ metric by 17.9% over the event-only tracking-any-point baseline.
arXiv Detail & Related papers (2024-12-02T09:13:29Z)
Smooth and Sparse Latent Dynamics in Operator Learning with Jerk Regularization [1.621267003497711]
This paper introduces a continuous operator learning framework that incorporates jerk regularization into the learning of the compressed latent space. The framework allows for inference at any desired spatial or temporal resolution. The effectiveness of this framework is demonstrated through a two-dimensional unsteady flow problem governed by the Navier-Stokes equations.
arXiv Detail & Related papers (2024-02-23T22:38:45Z)
Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation [20.002238735553792]
We propose a novel solution to tight visual-inertial fusion directly at the level of first-order kinematics by employing a dynamic vision sensor instead of a normal camera. We demonstrate how velocity estimates in highly dynamic situations can be obtained over short time intervals. Experiments on both simulated and real data demonstrate that the proposed tight event-inertial fusion leads to continuous and reliable velocity estimation.
arXiv Detail & Related papers (2024-01-17T15:56:57Z)
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through tightly coupled multi-modal spatiotemporal representation. We propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z)
Learning Self-Similarity in Space and Time as Generalized Motion for Action Recognition [42.175450800733785]
We propose a rich motion representation based on spatio-temporal self-similarity (STSS). We leverage the whole volume of STSS and let our model learn to extract an effective motion representation from it. The proposed neural block, dubbed SELFY, can be easily inserted into neural architectures and trained end-to-end without additional supervision.
arXiv Detail & Related papers (2021-02-14T07:32:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.