STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
- URL: http://arxiv.org/abs/2504.06504v1
- Date: Wed, 09 Apr 2025 00:37:08 GMT
- Title: STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
- Authors: Xiaohang Yang, Qing Wang, Jiahao Yang, Gregory Slabaugh, Shanxin Yuan
- Abstract summary: We propose a novel sequence-to-sequence model for seamless Spatial-Temporal aware motion Retargeting (STaR). STaR consists of two modules: (1) a spatial module that incorporates dense shape representation and a novel limb penetration constraint to ensure geometric plausibility while preserving motion semantics, and (2) a temporal module that utilizes a temporal transformer and a temporal consistency constraint to predict the entire motion sequence at once while enforcing multi-level trajectory smoothness.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion retargeting seeks to faithfully replicate the spatio-temporal motion characteristics of a source character onto a target character with a different body shape. Apart from motion semantics preservation, ensuring geometric plausibility and maintaining temporal consistency are also crucial for effective motion retargeting. However, many existing methods prioritize either geometric plausibility or temporal consistency. Neglecting geometric plausibility results in interpenetration, while neglecting temporal consistency leads to motion jitter. In this paper, we propose a novel sequence-to-sequence model for seamless Spatial-Temporal aware motion Retargeting (STaR), with penetration and consistency constraints. STaR consists of two modules: (1) a spatial module that incorporates dense shape representation and a novel limb penetration constraint to ensure geometric plausibility while preserving motion semantics, and (2) a temporal module that utilizes a temporal transformer and a novel temporal consistency constraint to predict the entire motion sequence at once while enforcing multi-level trajectory smoothness. The seamless combination of the two modules helps us achieve a good balance between the semantic, geometric, and temporal targets. Extensive experiments on the Mixamo and ScanRet datasets demonstrate that our method produces plausible and coherent motions while significantly reducing interpenetration rates compared with other approaches.
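The abstract names two concrete mechanisms: a limb penetration constraint and a multi-level trajectory smoothness (temporal consistency) constraint. The exact formulations are not given here, so the following PyTorch sketch only illustrates what such losses commonly look like; the tensor layouts, function names, and the body_sdf query are hypothetical assumptions, not the authors' implementation.

```python
import torch

def temporal_consistency_loss(joints: torch.Tensor) -> torch.Tensor:
    # joints: (T, J, 3) predicted joint positions over T frames.
    # "Multi-level" smoothness here means penalizing both first
    # differences (velocity) and second differences (acceleration),
    # a standard way to suppress motion jitter.
    vel = joints[1:] - joints[:-1]   # (T-1, J, 3)
    acc = vel[1:] - vel[:-1]         # (T-2, J, 3)
    return vel.pow(2).mean() + acc.pow(2).mean()

def limb_penetration_loss(limb_points: torch.Tensor, body_sdf) -> torch.Tensor:
    # limb_points: (N, 3) points sampled on a limb's surface.
    # body_sdf: assumed-available function returning signed distances
    # to the body surface, negative inside the body.
    # Only penetrating points (negative distance) are penalized.
    d = body_sdf(limb_points)        # (N,)
    return torch.relu(-d).mean()
```

How such terms are weighted against the semantics-preservation objective is a design choice the abstract does not specify.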
Related papers
- ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
- GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling [32.47567372398872]
GestureLSM is a flow-matching-based approach for Co-Speech Gesture Generation with spatial-temporal modeling. It achieves state-of-the-art performance on BEAT2 while significantly reducing inference time compared to existing methods.
arXiv Detail & Related papers (2025-01-31T05:34:59Z)
- Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency [58.719310295870024]
This paper presents an event-based framework for tracking any point. It tackles the challenges posed by spatial sparsity and motion sensitivity in events. It achieves 150% faster processing with competitive model parameters.
arXiv Detail & Related papers (2024-12-02T09:13:29Z)
- A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatial-Temporal Framework (DeST) to address the issues.
DeST significantly outperforms current state-of-the-art methods with lower computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z)
- Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z)
- Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry [34.53794943807786]
A good retargeted motion cannot be reached without considering source-target differences at both the skeleton and shape geometry levels.
We propose a novel Residual RETargeting network (R2ET) structure, which relies on two neural modification modules.
Experiments on the public Mixamo dataset demonstrate that our R2ET achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-15T14:41:26Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition [10.60209288486904]
Current methods in skeleton-based action recognition mainly consider capturing long-term temporal dependencies.
We propose a general framework, coined as STGAT, to model cross-spacetime information flow.
STGAT achieves state-of-the-art performance on three large-scale datasets.
arXiv Detail & Related papers (2022-08-18T02:34:46Z)
- Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through tightly coupled spatiotemporal representation.
We propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z)
- TSI: Temporal Saliency Integration for Video Action Recognition [32.18535820790586]
We propose a Temporal Saliency Integration (TSI) block, which mainly contains a Salient Motion Excitation (SME) module and a Cross-scale Temporal Integration (CTI) module.
SME aims to highlight the motion-sensitive area through local-global motion modeling.
CTI is designed to perform multi-scale temporal modeling through a group of separate 1D convolutions (a sketch follows after this entry).
arXiv Detail & Related papers (2021-06-02T11:43:49Z)
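As a concrete reading of the CTI description above ("multi-scale temporal modeling through a group of separate 1D convolutions"), here is a minimal, hypothetical PyTorch sketch; the depthwise layout, channel count, and kernel sizes are assumptions rather than the TSI paper's actual design.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    # A group of separate 1D convolutions with different kernel sizes,
    # each applied along the time axis; branch outputs are summed.
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); odd kernels with k // 2 padding
        # keep the temporal length unchanged across every branch.
        return sum(branch(x) for branch in self.branches)

# Usage: y = MultiScaleTemporalConv(64)(torch.randn(2, 64, 32))
```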