SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion
- URL: http://arxiv.org/abs/2405.02844v2
- Date: Tue, 10 Jun 2025 16:34:13 GMT
- Title: SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion
- Authors: Ziyun Qian, Zeyu Xiao, Xingliang Jin, Dingkang Yang, Mingcheng Li, Zhenyi Wu, Dongliang Kou, Peng Zhai, Lihua Zhang
- Abstract summary: Motion style transfer enables virtual digital humans to rapidly switch between different styles of the same motion. Most existing methods adopt a two-stream structure, which tends to overlook the intrinsic relationship between content and style motions. We propose a Unified Motion Style Diffusion (UMSD) framework, which simultaneously extracts features from both content and style motions. We also introduce the Motion Style Mamba (MSM) denoiser, the first approach in the field of motion style transfer to leverage Mamba's powerful sequence modelling capability.
- Score: 12.426879081036116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion style transfer is a significant research direction in the field of computer vision, enabling virtual digital humans to rapidly switch between different styles of the same motion, thereby significantly enhancing the richness and realism of movements. It has been widely applied in multimedia scenarios such as films, games, and the metaverse. However, most existing methods adopt a two-stream structure, which tends to overlook the intrinsic relationship between content and style motions, leading to information loss and poor alignment. Moreover, when handling long-range motion sequences, these methods fail to effectively learn temporal dependencies, ultimately resulting in unnatural generated motions. To address these limitations, we propose a Unified Motion Style Diffusion (UMSD) framework, which simultaneously extracts features from both content and style motions and facilitates sufficient information interaction. Second, we introduce the Motion Style Mamba (MSM) denoiser, the first approach in the field of motion style transfer to leverage Mamba's powerful sequence modelling capability; by better capturing temporal relationships, it generates more coherent stylized motion sequences. Third, we design a diffusion-based content consistency loss and a style consistency loss to constrain the framework, ensuring that it inherits the content motion while effectively learning the characteristics of the style motion. Finally, extensive experiments demonstrate that our method outperforms state-of-the-art (SOTA) methods qualitatively and quantitatively, achieving more realistic and coherent motion style transfer.
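The abstract describes the UMSD training objective only at a high level. Below is a minimal PyTorch sketch of how one diffusion training step combining a reconstruction term with the content- and style-consistency losses mentioned above could look; the names `msm_denoiser`, `content_encoder`, `style_encoder`, the x0-prediction parameterization, and the loss weights are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a UMSD-style training step (not the authors' code).
# Assumes: msm_denoiser(x_t, t, content, style) predicts the clean motion x0,
# and content_encoder / style_encoder extract the features used by the two
# consistency losses described in the abstract.
import torch
import torch.nn.functional as F

def umsd_training_step(msm_denoiser, content_encoder, style_encoder,
                       x0_content, x0_style, alphas_cumprod,
                       w_content=1.0, w_style=1.0):
    B = x0_content.shape[0]
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (B,), device=x0_content.device)

    # Standard DDPM forward process: noise the content motion to step t.
    a_bar = alphas_cumprod[t].view(B, 1, 1)   # motions shaped (B, frames, features)
    noise = torch.randn_like(x0_content)
    x_t = a_bar.sqrt() * x0_content + (1.0 - a_bar).sqrt() * noise

    # A unified (single-stream) denoiser sees content and style motions jointly.
    x0_pred = msm_denoiser(x_t, t, x0_content, x0_style)

    # Diffusion reconstruction objective (x0-prediction variant).
    loss_diff = F.mse_loss(x0_pred, x0_content)

    # Content consistency: the output should retain the content motion's features.
    loss_content = F.mse_loss(content_encoder(x0_pred), content_encoder(x0_content))

    # Style consistency: the output should match the style motion's features.
    loss_style = F.mse_loss(style_encoder(x0_pred), style_encoder(x0_style))

    return loss_diff + w_content * loss_content + w_style * loss_style
```

The actual paper may use a different denoiser parameterization, feature extractors, and weighting; the sketch only illustrates how the three terms named in the abstract could be combined in a single training step.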
Related papers
- CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning [47.195002937893115]
CoMo aims to learn more informative continuous motion representations from diverse, internet-scale videos.
We introduce two new metrics for more robustly and affordably evaluating motion and guiding motion learning methods.
CoMo exhibits strong zero-shot generalization, enabling it to generate continuous pseudo actions for previously unseen video domains.
arXiv Detail & Related papers (2025-05-22T17:58:27Z) - StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion [14.213279927964903]
StyleMotif is a novel Stylized Motion Latent Diffusion model.
It generates motion conditioned on both content and style from multiple modalities.
arXiv Detail & Related papers (2025-03-27T17:59:46Z) - Decoupling Contact for Fine-Grained Motion Style Transfer [21.61658765014968]
Motion style transfer changes the style of a motion while retaining its content and is useful in computer animations and games.
It is unknown how to decouple and control contact to achieve fine-grained control in motion style transfer.
We present a novel style transfer method for fine-grained control over contacts while achieving both motion naturalness and spatial-temporal variations of style.
arXiv Detail & Related papers (2024-09-09T07:33:14Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z) - Motion Mamba: Efficient and Long Sequence Motion Generation [26.777455596989526]
Recent advancements in state space models (SSMs) have showcased considerable promise in long sequence modeling.
We propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model utilizing SSMs.
Our proposed method achieves up to a 50% FID improvement and up to 4 times faster inference on the HumanML3D and KIT-ML datasets.
arXiv Detail & Related papers (2024-03-12T10:25:29Z) - MoST: Motion Style Transformer between Diverse Action Contents [23.62426940733713]
We propose a novel motion style transformer that effectively disentangles style from content and generates a plausible motion with transferred style from a source motion.
Our method outperforms existing methods and demonstrates exceptionally high quality, particularly in motion pairs with different contents, without the need for post-processing.
arXiv Detail & Related papers (2024-03-10T14:11:25Z) - MotionMix: Weakly-Supervised Diffusion for Controllable Motion
Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z) - Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation.
Recent works have adopted frame difference as the source of motion information in video contrastive learning.
We present a framework capable of introducing well-aligned and significant motion information.
arXiv Detail & Related papers (2023-09-01T07:03:27Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - RSMT: Real-time Stylized Motion Transition for Characters [15.856276818061891]
We propose a Real-time Stylized Motion Transition method (RSMT) to achieve all aforementioned goals.
Our method consists of two critical, independent components: a general motion manifold model and a style motion sampler.
Our method proves to be fast, high-quality, versatile, and controllable.
arXiv Detail & Related papers (2023-06-21T01:50:04Z) - MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition [50.345327516891615]
We develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.
MoLo can simultaneously learn long-range temporal context and motion cues for comprehensive few-shot matching.
arXiv Detail & Related papers (2023-04-03T13:09:39Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - HumanMAC: Masked Motion Completion for Human Motion Prediction [62.279925754717674]
Human motion prediction is a classical problem in computer vision and computer graphics.
Previous efforts achieve great empirical performance based on an encoding-decoding style.
In this paper, we propose a novel framework from a new perspective.
arXiv Detail & Related papers (2023-02-07T18:34:59Z) - MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z) - Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
arXiv Detail & Related papers (2021-04-01T03:55:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.