SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion
- URL: http://arxiv.org/abs/2405.02844v1
- Date: Sun, 5 May 2024 08:28:07 GMT
- Title: SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion
- Authors: Ziyun Qian, Zeyu Xiao, Zhenyi Wu, Dingkang Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Dongliang Kou, Lihua Zhang,
- Abstract summary: Style transfer is widely applied in multimedia scenarios such as movies, games, and the Metaverse.
Most of the current work in this field adopts the GAN, which may lead to instability and convergence issues.
We propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion.
- Score: 12.426879081036116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD framework, we propose Diffusion-based Content Consistency Loss and Content Consistency Loss to assist the overall framework's training. Finally, we conduct extensive experiments. The results reveal that our method surpasses state-of-the-art methods in both qualitative and quantitative comparisons, capable of generating more realistic motion sequences.
Related papers
- CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning [47.195002937893115]
CoMo aims to learn more informative continuous motion representations from diverse, internet-scale videos.<n>We introduce two new metrics for more robustly and affordably evaluating motion and guiding motion learning methods.<n>CoMo exhibits strong zero-shot generalization, enabling it to generate continuous pseudo actions for previously unseen video domains.
arXiv Detail & Related papers (2025-05-22T17:58:27Z) - StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion [14.213279927964903]
StyleMotif is a novel Stylized Motion Latent Diffusion model.
It generates motion conditioned on both content and style from multiple modalities.
arXiv Detail & Related papers (2025-03-27T17:59:46Z) - Decoupling Contact for Fine-Grained Motion Style Transfer [21.61658765014968]
Motion style transfer changes the style of a motion while retaining its content and is useful in computer animations and games.
It is unknown how to decouple and control contact to achieve fine-grained control in motion style transfer.
We present a novel style transfer method for fine-grained control over contacts while achieving both motion naturalness and spatial-temporal variations of style.
arXiv Detail & Related papers (2024-09-09T07:33:14Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.<n> SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.<n>Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z) - Motion Mamba: Efficient and Long Sequence Motion Generation [26.777455596989526]
Recent advancements in state space models (SSMs) have showcased considerable promise in long sequence modeling.
We propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model utilized SSMs.
Our proposed method achieves up to 50% FID improvement and up to 4 times faster on the HumanML3D and KIT-ML datasets.
arXiv Detail & Related papers (2024-03-12T10:25:29Z) - MoST: Motion Style Transformer between Diverse Action Contents [23.62426940733713]
We propose a novel motion style transformer that effectively disentangles style from content and generates a plausible motion with transferred style from a source motion.
Our method outperforms existing methods and demonstrates exceptionally high quality, particularly in motion pairs with different contents, without the need for post-processing.
arXiv Detail & Related papers (2024-03-10T14:11:25Z) - MotionMix: Weakly-Supervised Diffusion for Controllable Motion
Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - DiverseMotion: Towards Diverse Human Motion Generation via Discrete
Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z) - Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation.
Recent works have adopted frame difference as the source of motion information in video contrastive learning.
We present a framework capable of introducing well-aligned and significant motion information.
arXiv Detail & Related papers (2023-09-01T07:03:27Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - RSMT: Real-time Stylized Motion Transition for Characters [15.856276818061891]
We propose a Real-time Stylized Motion Transition method (RSMT) to achieve all aforementioned goals.
Our method consists of two critical, independent components: a general motion manifold model and a style motion sampler.
Our method proves to be fast, high-quality, versatile, and controllable.
arXiv Detail & Related papers (2023-06-21T01:50:04Z) - MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot
Action Recognition [50.345327516891615]
We develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.
MoLo can simultaneously learn long-range temporal context and motion cues for comprehensive few-shot matching.
arXiv Detail & Related papers (2023-04-03T13:09:39Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - HumanMAC: Masked Motion Completion for Human Motion Prediction [62.279925754717674]
Human motion prediction is a classical problem in computer vision and computer graphics.
Previous effects achieve great empirical performance based on an encoding-decoding style.
In this paper, we propose a novel framework from a new perspective.
arXiv Detail & Related papers (2023-02-07T18:34:59Z) - MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z) - Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
arXiv Detail & Related papers (2021-04-01T03:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.