Related papers: SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

URL: http://arxiv.org/abs/2405.02844v1
Date: Sun, 5 May 2024 08:28:07 GMT
Title: SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion
Authors: Ziyun Qian, Zeyu Xiao, Zhenyi Wu, Dingkang Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Dongliang Kou, Lihua Zhang,
Abstract summary: Style transfer is widely applied in multimedia scenarios such as movies, games, and the Metaverse. Most of the current work in this field adopts the GAN, which may lead to instability and convergence issues. We propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion.
Score: 12.426879081036116
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD framework, we propose Diffusion-based Content Consistency Loss and Content Consistency Loss to assist the overall framework's training. Finally, we conduct extensive experiments. The results reveal that our method surpasses state-of-the-art methods in both qualitative and quantitative comparisons, capable of generating more realistic motion sequences.

Related papers

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion [14.213279927964903]
StyleMotif is a novel Stylized Motion Latent Diffusion model. It generates motion conditioned on both content and style from multiple modalities.
arXiv Detail & Related papers (2025-03-27T17:59:46Z)
Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions. Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
Motion Mamba: Efficient and Long Sequence Motion Generation [26.777455596989526]
Recent advancements in state space models (SSMs) have showcased considerable promise in long sequence modeling. We propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model utilized SSMs. Our proposed method achieves up to 50% FID improvement and up to 4 times faster on the HumanML3D and KIT-ML datasets.
arXiv Detail & Related papers (2024-03-12T10:25:29Z)
MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences. Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method. MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model. During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions. We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation. M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse. We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
RSMT: Real-time Stylized Motion Transition for Characters [15.856276818061891]
We propose a Real-time Stylized Motion Transition method (RSMT) to achieve all aforementioned goals. Our method consists of two critical, independent components: a general motion manifold model and a style motion sampler. Our method proves to be fast, high-quality, versatile, and controllable.
arXiv Detail & Related papers (2023-06-21T01:50:04Z)
Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis. We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching. Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z)
HumanMAC: Masked Motion Completion for Human Motion Prediction [62.279925754717674]
Human motion prediction is a classical problem in computer vision and computer graphics. Previous effects achieve great empirical performance based on an encoding-decoding style. In this paper, we propose a novel framework from a new perspective.
arXiv Detail & Related papers (2023-02-07T18:34:59Z)
Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information. MoSI can discover regions with large motion even without fine-tuning on the downstream datasets. We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
arXiv Detail & Related papers (2021-04-01T03:55:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.