StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
- URL: http://arxiv.org/abs/2503.21775v1
- Date: Thu, 27 Mar 2025 17:59:46 GMT
- Title: StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
- Authors: Ziyu Guo, Young Yoon Lee, Joseph Liu, Yizhak Ben-Shabat, Victor Zordan, Mubbasir Kapadia
- Abstract summary: StyleMotif is a novel Stylized Motion Latent Diffusion model. It generates motion conditioned on both content and style from multiple modalities.
- Score: 14.213279927964903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present StyleMotif, a novel Stylized Motion Latent Diffusion model, generating motion conditioned on both content and style from multiple modalities. Unlike existing approaches that either focus on generating diverse motion content or transferring style from sequences, StyleMotif seamlessly synthesizes motion across a wide range of content while incorporating stylistic cues from multi-modal inputs, including motion, text, image, video, and audio. To achieve this, we introduce a style-content cross fusion mechanism and align a style encoder with a pre-trained multi-modal model, ensuring that the generated motion accurately captures the reference style while preserving realism. Extensive experiments demonstrate that our framework surpasses existing methods in stylized motion generation and exhibits emergent capabilities for multi-modal motion stylization, enabling more nuanced motion synthesis. Source code and pre-trained models will be released upon acceptance. Project Page: https://stylemotif.github.io
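The abstract describes the style-content cross fusion mechanism only at a high level. Below is a minimal, hypothetical sketch of how such a layer could be wired: content tokens from a motion latent-diffusion denoiser attend to style tokens produced by a style encoder aligned with a frozen multi-modal embedding space. All class names, dimensions, and parameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a style-content cross fusion block (not the authors' code).
# Assumes content features from a motion latent-diffusion denoiser and style tokens
# from a style encoder aligned with a frozen multi-modal (CLIP-like) embedding space.
import torch
import torch.nn as nn


class StyleContentCrossFusion(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.norm_content = nn.LayerNorm(dim)
        self.norm_style = nn.LayerNorm(dim)
        # Content queries attend to style keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init gate: starts near identity

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # content: (B, T, dim) motion latent tokens; style: (B, S, dim) style tokens.
        q = self.norm_content(content)
        kv = self.norm_style(style)
        fused, _ = self.cross_attn(q, kv, kv, need_weights=False)
        # Gated residual keeps the content branch dominant when the gate is small,
        # which helps preserve realism while injecting style.
        return content + torch.tanh(self.gate) * fused


if __name__ == "__main__":
    block = StyleContentCrossFusion()
    content = torch.randn(2, 196, 512)   # e.g. 196 motion latent tokens
    style = torch.randn(2, 77, 512)      # e.g. 77 style tokens from the style encoder
    print(block(content, style).shape)   # torch.Size([2, 196, 512])
```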
Related papers
- SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models [54.641809532055916]
We introduce SOYO, a novel diffusion-based framework for video style morphing.
Our method employs a pre-trained text-to-image diffusion model without fine-tuning, combining attention injection and AdaIN to preserve structural consistency.
To harmonize style across video frames, we propose a novel adaptive sampling scheduler that interpolates between two style images.
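The SOYO summary mentions AdaIN alongside attention injection. Below is a generic AdaIN sketch, assuming the usual formulation in which content features are re-normalized with the channel-wise statistics of style features; it is illustrative only and not taken from the SOYO codebase.

```python
# Generic AdaIN sketch (not from the SOYO codebase): content features are
# re-normalized with the channel-wise mean/std of the style features.
import torch


def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # content, style: (B, C, H, W) feature maps, e.g. from a diffusion U-Net.
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean


if __name__ == "__main__":
    content = torch.randn(1, 64, 32, 32)
    style = torch.randn(1, 64, 32, 32)
    print(adain(content, style).shape)  # torch.Size([1, 64, 32, 32])
```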
arXiv Detail & Related papers (2025-03-10T07:27:01Z)
- MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow [11.491447470132279]
In existing methods, information usually flows only from style to content, which may cause conflicts between the two.
In this work, we build a bidirectional control flow between style and content, also adjusting the style towards the content.
We extend stylized motion generation from a single modality, i.e. the style motion, to multiple modalities, including text and images, through contrastive learning.
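The MulSMo summary mentions extending style conditioning from motion to text and images via contrastive learning. A common way to do this is a CLIP-style InfoNCE loss that aligns a motion-style encoder with a frozen text/image encoder; the snippet below is such a generic sketch, not the MulSMo implementation.

```python
# Illustrative InfoNCE-style alignment loss (not the MulSMo implementation):
# pulls motion-style embeddings toward paired text/image embeddings so that
# text or images can later stand in for a style motion at inference time.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(style_emb: torch.Tensor,
                               other_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    # style_emb: (B, D) embeddings of style motions; other_emb: (B, D) paired
    # text or image embeddings from a frozen multi-modal encoder.
    style_emb = F.normalize(style_emb, dim=-1)
    other_emb = F.normalize(other_emb, dim=-1)
    logits = style_emb @ other_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(style_emb.size(0), device=style_emb.device)
    # Symmetric cross-entropy over rows and columns (CLIP-style).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    loss = contrastive_alignment_loss(torch.randn(8, 512), torch.randn(8, 512))
    print(loss.item())
```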
arXiv Detail & Related papers (2024-12-13T06:40:26Z)
- MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method that enables video generation of similar motion in new contexts.
Multimodal representations from recaptioned prompts and video frames promote the modeling of appearance.
Our method effectively learns specific motion patterns from single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
- SMooDi: Stylized Motion Diffusion Model [46.293854851116215]
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style sequences.
Our proposed framework outperforms existing methods in stylized motion generation.
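SMooDi conditions a motion diffusion model on both a content text and a style sequence. One standard way to combine two conditions at sampling time is nested classifier-free guidance, sketched below for an epsilon-prediction denoiser; the `denoiser` signature and guidance weights are assumptions, not SMooDi's exact guidance scheme.

```python
# Hypothetical nested classifier-free guidance for a denoiser conditioned on
# content text (c) and a style sequence (s); not SMooDi's exact scheme.
import torch


def guided_eps(denoiser, x_t, t, c, s, w_content=7.5, w_style=1.5):
    # denoiser(x_t, t, content, style) -> predicted noise; None drops a condition.
    eps_uncond = denoiser(x_t, t, None, None)
    eps_content = denoiser(x_t, t, c, None)
    eps_full = denoiser(x_t, t, c, s)
    # Guide first toward the content text, then additionally toward the style.
    return (eps_uncond
            + w_content * (eps_content - eps_uncond)
            + w_style * (eps_full - eps_content))


if __name__ == "__main__":
    # Dummy denoiser that ignores its conditions, just to exercise the function.
    dummy = lambda x_t, t, c, s: torch.zeros_like(x_t)
    x = torch.randn(2, 196, 512)
    print(guided_eps(dummy, x, t=torch.tensor([10, 10]), c="walk", s="zombie").shape)
```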
arXiv Detail & Related papers (2024-07-17T17:59:42Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Generative Human Motion Stylization in Latent Space [42.831468727082694]
We present a novel generative model that produces diverse stylization results of a single motion (latent) code.
At inference, users can opt to stylize a motion using style cues from a reference motion or a label.
Experimental results show that our proposed stylization models, despite their lightweight design, outperform the state-of-the-art in style reenactment, content preservation, and generalization.
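This summary describes a latent-space design in which stylization amounts to recombining a content code with a style code drawn either from a reference motion or from a prior. The toy recurrent sketch below illustrates that general idea; the modules and dimensions (e.g. 263-dim HumanML3D-style pose features) are assumptions, not the paper's architecture.

```python
# Generic sketch of latent content/style recombination (not the paper's code):
# a motion is split into a content code and a style code; at inference the style
# code can come from a reference motion or be sampled from a prior for diversity.
import torch
import torch.nn as nn


class LatentStylizer(nn.Module):
    def __init__(self, motion_dim: int = 263, content_dim: int = 256, style_dim: int = 64):
        super().__init__()
        self.style_dim = style_dim
        self.content_enc = nn.GRU(motion_dim, content_dim, batch_first=True)
        self.style_enc = nn.GRU(motion_dim, style_dim, batch_first=True)
        self.dec = nn.GRU(content_dim + style_dim, motion_dim, batch_first=True)

    def forward(self, content_motion: torch.Tensor, style_motion=None) -> torch.Tensor:
        # content_motion, style_motion: (B, T, motion_dim) pose-feature sequences.
        c, _ = self.content_enc(content_motion)                  # (B, T, content_dim)
        if style_motion is None:
            # No reference given: sample a style code from a standard normal prior.
            s = torch.randn(c.size(0), 1, self.style_dim, device=c.device)
        else:
            _, h = self.style_enc(style_motion)                  # (1, B, style_dim)
            s = h[-1].unsqueeze(1)                               # (B, 1, style_dim)
        s = s.expand(-1, c.size(1), -1)                          # broadcast over time
        out, _ = self.dec(torch.cat([c, s], dim=-1))             # (B, T, motion_dim)
        return out


if __name__ == "__main__":
    model = LatentStylizer()
    walk = torch.randn(2, 120, 263)     # content motion (e.g. HumanML3D-style features)
    angry = torch.randn(2, 80, 263)     # style reference motion
    print(model(walk, angry).shape)     # torch.Size([2, 120, 263])
```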
arXiv Detail & Related papers (2024-01-24T14:53:13Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- NewMove: Customizing text-to-video models with novel motions [74.9442859239997]
We introduce an approach for augmenting text-to-video generation models with customized motions.
By leveraging a few video samples demonstrating specific movements as input, our method learns and generalizes the input motion patterns for diverse, text-specified scenarios.
arXiv Detail & Related papers (2023-12-07T18:59:03Z)
- Style-ERD: Responsive and Coherent Online Motion Style Transfer [13.15016322155052]
Style transfer is a common method for enriching character animation.
We propose a novel style transfer model, Style-ERD, to stylize motions in an online manner.
Our method stylizes motions into multiple target styles with a unified model.
arXiv Detail & Related papers (2022-03-04T21:12:09Z)