MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
- URL: http://arxiv.org/abs/2412.09901v1
- Date: Fri, 13 Dec 2024 06:40:26 GMT
- Title: MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
- Authors: Zhe Li, Yisheng He, Lei Zhong, Weichao Shen, Qi Zuo, Lingteng Qiu, Zilong Dong, Laurence Tianruo Yang, Weihao Yuan
- Abstract summary: In existing methods, the information usually only flows from style to content, which may cause conflict between the style and content.
In this work we build a bidirectional control flow between the style and the content, also adjusting the style towards the content.
We extend the stylized motion generation from one modality, i.e. the style motion, to multiple modalities including texts and images through contrastive learning.
- Score: 11.491447470132279
- Abstract: Generating motion sequences conforming to a target style while adhering to the given content prompts requires accommodating both the content and style. In existing methods, the information usually only flows from style to content, which may cause conflict between the style and content, harming the integration. Differently, in this work we build a bidirectional control flow between the style and the content, also adjusting the style towards the content, in which case the style-content collision is alleviated and the dynamics of the style is better preserved in the integration. Moreover, we extend the stylized motion generation from one modality, i.e. the style motion, to multiple modalities including texts and images through contrastive learning, leading to flexible style control on the motion generation. Extensive experiments demonstrate that our method significantly outperforms previous methods across different datasets, while also enabling multimodal signals control. The code of our method will be made publicly available.
Related papers
- Content-style disentangled representation for controllable artistic image stylization and generation [0.0]
Controllable artistic image stylization and generation aims to render the content provided by text or image with the learned artistic style.
This paper proposes a content-style representation disentangling method for controllable artistic image stylization and generation.
arXiv Detail & Related papers (2024-12-19T03:42:58Z)
- StyleMaster: Stylize Your Video with Artistic Generation and Translation [43.808656030545556]
Style control has been popular in video generation models.
Current methods often generate videos far from the given style, cause content leakage, and struggle to transfer one video to the desired style.
Our approach, StyleMaster, achieves significant improvement in both style resemblance and temporal coherence.
arXiv Detail & Related papers (2024-12-10T18:44:08Z)
- SMooDi: Stylized Motion Diffusion Model [46.293854851116215]
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style sequences.
Our proposed framework outperforms existing methods in stylized motion generation.
arXiv Detail & Related papers (2024-07-17T17:59:42Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- MoST: Motion Style Transformer between Diverse Action Contents [23.62426940733713]
We propose a novel motion style transformer that effectively disentangles style from content and generates a plausible motion with transferred style from a source motion.
Our method outperforms existing methods and demonstrates exceptionally high quality, particularly in motion pairs with different contents, without the need for post-processing.
arXiv Detail & Related papers (2024-03-10T14:11:25Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [78.75422651890776]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter.
To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image.
StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
- StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)