ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation
- URL: http://arxiv.org/abs/2512.02453v1
- Date: Tue, 02 Dec 2025 06:24:14 GMT
- Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation
- Authors: Kerui Chen, Jianrong Zhang, Ming Li, Zhonglong Zheng, Hehe Fan
- Abstract summary: We propose a clustering-based framework, ClusterStyle, to address intra-style diversity. We leverage a set of prototypes to model diverse style patterns across motions belonging to the same style category. Our approach outperforms existing state-of-the-art models in stylized motion generation and motion style transfer.
- Score: 33.75564496181951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing stylized motion generation models have shown a remarkable ability to extract style information from a style motion and insert it into a content motion. However, capturing intra-style diversity, where a single style should correspond to diverse motion variations, remains a significant challenge. In this paper, we propose a clustering-based framework, ClusterStyle, to address this limitation. Instead of learning an unstructured embedding from each style motion, we leverage a set of prototypes to effectively model diverse style patterns across motions belonging to the same style category. We consider two types of style diversity: global-level diversity among style motions of the same category, and local-level diversity within the temporal dynamics of motion sequences. These components jointly shape two structured style embedding spaces, i.e., global and local, optimized via alignment with non-learnable prototype anchors. Furthermore, we augment a pretrained text-to-motion generation model with a Stylistic Modulation Adapter (SMA) to integrate the style features. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art models in stylized motion generation and motion style transfer.
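The abstract names the key mechanism, alignment of style embeddings with non-learnable prototype anchors, without spelling it out. A minimal PyTorch sketch of that general recipe follows: encode a style motion, hard-assign the embedding to its nearest frozen prototype, and pull the embedding toward that anchor. The embedding dimension, prototype count, and the `prototype_alignment_loss` helper are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def prototype_alignment_loss(style_emb, prototypes, temperature=0.1):
    """Pull each style embedding toward its nearest (frozen) prototype.

    style_emb:  (B, D) embeddings from a style encoder.
    prototypes: (K, D) non-learnable anchors, e.g. sampled once at init
                and kept fixed, as the abstract suggests.
    """
    style_emb = F.normalize(style_emb, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    # Cosine similarity of every embedding to every prototype: (B, K).
    logits = style_emb @ prototypes.t() / temperature
    # Hard-assign each embedding to its closest prototype; no gradient
    # flows into the assignment or the prototypes themselves.
    with torch.no_grad():
        assign = logits.argmax(dim=-1)
    # Cross-entropy against the hard assignment sharpens the embedding
    # space around the prototype anchors.
    return F.cross_entropy(logits, assign)

# Toy usage: 8 style motions of the same category, 16 fixed prototypes.
emb = torch.randn(8, 64, requires_grad=True)
protos = torch.randn(16, 64)          # non-learnable anchors
loss = prototype_alignment_loss(emb, protos)
loss.backward()
print(loss.item())
```

Because the prototypes receive no gradient, only the embedding space reshapes around the fixed anchors, which is one plausible way to obtain the structured global and local spaces the abstract describes.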
Related papers
- AStF: Motion Style Transfer via Adaptive Statistics Fusor [58.660938790014455]
We propose a novel Adaptive Statistics Fusor (AStF), which consists of a Style Distemporalment Module (SDM) and High-Order Multi-Statistics Attention (HO-SAttn). Experimental results show that, by providing a more comprehensive statistical model, our proposed AStF shows proficiency in motion style transfer over state-of-the-art techniques.
arXiv Detail & Related papers (2025-11-06T08:51:24Z)
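The abstract does not define SDM or HO-SAttn, but the "adaptive statistics" framing suggests a generalization of AdaIN-style transfer beyond first- and second-order moments. The sketch below fuses only mean and standard deviation, and computes skewness merely to show what a higher-order statistic looks like; the `moments` and `stats_fusion` helpers are assumptions, not the paper's modules.

```python
import torch

def moments(x, eps=1e-5):
    """Per-channel mean, std, and standardized skewness over time.
    x: (B, C, T) motion features."""
    mu = x.mean(dim=-1, keepdim=True)
    sigma = x.std(dim=-1, keepdim=True) + eps
    skew = (((x - mu) / sigma) ** 3).mean(dim=-1, keepdim=True)
    return mu, sigma, skew

def stats_fusion(content, style, alpha=1.0):
    """AdaIN-like fusor: re-normalize content features with the style's
    mean and std. How AStF combines the higher-order statistics is not
    stated in the abstract, so only the classic two moments are fused."""
    c_mu, c_sigma, _ = moments(content)
    s_mu, s_sigma, _ = moments(style)
    stylized = ((content - c_mu) / c_sigma) * s_sigma + s_mu
    return alpha * stylized + (1.0 - alpha) * content

content = torch.randn(2, 128, 60)    # (batch, channels, frames)
style = torch.randn(2, 128, 80)
out = stats_fusion(content, style)
_, _, style_skew = moments(style)    # example of a higher-order statistic
print(out.shape, style_skew.shape)
```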
- StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion [14.213279927964903]
StyleMotif is a novel Stylized Motion Latent Diffusion model. It generates motion conditioned on both content and style from multiple modalities.
arXiv Detail & Related papers (2025-03-27T17:59:46Z)
- Pluggable Style Representation Learning for Multi-Style Transfer [41.09041735653436]
We develop a style transfer framework by decoupling style modeling from style transfer. For style modeling, we propose a style representation learning scheme to encode the style information into a compact representation. For style transfer, we develop a style-aware multi-style transfer network (SaMST) to adapt to diverse styles using pluggable style representations.
arXiv Detail & Related papers (2025-03-26T09:44:40Z)
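A minimal sketch of the decoupling described above, assuming the style representation is a small learned vector and that "pluggable" means the transfer network consumes it through FiLM-style scale/shift modulation; `StyleEncoder` and `ModulatedBlock` are hypothetical stand-ins, not SaMST's actual components.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Encode a style image into a compact, pluggable code (here 64-D)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, style_img):
        return self.net(style_img)

class ModulatedBlock(nn.Module):
    """One block of a transfer network that 'plugs in' a style code
    via per-channel scale/shift predicted from the code."""
    def __init__(self, channels, style_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_scale_shift = nn.Linear(style_dim, 2 * channels)

    def forward(self, x, style_code):
        scale, shift = self.to_scale_shift(style_code).chunk(2, dim=-1)
        h = self.conv(x)
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

# Swapping styles only requires swapping the compact code.
enc = StyleEncoder()
block = ModulatedBlock(channels=32)
code = enc(torch.randn(1, 3, 128, 128))    # compact style representation
out = block(torch.randn(1, 32, 64, 64), code)
print(code.shape, out.shape)
```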
- Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection [35.35239718038119]
The task of Single-Domain Generalized Object Detection (Single-DGOD) aims to generalize a detector to multiple unknown domains never seen during training. We propose a new method, Style Evolving along Chain-of-Thought, which progressively integrates and expands style information along the chain of thought.
arXiv Detail & Related papers (2025-03-13T02:14:10Z)
- MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow [11.491447470132279]
In existing methods, information usually flows only from style to content, which may cause conflicts between the two. In this work, we build a bidirectional control flow between the style and the content, also adjusting the style towards the content. We further extend stylized motion generation from a single modality, i.e., the style motion, to multiple modalities, including texts and images, through contrastive learning.
arXiv Detail & Related papers (2024-12-13T06:40:26Z)
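One plausible reading of the "bidirectional control flow" is a pair of cross-attention passes in which content queries style and style queries content, so that the style tokens are also adjusted toward the content. The `BidirectionalControl` module below is a hedged sketch under that assumption, not MulSMo's published architecture.

```python
import torch
import torch.nn as nn

class BidirectionalControl(nn.Module):
    """Let information flow both ways between style and content tokens."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.content_from_style = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.style_from_content = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, content, style):
        # Style -> content: inject style cues into the content tokens.
        c_out, _ = self.content_from_style(content, style, style)
        # Content -> style: adapt the style tokens toward the content.
        s_out, _ = self.style_from_content(style, content, content)
        return content + c_out, style + s_out

layer = BidirectionalControl()
content = torch.randn(2, 60, 128)   # (batch, frames, dim)
style = torch.randn(2, 80, 128)
new_content, new_style = layer(content, style)
print(new_content.shape, new_style.shape)
```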
- SMooDi: Stylized Motion Diffusion Model [46.293854851116215]
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style sequences.
Our proposed framework outperforms existing methods in stylized motion generation.
arXiv Detail & Related papers (2024-07-17T17:59:42Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
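The input-dependent temperature can be sketched as an InfoNCE loss whose temperature is predicted per anchor by a small head, so "hard" styles can sharpen or soften their own similarity distribution. The head architecture and the clamp range below are illustrative assumptions, not UCAST's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTempContrastive(nn.Module):
    """InfoNCE with a per-anchor, input-dependent temperature."""
    def __init__(self, dim=128):
        super().__init__()
        self.temp_head = nn.Linear(dim, 1)

    def forward(self, anchors, positives):
        a = F.normalize(anchors, dim=-1)
        p = F.normalize(positives, dim=-1)
        # Per-anchor temperature, kept in a sane range (0.05, 0.45).
        tau = torch.sigmoid(self.temp_head(anchors)) * 0.4 + 0.05  # (B, 1)
        logits = (a @ p.t()) / tau                                  # (B, B)
        # Anchor i is matched with positive i; all other pairs are negatives.
        targets = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, targets)

loss_fn = AdaptiveTempContrastive()
loss = loss_fn(torch.randn(16, 128), torch.randn(16, 128))
loss.backward()
print(loss.item())
```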
- All-to-key Attention for Arbitrary Style Transfer [98.83954812536521]
We propose a novel all-to-key attention mechanism -- each position of content features is matched to stable key positions of style features.
The resultant module, dubbed StyA2K, shows extraordinary performance in preserving the semantic structure and rendering consistent style patterns.
arXiv Detail & Related papers (2022-12-08T06:46:35Z)
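A minimal sketch of the all-to-key idea, assuming the "stable key positions" are selected as the top-k most similar style positions per content query; the paper's actual selection scheme may differ.

```python
import torch
import torch.nn.functional as F

def all_to_key_attention(content, style, k=8):
    """Each content position attends to only its top-k most similar
    style positions ('all-to-key' instead of 'all-to-all').

    content: (B, Nc, D), style: (B, Ns, D)
    """
    q = F.normalize(content, dim=-1)
    kmat = F.normalize(style, dim=-1)
    sim = q @ kmat.transpose(1, 2)                   # (B, Nc, Ns)
    topv, topi = sim.topk(k, dim=-1)                 # keep k keys per query
    attn = topv.softmax(dim=-1)                      # (B, Nc, k)
    # Gather the selected style vectors and mix them per content position.
    b, nc, _ = topi.shape
    gathered = torch.gather(
        style.unsqueeze(1).expand(b, nc, style.size(1), style.size(2)),
        2,
        topi.unsqueeze(-1).expand(b, nc, k, style.size(2)),
    )                                                # (B, Nc, k, D)
    return (attn.unsqueeze(-1) * gathered).sum(dim=2)

out = all_to_key_attention(torch.randn(2, 100, 64), torch.randn(2, 120, 64))
print(out.shape)  # torch.Size([2, 100, 64])
```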
- StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
The cross-modal matching problem is typically solved by learning a joint embedding space in which the semantic content shared between the photo and sketch modalities is preserved. An effective model needs to explicitly account for the diversity of user sketching styles and, crucially, generalize to unseen user styles. Our model can not only disentangle the cross-modal shared semantic content, but also adapt the disentanglement to any unseen user style, making the model truly style-agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.