Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion
- URL: http://arxiv.org/abs/2503.19557v1
- Date: Tue, 25 Mar 2025 11:23:34 GMT
- Title: Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion
- Authors: Haim Sawdayee, Chuan Guo, Guy Tevet, Bing Zhou, Jian Wang, Amit H. Bermano
- Abstract summary: We introduce LoRA-MDM, a framework for motion stylization that generalizes to complex actions while maintaining editability. Our key insight is that adapting the generative prior to include the style, while preserving its overall distribution, is more effective than modifying each individual motion during generation. LoRA-MDM learns to adapt the prior to include the reference style using only a few samples.
- Score: 28.94750481325469
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-motion generative models span a wide range of 3D human actions but struggle with nuanced stylistic attributes such as a "Chicken" style. Due to the scarcity of style-specific data, existing approaches pull the generative prior towards a reference style, which often results in out-of-distribution low quality generations. In this work, we introduce LoRA-MDM, a lightweight framework for motion stylization that generalizes to complex actions while maintaining editability. Our key insight is that adapting the generative prior to include the style, while preserving its overall distribution, is more effective than modifying each individual motion during generation. Building on this idea, LoRA-MDM learns to adapt the prior to include the reference style using only a few samples. The style can then be used in the context of different textual prompts for generation. The low-rank adaptation shifts the motion manifold in a semantically meaningful way, enabling realistic style infusion even for actions not present in the reference samples. Moreover, preserving the distribution structure enables advanced operations such as style blending and motion editing. We compare LoRA-MDM to state-of-the-art stylized motion generation methods and demonstrate a favorable balance between text fidelity and style consistency.
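The abstract describes the mechanism only in words, so below is a minimal, hypothetical PyTorch sketch of the idea it outlines: freeze a text-to-motion diffusion prior, attach low-rank adapters to its attention projections, and fine-tune only those adapters on a handful of reference-style motions. The layer names, the denoiser signature, and the `diffusion.q_sample` helper are assumptions for illustration, not the authors' released LoRA-MDM code.

```python
# Hypothetical sketch only: module names and the diffusion interface are assumptions,
# not the released LoRA-MDM implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual (scale * B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # keep the generative prior frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def inject_lora(module: nn.Module, rank: int = 4,
                targets=("q_proj", "k_proj", "v_proj", "out_proj")):
    """Recursively wrap the (assumed) attention projections with LoRA layers."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name in targets:
            setattr(module, name, LoRALinear(child, rank))
        else:
            inject_lora(child, rank, targets)

def adapt_to_style(denoiser, diffusion, style_motions, text_emb, steps=2000, lr=1e-4):
    """Few-shot adaptation: only LoRA parameters receive gradients, so the prior's
    overall distribution is preserved while the reference style is folded in."""
    for p in denoiser.parameters():
        p.requires_grad_(False)                  # freeze the whole prior first
    inject_lora(denoiser)
    lora_params = [p for p in denoiser.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(lora_params, lr=lr)
    for _ in range(steps):
        x0 = style_motions[torch.randint(len(style_motions), (1,))]   # one reference clip
        t = torch.randint(diffusion.num_timesteps, (1,))
        x_t = diffusion.q_sample(x0, t, torch.randn_like(x0))         # assumed noising helper
        loss = ((denoiser(x_t, t, text_emb) - x0) ** 2).mean()        # x0-prediction loss, as in MDM
        opt.zero_grad(); loss.backward(); opt.step()
    return denoiser
```

Because only the low-rank residuals change, the adapted denoiser can still be driven by arbitrary text prompts, and operations that rely on the prior's structure, such as blending two styles by combining their adapters, remain available (a conceptual merging sketch appears after the related-papers list below).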
Related papers
- Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection [35.35239718038119]
The task of Single-Domain Generalized Object Detection (Single-DGOD) is proposed, aiming to generalize a detector to multiple unknown domains unseen during training.
We propose a new method, Style Evolving along Chain-of-Thought, which aims to progressively integrate and expand style information along the chain of thought.
arXiv Detail & Related papers (2025-03-13T02:14:10Z)
- SMooDi: Stylized Motion Diffusion Model [46.293854851116215]
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style sequences.
Our proposed framework outperforms existing methods in stylized motion generation.
arXiv Detail & Related papers (2024-07-17T17:59:42Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- DiffStyler: Diffusion-based Localized Image Style Transfer [0.0]
Image style transfer aims to imbue digital imagery with the distinctive attributes of style targets, such as colors, brushstrokes, and shapes.
Despite the advancements in arbitrary style transfer methods, a prevalent challenge remains the delicate equilibrium between content semantics and style attributes.
This paper introduces DiffStyler, a novel approach that facilitates efficient and precise arbitrary image style transfer.
arXiv Detail & Related papers (2024-03-27T11:19:34Z)
- Diffusion-based Human Motion Style Transfer with Semantic Guidance [23.600154466988073]
We propose a novel framework for few-shot style transfer learning based on the diffusion model.
In the first stage, we pre-train a diffusion-based text-to-motion model as a generative prior.
In the second stage, based on the single style example, we fine-tune the pre-trained diffusion model in a few-shot manner to make it capable of style transfer.
arXiv Detail & Related papers (2024-03-20T05:52:11Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs [56.85106417530364]
Low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization.
We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs.
Experiments show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity; a conceptual sketch of LoRA merging follows this list.
arXiv Detail & Related papers (2023-11-22T18:59:36Z)
- ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning.
We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles.
We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
arXiv Detail & Related papers (2023-08-29T17:36:02Z)
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [85.40502725367506]
We propose StyleSync, an effective framework that enables high-fidelity lip synchronization.
Specifically, we design a mask-guided spatial information encoding module that preserves the details of the given face.
Our design also enables personalized lip-sync by introducing style space and generator refinement on only limited frames.
arXiv Detail & Related papers (2023-05-09T13:38:13Z)
- GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents [3.229105662984031]
GestureDiffuCLIP is a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control.
Our system learns a latent diffusion model to generate high-quality gestures and infuses the CLIP representations of style into the generator.
Our system can be extended to allow fine-grained style control of individual body parts.
arXiv Detail & Related papers (2023-03-26T03:35:46Z)
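Both the LoRA-MDM abstract (style blending) and the ZipLoRA entry above revolve around combining independently trained low-rank adapters. The snippet below is a conceptual, naive weighted merge of two LoRA factor sets; it is not ZipLoRA's method, which learns merger coefficients to reduce interference, and the `(A, B)` dictionary layout is an assumption made for illustration.

```python
# Conceptual sketch: blend two trained LoRA adapters by a fixed weighted sum of their
# full-rank deltas. ZipLoRA instead *learns* merger coefficients; this naive version
# only illustrates why low-rank deltas compose at the weight level.
import torch

def blend_lora_deltas(lora_a: dict, lora_b: dict, w: float = 0.5) -> dict:
    """lora_a / lora_b map layer names to (A, B) factor pairs with matching shapes."""
    merged = {}
    for name, (A1, B1) in lora_a.items():
        A2, B2 = lora_b[name]
        # Materialise each adapter's weight delta (out_features x in_features) and interpolate.
        merged[name] = w * (B1 @ A1) + (1.0 - w) * (B2 @ A2)
    return merged

# Usage: add merged[name] onto the corresponding frozen base weight at load time.
```

Sweeping `w` between 0 and 1 interpolates between the two styles, which is the kind of blending the LoRA-MDM abstract alludes to, though the papers' own procedures may differ.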
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.