MotionCrafter: One-Shot Motion Customization of Diffusion Models
- URL: http://arxiv.org/abs/2312.05288v2
- Date: Tue, 2 Jan 2024 10:39:11 GMT
- Title: MotionCrafter: One-Shot Motion Customization of Diffusion Models
- Authors: Yuxin Zhang, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma,
Weiming Dong, Changsheng Xu
- Abstract summary: We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
- Score: 66.44642854791807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The essence of a video lies in its dynamic motions, including character
actions, object movements, and camera movements. While text-to-video generative
diffusion models have recently advanced in creating diverse content,
controlling specific motions through text prompts remains a significant
challenge. A primary issue is the coupling of appearance and motion, often
leading to overfitting on appearance. To tackle this challenge, we introduce
MotionCrafter, a novel one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the
reference motion into the temporal component of the base model, while the
spatial module is independently adjusted for character or style control. To
enhance the disentanglement of motion and appearance, we propose an innovative
dual-branch motion disentanglement approach, comprising a motion
disentanglement loss and an appearance prior enhancement strategy. During
training, a frozen base model provides appearance normalization, effectively
separating appearance from motion and thereby preserving diversity.
Comprehensive quantitative and qualitative experiments, along with user
preference tests, demonstrate that MotionCrafter can successfully integrate
dynamic motions while preserving the coherence, quality, and wide-ranging
appearance generation capabilities of the base model. Project page:
https://zyxelsa.github.io/homepage-motioncrafter. Code is available at
https://github.com/zyxElsa/MotionCrafter.
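The abstract describes the training recipe only at a high level: a tunable copy of the base model absorbs the reference motion through its temporal component, while a frozen copy of the same base model supplies an appearance prior that keeps the spatial pathway from overfitting. As a rough illustration only (not the authors' implementation), the minimal PyTorch-style sketch below shows what such a dual-branch objective could look like; the `tuned_unet`/`frozen_unet` call signature, the returned feature dictionary, and the weight `lambda_app` are all assumptions made for this sketch.

```python
# Hypothetical sketch of a dual-branch motion-disentanglement objective.
# Assumes a video diffusion U-Net whose forward pass returns a noise prediction
# plus intermediate features; all names and signatures are illustrative only.
import torch
import torch.nn.functional as F

def dual_branch_loss(tuned_unet, frozen_unet, noisy_latents, timesteps,
                     text_emb, target_noise, lambda_app=0.1):
    # Tuned branch: trainable temporal layers absorb the reference motion.
    pred, feats = tuned_unet(noisy_latents, timesteps, text_emb)        # assumed API
    # Frozen branch: the untouched base model provides an appearance prior.
    with torch.no_grad():
        _, feats_ref = frozen_unet(noisy_latents, timesteps, text_emb)  # assumed API

    # Standard denoising term fits the dynamics of the reference clip.
    denoise_loss = F.mse_loss(pred, target_noise)

    # Appearance-normalization term: keep the tuned model's spatial features
    # close to the frozen model's, so the update captures motion, not appearance.
    app_loss = F.mse_loss(feats["spatial"], feats_ref["spatial"].detach())

    return denoise_loss + lambda_app * app_loss
```

In this reading, the denoising term fits the reference motion while the second term pulls the spatial features back toward the frozen base model, which is what the abstract refers to as appearance normalization.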
Related papers
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z) - Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z) - MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and, unlike prior approaches, supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory usage.
arXiv Detail & Related papers (2024-05-30T17:57:30Z) - CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference (a minimal sketch of this residual idea appears after this list).
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion.
Our method also supports various downstream applications, such as mixing the appearance of one video with the motion of another, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z) - We never go out of Style: Motion Disentanglement by Subspace
Decomposition of Latent Space [38.54517335215281]
We propose a novel method to decompose motion in videos by using a pretrained image GAN model.
We discover disentangled motion subspaces in the latent space of widely used style-based GAN models.
We evaluate the disentanglement properties of motion subspaces on face and car datasets.
arXiv Detail & Related papers (2023-06-01T11:18:57Z) - MoStGAN-V: Video Generation with Temporal Motion Styles [28.082294960744726]
Previous works attempt to generate videos of arbitrary length either in an autoregressive manner or by regarding time as a continuous signal.
We argue that a single time-agnostic latent vector of a style-based generator is insufficient to model diverse and temporally consistent motions.
We introduce additional time-dependent motion styles to model diverse motion patterns.
arXiv Detail & Related papers (2023-04-05T22:47:12Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences of its use.