MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- URL: http://arxiv.org/abs/2310.08465v1
- Date: Thu, 12 Oct 2023 16:26:18 GMT
- Title: MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- Authors: Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu,
Weijia Wu, Jussi Keppo, Mike Zheng Shou
- Abstract summary: Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion.
Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions.
- Score: 24.282240656366714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-trained diffusion models have exhibited remarkable
capabilities in diverse video generations. Given a set of video clips of the
same motion concept, the task of Motion Customization is to adapt existing
text-to-video diffusion models to generate videos with this motion. For
example, generating a video with a car moving in a prescribed manner under
specific camera movements to make a movie, or a video illustrating how a bear
would lift weights to inspire creators. Adaptation methods have been developed
for customizing appearance like subject or style, yet unexplored for motion. It
is straightforward to extend mainstream adaption methods for motion
customization, including full model tuning, parameter-efficient tuning of
additional layers, and Low-Rank Adaptions (LoRAs). However, the motion concept
learned by these methods is often coupled with the limited appearances in the
training videos, making it difficult to generalize the customized motion to
other appearances. To overcome this challenge, we propose MotionDirector, with
a dual-path LoRAs architecture to decouple the learning of appearance and
motion. Further, we design a novel appearance-debiased temporal loss to
mitigate the influence of appearance on the temporal training objective.
Experimental results show the proposed method can generate videos of diverse
appearances for the customized motions. Our method also supports various
downstream applications, such as the mixing of different videos with their
appearance and motion respectively, and animating a single image with
customized motions. Our code and model weights will be released.
Related papers
- ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo)
ViMo could leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate the proposed model could generate natural motions even for videos where rapid movements, varying perspectives, or frequent occlusions might exist.
arXiv Detail & Related papers (2024-08-13T03:57:35Z) - Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models [48.56724784226513]
We propose Customize-A-Video that models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties.
The proposed modules are trained in a staged pipeline and inferred in a plug-and-play fashion, enabling easy extensions to various downstream tasks.
arXiv Detail & Related papers (2024-02-22T18:38:48Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - Customizing Motion in Text-to-Video Diffusion Models [79.4121510826141]
We introduce an approach for augmenting text-to-video generation models with customized motions.
By leveraging a few video samples demonstrating specific movements as input, our method learns and generalizes the input motion patterns for diverse, text-specified scenarios.
arXiv Detail & Related papers (2023-12-07T18:59:03Z) - DreamVideo: Composing Your Dream Videos with Customized Subject and
Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning [92.33690050667475]
AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning.
We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns.
Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
arXiv Detail & Related papers (2023-07-10T17:34:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.