MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- URL: http://arxiv.org/abs/2310.08465v1
- Date: Thu, 12 Oct 2023 16:26:18 GMT
- Title: MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- Authors: Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu,
Weijia Wu, Jussi Keppo, Mike Zheng Shou
- Abstract summary: Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion.
Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions.
- Score: 24.282240656366714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-trained diffusion models have exhibited remarkable
capabilities in diverse video generations. Given a set of video clips of the
same motion concept, the task of Motion Customization is to adapt existing
text-to-video diffusion models to generate videos with this motion. For
example, generating a video with a car moving in a prescribed manner under
specific camera movements to make a movie, or a video illustrating how a bear
would lift weights to inspire creators. Adaptation methods have been developed
for customizing appearance like subject or style, yet unexplored for motion. It
is straightforward to extend mainstream adaption methods for motion
customization, including full model tuning, parameter-efficient tuning of
additional layers, and Low-Rank Adaptions (LoRAs). However, the motion concept
learned by these methods is often coupled with the limited appearances in the
training videos, making it difficult to generalize the customized motion to
other appearances. To overcome this challenge, we propose MotionDirector, with
a dual-path LoRAs architecture to decouple the learning of appearance and
motion. Further, we design a novel appearance-debiased temporal loss to
mitigate the influence of appearance on the temporal training objective.
Experimental results show the proposed method can generate videos of diverse
appearances for the customized motions. Our method also supports various
downstream applications, such as the mixing of different videos with their
appearance and motion respectively, and animating a single image with
customized motions. Our code and model weights will be released.
Related papers
- Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models [18.41701130228042]
Motion customization aims to adapt the diffusion model (DM) to generate videos with the motion specified by a set of video clips with the same motion concept.
This paper proposes two novel strategies to enhance motion-appearance separation, including temporal attention purification (TAP) and appearance highway (AH)
arXiv Detail & Related papers (2025-01-28T05:40:20Z) - Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z) - MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new context.
multimodal representations from recaptioned prompt and video frames promote the modeling of appearance.
Our method effectively learns specific motion pattern from singular or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z) - Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models [48.56724784226513]
We propose Customize-A-Video that models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties.
The proposed modules are trained in a staged pipeline and inferred in a plug-and-play fashion, enabling easy extensions to various downstream tasks.
arXiv Detail & Related papers (2024-02-22T18:38:48Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - DreamVideo: Composing Your Dream Videos with Customized Subject and
Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.