Customize-A-Video: One-Shot Motion Customization of Text-to-Video
Diffusion Models
- URL: http://arxiv.org/abs/2402.14780v1
- Date: Thu, 22 Feb 2024 18:38:48 GMT
- Title: Customize-A-Video: One-Shot Motion Customization of Text-to-Video
Diffusion Models
- Authors: Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu,
Mingi Kwon, Abhinav Shrivastava
- Abstract summary: We propose Customize-A-Video, which models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal variations.
Our proposed method can be easily extended to various downstream tasks, including custom video generation and editing, video appearance customization, and the combination of multiple motions.
- Score: 50.65904921917907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image customization has been extensively studied in text-to-image (T2I)
diffusion models, leading to impressive outcomes and applications. With the
emergence of text-to-video (T2V) diffusion models, its temporal counterpart,
motion customization, has not yet been well investigated. To address the
challenge of one-shot motion customization, we propose Customize-A-Video, which
models the motion from a single reference video and adapts it to new subjects
and scenes with both spatial and temporal variations. It leverages low-rank
adaptation (LoRA) on temporal attention layers to tailor the pre-trained T2V
diffusion model for specific motion modeling from the reference videos. To
disentangle the spatial and temporal information during the training pipeline,
we introduce a novel concept of appearance absorbers that detach the original
appearance from the single reference video prior to motion learning. Our
proposed method can be easily extended to various downstream tasks, including
custom video generation and editing, video appearance customization, and the
combination of multiple motions, in a plug-and-play fashion. Our project page can
be found at https://anonymous-314.github.io.
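The core mechanism named in the abstract, low-rank adaptation of temporal attention layers, is easy to sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' code: the block name `temp_attn` and the projection attributes `to_q`/`to_k`/`to_v` are assumptions about the T2V backbone's naming.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = W x + (B A x) * scale."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                # freeze pre-trained weights
        self.down = nn.Linear(base.in_features, rank, bias=False)  # A: project to rank r
        self.up = nn.Linear(rank, base.out_features, bias=False)   # B: project back up
        nn.init.zeros_(self.up.weight)                             # start as a no-op update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.up(self.down(x)) * self.scale

def add_temporal_lora(unet: nn.Module, rank: int = 4) -> None:
    """Wrap the q/k/v projections of temporal attention blocks with LoRA adapters."""
    # Assumption: temporal attention blocks are identifiable by module name.
    blocks = [m for n, m in unet.named_modules() if "temp_attn" in n]
    for block in blocks:
        for proj in ("to_q", "to_k", "to_v"):
            layer = getattr(block, proj, None)
            if isinstance(layer, nn.Linear):
                setattr(block, proj, LoRALinear(layer, rank))
```

Only the `down`/`up` matrices train during motion learning, so the adapter stays small and can be attached or removed in the plug-and-play fashion the abstract describes.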
Related papers
- Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models [52.28245595257831]
We show that despite the limitations of current T2V models, cross-attention guidance can be a promising approach for editing videos.
arXiv Detail & Related papers (2024-04-08T13:40:01Z)
- Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework [33.46782517803435]
Make-Your-Anchor is a system requiring only a one-minute video clip of an individual for training.
We fine-tune a proposed structure-guided diffusion model on the input video to render 3D mesh conditions into human appearances.
A novel identity-specific face enhancement module is introduced to improve the visual quality of facial regions in the output videos.
arXiv Detail & Related papers (2024-03-25T07:54:18Z)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective that uses residual vectors between consecutive frames as a motion reference (see the sketch after this list).
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- MoVideo: Motion-Aware Video Generation with Diffusion Models [102.81825637792572]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRA architecture that decouples the learning of appearance and motion (a sketch of this decoupling follows the list).
Our method also supports various downstream applications, such as mixing the appearance of one video with the motion of another, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [92.33690050667475]
AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning.
We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns.
Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
arXiv Detail & Related papers (2023-07-10T17:34:16Z)
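For the VMC entry above, the residual-vector motion objective can be paraphrased as matching frame-to-frame differences rather than raw frames. The following is a sketch based only on the abstract, with assumed tensor shapes, not the paper's actual loss:

```python
import torch

def frame_residuals(frames: torch.Tensor) -> torch.Tensor:
    # frames: (batch, time, channels, height, width)
    # Residual vectors between consecutive frames serve as the motion reference.
    return frames[:, 1:] - frames[:, :-1]

def motion_distillation_loss(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # Match the predicted video's motion (its residuals) to the reference
    # video's residuals, rather than matching pixels frame by frame.
    return torch.mean((frame_residuals(pred) - frame_residuals(ref)) ** 2)
```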
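Similarly, MotionDirector's dual-path idea amounts to keeping two LoRA sets, spatial (appearance) and temporal (motion), and mixing them at inference. A hypothetical sketch, reusing the `LoRALinear` class and the assumed naming convention from the earlier block, with adapters presumed attached to both spatial and temporal layers:

```python
def split_lora_paths(unet):
    # Partition LoRA adapters into an appearance path (spatial layers)
    # and a motion path (temporal layers).
    spatial, temporal = [], []
    for name, module in unet.named_modules():
        if isinstance(module, LoRALinear):
            (temporal if "temp_attn" in name else spatial).append(module)
    return spatial, temporal

def enable_motion_only(spatial):
    # At inference, silence the appearance path so only the learned motion
    # transfers onto a new subject, e.g. one described by a fresh text prompt.
    for adapter in spatial:
        adapter.scale = 0.0
```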
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.