AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning
- URL: http://arxiv.org/abs/2307.04725v2
- Date: Thu, 8 Feb 2024 18:08:57 GMT
- Title: AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning
- Authors: Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu
Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai
- Abstract summary: AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning.
We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns.
Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
- Score: 92.33690050667475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advance of text-to-image (T2I) diffusion models (e.g., Stable
Diffusion) and corresponding personalization techniques such as DreamBooth and
LoRA, everyone can manifest their imagination into high-quality images at an
affordable cost. However, adding motion dynamics to existing high-quality
personalized T2Is and enabling them to generate animations remains an open
challenge. In this paper, we present AnimateDiff, a practical framework for
animating personalized T2I models without requiring model-specific tuning. At
the core of our framework is a plug-and-play motion module that can be trained
once and seamlessly integrated into any personalized T2Is originating from the
same base T2I. Through our proposed training strategy, the motion module
effectively learns transferable motion priors from real-world videos. Once
trained, the motion module can be inserted into a personalized T2I model to
form a personalized animation generator. We further propose MotionLoRA, a
lightweight fine-tuning technique for AnimateDiff that enables a pre-trained
motion module to adapt to new motion patterns, such as different shot types, at
a low training and data collection cost. We evaluate AnimateDiff and MotionLoRA
on several public representative personalized T2I models collected from the
community. The results demonstrate that our approaches help these models
generate temporally smooth animation clips while preserving the visual quality
and motion diversity. Codes and pre-trained weights are available at
https://github.com/guoyww/AnimateDiff.
Related papers
- MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation [65.74312406211213]
This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation.
By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis.
arXiv Detail & Related papers (2025-02-06T18:41:04Z) - MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new context.
multimodal representations from recaptioned prompt and video frames promote the modeling of appearance.
Our method effectively learns specific motion pattern from singular or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z) - Pix2Gif: Motion-Guided Diffusion for GIF Generation [70.64240654310754]
We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation.
We propose a new motion-guided warping module to spatially transform the features of the source image conditioned on the two types of prompts.
In preparation for the model training, we meticulously curated data by extracting coherent image frames from the TGIF video-caption dataset.
arXiv Detail & Related papers (2024-03-07T16:18:28Z) - Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models [48.56724784226513]
We propose Customize-A-Video that models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties.
The proposed modules are trained in a staged pipeline and inferred in a plug-and-play fashion, enabling easy extensions to various downstream tasks.
arXiv Detail & Related papers (2024-02-22T18:38:48Z) - Animated Stickers: Bringing Stickers to Life with Video Diffusion [25.81904166775557]
We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static image.
Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.
arXiv Detail & Related papers (2024-02-08T22:49:32Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion.
Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.