AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning
- URL: http://arxiv.org/abs/2307.04725v2
- Date: Thu, 8 Feb 2024 18:08:57 GMT
- Title: AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning
- Authors: Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu
Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai
- Abstract summary: AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning.
We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns.
Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
- Score: 92.33690050667475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advance of text-to-image (T2I) diffusion models (e.g., Stable
Diffusion) and corresponding personalization techniques such as DreamBooth and
LoRA, everyone can manifest their imagination into high-quality images at an
affordable cost. However, adding motion dynamics to existing high-quality
personalized T2Is and enabling them to generate animations remains an open
challenge. In this paper, we present AnimateDiff, a practical framework for
animating personalized T2I models without requiring model-specific tuning. At
the core of our framework is a plug-and-play motion module that can be trained
once and seamlessly integrated into any personalized T2Is originating from the
same base T2I. Through our proposed training strategy, the motion module
effectively learns transferable motion priors from real-world videos. Once
trained, the motion module can be inserted into a personalized T2I model to
form a personalized animation generator. We further propose MotionLoRA, a
lightweight fine-tuning technique for AnimateDiff that enables a pre-trained
motion module to adapt to new motion patterns, such as different shot types, at
a low training and data collection cost. We evaluate AnimateDiff and MotionLoRA
on several public representative personalized T2I models collected from the
community. The results demonstrate that our approaches help these models
generate temporally smooth animation clips while preserving the visual quality
and motion diversity. Codes and pre-trained weights are available at
https://github.com/guoyww/AnimateDiff.
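The plug-and-play design described in the abstract amounts to interleaving a trainable temporal ("motion") module with the frozen spatial layers of a personalized T2I U-Net. Below is a minimal PyTorch sketch of that idea; the class names, shapes, and wiring are illustrative assumptions, not the official AnimateDiff implementation.

```python
# Illustrative sketch only: a temporal "motion" module that attends across
# frames, inserted after a frozen spatial block of a personalized T2I U-Net.
# Names, shapes, and hyperparameters are assumptions, not AnimateDiff's API.
import torch
import torch.nn as nn


class MotionModule(nn.Module):
    """Self-attention over the frame axis; the spatial layout is left untouched."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Zero-init the output projection so the wrapped block behaves exactly
        # like the original T2I block before any video training.
        nn.init.zeros_(self.attn.out_proj.weight)
        nn.init.zeros_(self.attn.out_proj.bias)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * frames, channels, height, width)
        bf, c, h, w = x.shape
        b = bf // num_frames
        # Fold spatial positions into the batch and attend over frames only.
        z = x.reshape(b, num_frames, c, h * w).permute(0, 3, 1, 2)  # (b, hw, f, c)
        z = z.reshape(b * h * w, num_frames, c)
        zn = self.norm(z)
        out, _ = self.attn(zn, zn, zn)
        out = out.reshape(b, h * w, num_frames, c).permute(0, 2, 3, 1)
        return x + out.reshape(bf, c, h, w)  # residual keeps the T2I prior intact


class SpatialBlockWithMotion(nn.Module):
    """Frozen spatial block from the personalized T2I plus a trainable motion module."""

    def __init__(self, spatial_block: nn.Module, channels: int):
        super().__init__()
        self.spatial = spatial_block
        for p in self.spatial.parameters():
            p.requires_grad_(False)           # personalized T2I weights stay frozen
        self.motion = MotionModule(channels)  # only this part is trained on videos

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        return self.motion(self.spatial(x), num_frames)


# Usage: wrap one block and push a 16-frame latent batch through it.
block = SpatialBlockWithMotion(nn.Conv2d(64, 64, 3, padding=1), channels=64)
latents = torch.randn(2 * 16, 64, 32, 32)     # (batch * frames, C, H, W)
video_features = block(latents, num_frames=16)
```

In this reading, MotionLoRA would correspond to attaching low-rank adapters to the motion module's attention projections rather than retraining the module itself, which is what keeps the training and data-collection cost low.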
Related papers
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture [11.587428534308945]
EasyAnimate is an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes.
We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block.
We provide a holistic ecosystem for video production based on DiT, encompassing aspects such as data pre-processing, VAE training, DiT models training, and end-to-end video inference.
arXiv Detail & Related papers (2024-05-29T11:11:07Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Pix2Gif: Motion-Guided Diffusion for GIF Generation [70.64240654310754]
We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation.
We propose a new motion-guided warping module to spatially transform the features of the source image conditioned on the two types of prompts.
In preparation for the model training, we meticulously curated data by extracting coherent image frames from the TGIF video-caption dataset.
arXiv Detail & Related papers (2024-03-07T16:18:28Z)
- Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models [48.56724784226513]
We propose Customize-A-Video that models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties.
The proposed modules are trained in a staged pipeline and inferred in a plug-and-play fashion, enabling easy extensions to various downstream tasks.
arXiv Detail & Related papers (2024-02-22T18:38:48Z)
- Animated Stickers: Bringing Stickers to Life with Video Diffusion [25.81904166775557]
We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static image.
Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.
arXiv Detail & Related papers (2024-02-08T22:49:32Z)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRA architecture that decouples the learning of appearance and motion (the general adapter pattern is sketched after this list).
Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
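Several of the entries above, like MotionLoRA in the main paper, Customize-A-Video, and MotionDirector, build on low-rank adaptation of frozen projections. The sketch below shows that generic LoRA recipe in PyTorch as a point of reference; it is an assumption-laden illustration of the pattern, not any of these papers' actual code, and the assignment of adapters to appearance (spatial) versus motion (temporal) layers is those papers' design choice.

```python
# Generic low-rank adapter on a frozen linear projection. Illustrative only;
# not the MotionLoRA / MotionDirector implementation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                 # frozen pre-trained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)              # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


# Two independent adapter sets could then be trained: one on the spatial
# (appearance) projections and one on the temporal (motion) projections,
# to be mixed or swapped at inference time.
proj = nn.Linear(320, 320)
motion_adapter = LoRALinear(proj, rank=8)
out = motion_adapter(torch.randn(4, 320))
```

Because the up-projection starts at zero, attaching an adapter leaves the frozen model's behavior unchanged until training begins, which is what makes such adapters cheap to train and easy to combine at inference.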
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.