AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
- URL: http://arxiv.org/abs/2402.00769v1
- Date: Thu, 1 Feb 2024 16:58:11 GMT
- Title: AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
- Authors: Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song,
Yu Liu, Hongsheng Li
- Abstract summary: We propose AnimateLCM, allowing for high-fidelity video generation within minimal steps.
Instead of directly conducting consistency learning on the raw video dataset, we propose a decoupled consistency learning strategy.
We validate the proposed strategy in image-conditioned video generation and layout-conditioned video generation, all achieving top-performing results.
- Score: 47.681633892135125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video diffusion models have been gaining increasing attention for their
ability to produce videos that are both coherent and of high fidelity. However, the
iterative denoising process makes them computationally intensive and
time-consuming, thus limiting their applications. Inspired by the Consistency
Model (CM) that distills pretrained image diffusion models to accelerate the
sampling with minimal steps and its successful extension Latent Consistency
Model (LCM) on conditional image generation, we propose AnimateLCM, allowing
for high-fidelity video generation within minimal steps. Instead of directly
conducting consistency learning on the raw video dataset, we propose a
decoupled consistency learning strategy that decouples the distillation of
image generation priors and motion generation priors, which improves the
training efficiency and enhances the visual quality of the generations. Additionally, to
enable the combination of plug-and-play adapters from the Stable Diffusion community
to achieve various functions (e.g., ControlNet for controllable generation), we
propose an efficient strategy to adapt existing adapters to our distilled
text-conditioned video consistency model or train adapters from scratch without
harming the sampling speed. We validate the proposed strategy in
image-conditioned video generation and layout-conditioned video generation, all
achieving top-performing results. Experimental results validate the
effectiveness of our proposed method. Code and weights will be made public.
More details are available at https://github.com/G-U-N/AnimateLCM.
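As a rough illustration of the decoupled consistency learning described above, the sketch below shows a hypothetical PyTorch-style training step for consistency distillation in which an image-prior stage and a motion-prior stage share the same distillation loss but update different parameter groups. The module names, the `ddim_solver_step` and `sample_timestep_pair` helpers, the "temporal" parameter-naming convention, and the MSE distance are assumptions made for illustration; they do not reproduce the authors' implementation.

```python
# Minimal sketch of decoupled consistency distillation (illustrative only).
# Not the authors' code: the scheduler interface, the ddim_solver_step helper,
# and the stage-dependent parameter freezing are assumed placeholders.
import torch
import torch.nn.functional as F

def ema_update(ema_model, model, decay=0.95):
    # Exponential moving average of the student (the "target" network in CM-style training).
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

def consistency_distillation_step(student, ema_student, teacher, optimizer,
                                  x0, noise_scheduler, ddim_solver_step,
                                  stage="image"):
    """One training step. stage='image' uses image batches and updates only spatial
    layers; stage='motion' uses video batches and updates only temporal layers
    (a simplified reading of 'decoupled consistency learning')."""
    # Freeze/unfreeze parameter groups depending on the stage (assumed naming convention).
    for name, p in student.named_parameters():
        is_temporal = "temporal" in name
        p.requires_grad = is_temporal if stage == "motion" else not is_temporal

    # Sample an adjacent timestep pair t_{n+1} > t_n and noise the clean sample to t_{n+1}.
    t_next, t_cur = noise_scheduler.sample_timestep_pair(x0.shape[0])  # assumed helper
    noise = torch.randn_like(x0)
    x_next = noise_scheduler.add_noise(x0, noise, t_next)

    # The frozen teacher takes one ODE-solver (e.g., DDIM) step from t_{n+1} to t_n.
    with torch.no_grad():
        x_cur = ddim_solver_step(teacher, x_next, t_next, t_cur)

    # Consistency loss: the student at t_{n+1} must match the EMA target at t_n.
    pred = student(x_next, t_next)
    with torch.no_grad():
        target = ema_student(x_cur, t_cur)
    loss = F.mse_loss(pred, target)  # the paper may use a different distance metric

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(ema_student, student)
    return loss.item()
```

At inference, the distilled consistency model is sampled in only a handful of steps (e.g., around four, as in related latent consistency work), which is why existing plug-and-play adapters such as ControlNet need the adaptation or from-scratch training strategy mentioned in the abstract to remain compatible with the few-step sampler.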
Related papers
- NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing [3.6344789837383145]
We propose a video editing framework, NaRCan, which integrates a hybrid deformation field and a diffusion prior to generate high-quality natural canonical images.
Our approach utilizes homography to model global motion and employs multi-layer perceptrons (MLPs) to capture local residual deformations.
Our method outperforms existing approaches in various video editing tasks and produces coherent and high-quality edited video sequences.
arXiv Detail & Related papers (2024-06-10T17:59:46Z)
- SF-V: Single Forward Video Generation Model [57.292575082410785]
We propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained models.
Experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead.
arXiv Detail & Related papers (2024-06-06T17:58:27Z)
- VideoLCM: Video Latent Consistency Model [52.3311704118393]
VideoLCM builds upon existing latent video diffusion models and incorporates consistency distillation techniques for training the latent consistency model.
VideoLCM achieves high-fidelity and smooth video synthesis with only four sampling steps, showcasing the potential for real-time synthesis.
arXiv Detail & Related papers (2023-12-14T16:45:36Z)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [73.19191296296988]
We propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo.
Our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 compared to other image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models [52.512109160994655]
We present a controllable text-to-video (T2V) diffusion model, called Control-A-Video, capable of maintaining consistency while enabling customizable video synthesis.
For the purpose of improving object consistency, Control-A-Video integrates motion priors and content priors into video generation.
Our model achieves resource-efficient convergence and generates consistent and coherent videos with fine-grained control.
arXiv Detail & Related papers (2023-05-23T09:03:19Z)
- Video Diffusion Models [47.99413440461512]
Generating temporally coherent high fidelity video is an important milestone in generative modeling research.
We propose a diffusion model for video generation that shows very promising initial results.
We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark.
arXiv Detail & Related papers (2022-04-07T14:08:02Z)