AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
- URL: http://arxiv.org/abs/2402.00769v1
- Date: Thu, 1 Feb 2024 16:58:11 GMT
- Title: AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
- Authors: Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song,
Yu Liu, Hongsheng Li
- Abstract summary: We propose AnimateLCM, allowing for high-fidelity video generation within minimal steps.
Instead of directly conducting consistency learning on the raw video dataset, we propose a decoupled consistency learning strategy.
We validate the proposed strategy in image-conditioned video generation and layout-conditioned video generation, all achieving top-performing results.
- Score: 47.681633892135125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video diffusion models have been gaining increasing attention for their
ability to produce videos that are both coherent and of high fidelity. However, the
iterative denoising process makes them computationally intensive and time-consuming,
which limits their applications. Inspired by the Consistency Model (CM), which distills
pretrained image diffusion models to accelerate sampling with minimal steps, and its
successful extension, the Latent Consistency Model (LCM), for conditional image
generation, we propose AnimateLCM, enabling high-fidelity video generation in a minimal
number of steps. Instead of conducting consistency learning directly on the raw video
dataset, we propose a decoupled consistency learning strategy that separates the
distillation of image generation priors from that of motion generation priors, which
improves training efficiency and enhances the visual quality of the generated videos.
Additionally, to enable plug-and-play adapters from the Stable Diffusion community to be
combined for various functions (e.g., ControlNet for controllable generation), we propose
an efficient strategy to adapt existing adapters to our distilled text-conditioned video
consistency model, or to train adapters from scratch, without harming the sampling speed.
We validate the proposed strategy in image-conditioned video generation and
layout-conditioned video generation, achieving top-performing results in both.
Experimental results validate the effectiveness of our proposed method. Code and weights
will be made public.
More details are available at https://github.com/G-U-N/AnimateLCM.
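For intuition, below is a minimal sketch of the consistency-distillation objective that CM/LCM-style methods build on: a student network at a higher noise level is trained to match an EMA copy of itself evaluated one teacher ODE step earlier. This is not the authors' implementation; the callables (`student`, `ema_student`, `teacher_denoise`), the simple Euler probability-flow step, and the MSE loss are illustrative assumptions.

```python
# Illustrative consistency-distillation training step (assumed setup, not the
# AnimateLCM code). `teacher_denoise(x, sigma)` is assumed to return the
# teacher's denoised prediction; `sigmas` is an increasing 1-D noise schedule.
import torch
import torch.nn.functional as F

def consistency_distillation_step(student, ema_student, teacher_denoise,
                                  x0, sigmas, optimizer, ema_decay=0.95):
    b = x0.shape[0]
    # Sample adjacent noise levels on the discretized schedule.
    idx = torch.randint(0, len(sigmas) - 1, (b,))
    sigma_lo, sigma_hi = sigmas[idx], sigmas[idx + 1]

    # Diffuse clean latents to the higher noise level.
    noise = torch.randn_like(x0)
    x_hi = x0 + sigma_hi.view(-1, 1, 1, 1) * noise

    with torch.no_grad():
        # One Euler step of the probability-flow ODE using the frozen teacher.
        d = (x_hi - teacher_denoise(x_hi, sigma_hi)) / sigma_hi.view(-1, 1, 1, 1)
        x_lo = x_hi + (sigma_lo - sigma_hi).view(-1, 1, 1, 1) * d
        # Target: EMA student evaluated at the lower noise level.
        target = ema_student(x_lo, sigma_lo)

    # Student prediction at the higher noise level must match the target.
    pred = student(x_hi, sigma_hi)
    loss = F.mse_loss(pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # EMA update of the target network.
    with torch.no_grad():
        for p_ema, p in zip(ema_student.parameters(), student.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
    return loss.item()
```

In the paper's decoupled strategy, a step of this form would first be applied to image generation priors and only afterwards to motion priors, rather than distilling both jointly on raw video data.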
Related papers
- Subject-driven Video Generation via Disentangled Identity and Motion [52.54835936914813]
We propose to train a subject-driven customized video generation model by decoupling subject-specific learning from temporal dynamics, in a zero-shot manner without additional tuning.
Our method achieves strong subject consistency and scalability, outperforming existing video customization models in zero-shot settings.
arXiv Detail & Related papers (2025-04-23T06:48:31Z)
- Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning [71.94122309290537]
We propose an efficient, online approach to generate dense captions for videos.
Our model uses a novel autoregressive factorized decoding architecture.
Our approach shows excellent performance compared to both offline and online methods, and uses 20% less compute.
arXiv Detail & Related papers (2024-11-22T02:46:44Z)
- Movie Gen: A Cast of Media Foundation Models [133.41504332082667]
We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio.
We show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image.
arXiv Detail & Related papers (2024-10-17T16:22:46Z)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z)
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z)
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise text-controlled video stylization.
Tests show that we can attain superior content preservation and stylistic performance while incurring lower computational cost than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z)
- Video Generation Beyond a Single Clip [76.5306434379088]
Video generation models can only generate video clips that are relatively short compared with the length of real videos.
To generate long videos covering diverse content and multiple events, we propose to use additional guidance to control the video generation process.
The proposed approach is complementary to existing efforts on video generation, which focus on generating realistic video within a fixed time window.
arXiv Detail & Related papers (2023-04-15T06:17:30Z)
- Diverse Generation from a Single Video Made Possible [24.39972895902724]
We present a fast and practical method for video generation and manipulation from a single natural video.
Our method generates more realistic and higher quality results than single-video GANs.
arXiv Detail & Related papers (2021-09-17T15:12:17Z)