FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
- URL: http://arxiv.org/abs/2410.19355v1
- Date: Fri, 25 Oct 2024 07:24:38 GMT
- Title: FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
- Authors: Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong,
- Abstract summary: FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation.
We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
- Score: 58.80996741843102
- License:
- Abstract: In this paper, we present \textbf{\textit{FasterCache}}, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. By analyzing existing cache-based methods, we observe that \textit{directly reusing adjacent-step features degrades video quality due to the loss of subtle variations}. We further perform a pioneering investigation of the acceleration potential of classifier-free guidance (CFG) and reveal significant redundancy between conditional and unconditional features within the same timestep. Capitalizing on these observations, we introduce FasterCache to substantially accelerate diffusion-based video generation. Our key contributions include a dynamic feature reuse strategy that preserves both feature distinction and temporal continuity, and CFG-Cache which optimizes the reuse of conditional and unconditional outputs to further enhance inference speed without compromising video quality. We empirically evaluate FasterCache on recent video diffusion models. Experimental results show that FasterCache can significantly accelerate video generation (\eg 1.67$\times$ speedup on Vchitect-2.0) while keeping video quality comparable to the baseline, and consistently outperform existing methods in both inference speed and video quality.
Related papers
- Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing [66.66090399385304]
Ca2-VDM is an efficient autoregressive VDM with Causal generation and Cache sharing.
For causal generation, it introduces unidirectional feature computation, which ensures that the cache of conditional frames can be precomputed in previous autoregression steps.
For cache sharing, it shares the cache across all denoising steps to avoid the huge cache storage cost.
arXiv Detail & Related papers (2024-11-25T13:33:41Z) - SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers [4.7170474122879575]
Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis.
We introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures.
Our experiments demonstrate that SmoothCache achieves 71% 8% to speed up while maintaining or even improving generation quality across diverse modalities.
arXiv Detail & Related papers (2024-11-15T16:24:02Z) - Adaptive Caching for Faster Video Generation with Diffusion Transformers [52.73348147077075]
Diffusion Transformers (DiTs) rely on larger models and heavier attention mechanisms, resulting in slower inference speeds.
We introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache)
We also introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, controlling the compute allocation based on motion content.
arXiv Detail & Related papers (2024-11-04T18:59:44Z) - Fast and Memory-Efficient Video Diffusion Using Streamlined Inference [41.505829393818274]
Current video diffusion models exhibit demanding computational requirements and high peak memory usage.
We present Streamlined Inference, which leverages the temporal and spatial properties of video diffusion models.
Our approach significantly reduces peak memory and computational overhead, making it feasible to generate high-quality videos on a single consumer GPU.
arXiv Detail & Related papers (2024-11-02T07:52:18Z) - Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World
Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z) - DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z) - Deep Unsupervised Key Frame Extraction for Efficient Video
Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, which combines Convolutional Neural Network (CNN) and Temporal Segment Density Peaks Clustering (TSDPC)
The proposed TSDPC is a generic and powerful framework and it has two advantages compared with previous works, one is that it can calculate the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification.
arXiv Detail & Related papers (2022-11-12T20:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.