BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
- URL: http://arxiv.org/abs/2509.13789v2
- Date: Thu, 18 Sep 2025 04:57:32 GMT
- Title: BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
- Authors: Hanshuai Cui, Zhiqing Tang, Zhifei Xu, Zhi Yao, Wenyi Zeng, Weijia Jia,
- Abstract summary: Block-Wise Caching (BWCache) is a training-free method to accelerate DiT-based video generation.<n> experiments on several video diffusion models demonstrate that BWCache achieves up to 2.24$times$ speedup with comparable visual quality.
- Score: 6.354675628412448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Diffusion Transformers (DiTs) have established them as the state-of-the-art method for video generation. However, their inherently sequential denoising process results in inevitable latency, limiting real-world applicability. Existing acceleration methods either compromise visual quality due to architectural modifications or fail to reuse intermediate features at proper granularity. Our analysis reveals that DiT blocks are the primary contributors to inference latency. Across diffusion timesteps, the feature variations of DiT blocks exhibit a U-shaped pattern with high similarity during intermediate timesteps, which suggests substantial computational redundancy. In this paper, we propose Block-Wise Caching (BWCache), a training-free method to accelerate DiT-based video generation. BWCache dynamically caches and reuses features from DiT blocks across diffusion timesteps. Furthermore, we introduce a similarity indicator that triggers feature reuse only when the differences between block features at adjacent timesteps fall below a threshold, thereby minimizing redundant computations while maintaining visual fidelity. Extensive experiments on several video diffusion models demonstrate that BWCache achieves up to 2.24$\times$ speedup with comparable visual quality.
Related papers
- Flow caching for autoregressive video generation [72.10021661412364]
We present FlowCache, the first caching framework specifically designed for autoregressive video generation.<n>Our method achieves remarkable speedups of 2.38 times on MAGI-1 and 6.7 times on SkyReels-V2, with negligible quality degradation.
arXiv Detail & Related papers (2026-02-11T13:11:04Z) - FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion [51.1618564189244]
FlashBlock is a cached block-external attention mechanism that reuses stable attention output, reducing attention computation and KV cache access without modifying the diffusion process.<n> Experiments on diffusion language models and video generation demonstrate up to 1.44$times$ higher token throughput and up to 1.6$times$ reduction in attention time, with negligible impact on generation quality.
arXiv Detail & Related papers (2026-02-05T04:57:21Z) - MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration [15.22288174114487]
Caching is a widely adopted optimization method in DiT models.<n>We propose MixCache, a training-free caching-based framework for efficient video DiT inference.
arXiv Detail & Related papers (2025-08-18T07:49:33Z) - Sortblock: Similarity-Aware Feature Reuse for Diffusion Model [9.749736545966694]
Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities.<n>DiTs' sequential denoising process results in high inference latency.<n>We propose Sortblock, a training-free inference acceleration framework.
arXiv Detail & Related papers (2025-08-01T08:10:54Z) - Block-wise Adaptive Caching for Accelerating Diffusion Policy [10.641633189595302]
Block-wise Adaptive Caching(BAC) is a method to accelerate Diffusion Policy by caching intermediate action features.<n>BAC achieves up to 3x inference speedup for free on robotic benchmarks.
arXiv Detail & Related papers (2025-06-16T13:14:58Z) - QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation.<n>DiTs come with significant drawbacks, including increased computational and memory costs.<n>We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z) - Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability.<n>DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference.<n>We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z) - SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers [4.7170474122879575]
Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis.<n>We introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures.<n>Our experiments demonstrate that SmoothCache achieves 71% 8% to speed up while maintaining or even improving generation quality across diverse modalities.
arXiv Detail & Related papers (2024-11-15T16:24:02Z) - Adaptive Caching for Faster Video Generation with Diffusion Transformers [52.73348147077075]
Diffusion Transformers (DiTs) rely on larger models and heavier attention mechanisms, resulting in slower inference speeds.
We introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache)
We also introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, controlling the compute allocation based on motion content.
arXiv Detail & Related papers (2024-11-04T18:59:44Z) - FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality [58.80996741843102]
FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation.<n>We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
arXiv Detail & Related papers (2024-10-25T07:24:38Z) - Cache Me if You Can: Accelerating Diffusion Models through Block Caching [67.54820800003375]
A large image-to-image network has to be applied many times to iteratively refine an image from random noise.
We investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small.
We propose a technique to automatically determine caching schedules based on each block's changes over timesteps.
arXiv Detail & Related papers (2023-12-06T00:51:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.