Related papers: Accelerating Diffusion Transformers with Dual Feature Caching

Accelerating Diffusion Transformers with Dual Feature Caching

URL: http://arxiv.org/abs/2412.18911v1
Date: Wed, 25 Dec 2024 14:00:14 GMT
Title: Accelerating Diffusion Transformers with Dual Feature Caching
Authors: Chang Zou, Evelyn Zhang, Runlin Guo, Haohang Xu, Conghui He, Xuming Hu, Linfeng Zhang,
Abstract summary: Diffusion Transformers (DiT) have become the dominant methods in image and video generation yet still suffer substantial computational costs.<n>As an effective approach for DiT acceleration, feature caching methods are designed to cache the features of DiT in previous timesteps.<n> aggressively reusing all the features cached in previous timesteps leads to a severe drop in generation quality.
Score: 25.36988865752475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Transformers (DiT) have become the dominant methods in image and video generation yet still suffer substantial computational costs. As an effective approach for DiT acceleration, feature caching methods are designed to cache the features of DiT in previous timesteps and reuse them in the next timesteps, allowing us to skip the computation in the next timesteps. However, on the one hand, aggressively reusing all the features cached in previous timesteps leads to a severe drop in generation quality. On the other hand, conservatively caching only the features in the redundant layers or tokens but still computing the important ones successfully preserves the generation quality but results in reductions in acceleration ratios. Observing such a tradeoff between generation quality and acceleration performance, this paper begins by quantitatively studying the accumulated error from cached features. Surprisingly, we find that aggressive caching does not introduce significantly more caching errors in the caching step, and the conservative feature caching can fix the error introduced by aggressive caching. Thereby, we propose a dual caching strategy that adopts aggressive and conservative caching iteratively, leading to significant acceleration and high generation quality at the same time. Besides, we further introduce a V-caching strategy for token-wise conservative caching, which is compatible with flash attention and requires no training and calibration data. Our codes have been released in Github: \textbf{Code: \href{https://github.com/Shenyi-Z/DuCa}{\texttt{\textcolor{cyan}{https://github.com/Shenyi-Z/DuCa}}}}

Related papers

Exposure Bias Reduction for Enhancing Diffusion Transformer Feature Caching [7.393824353099595]
Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity. We analyze the impact of caching on the SNR of the diffusion process. We introduce EB-Cache, a joint cache strategy that aligns the Non-exposure bias.
arXiv Detail & Related papers (2025-03-10T09:49:18Z)
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
Accelerating Diffusion Transformer via Gradient-Optimized Cache [18.32157920050325]
Progressive error accumulation from cached blocks significantly degrades generation quality. Current error compensation approaches neglect dynamic patterns during the caching process, leading to suboptimal error correction. We propose the Gradient-lectiond Cache (GOC) with two key innovations. GOC achieves IS 216.28 (26.3% higher) and FID 3.907 (43% lower) compared to baseline DiT, while maintaining identical computational costs.
arXiv Detail & Related papers (2025-03-07T05:31:47Z)
Accelerating Diffusion Transformer via Error-Optimized Cache [17.666577782052205]
Diffusion Transformer (DiT) is a crucial method for content generation. Existing caching methods accelerate generation by reusing DiT features from the previous time step and skipping calculations in the next. To solve this problem, we propose the Error-d Cache (EOC)
arXiv Detail & Related papers (2025-01-31T15:58:15Z)
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing [66.66090399385304]
Ca2-VDM is an efficient autoregressive VDM with Causal generation and Cache sharing. For causal generation, it introduces unidirectional feature computation, which ensures that the cache of conditional frames can be precomputed in previous autoregression steps. For cache sharing, it shares the cache across all denoising steps to avoid the huge cache storage cost.
arXiv Detail & Related papers (2024-11-25T13:33:41Z)
Adaptive Caching for Faster Video Generation with Diffusion Transformers [52.73348147077075]
Diffusion Transformers (DiTs) rely on larger models and heavier attention mechanisms, resulting in slower inference speeds. We introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache) We also introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, controlling the compute allocation based on motion content.
arXiv Detail & Related papers (2024-11-04T18:59:44Z)
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality [58.80996741843102]
FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
arXiv Detail & Related papers (2024-10-25T07:24:38Z)
Accelerating Diffusion Transformers with Token-wise Feature Caching [19.140800616594294]
Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs.<n>We introduce token-wise feature caching, allowing us to adaptively select the most suitable tokens for caching.<n>Experiments on PixArt-$alpha$, OpenSora, and DiT demonstrate our effectiveness in both image and video generation with no requirements for training.
arXiv Detail & Related papers (2024-10-05T03:47:06Z)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models. We propose an importance-driven cache merging strategy to prune redundancy caches. For instruction encoding, we utilize the frequency to evaluate the importance of caches. Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters. We introduce a novel scheme, named Learningto-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-r, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.