FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
- URL: http://arxiv.org/abs/2505.20353v1
- Date: Mon, 26 May 2025 05:58:49 GMT
- Title: FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
- Authors: Dong Liu, Jiayi Zhang, Yifan Li, Yanxuan Yu, Ben Lengerich, Ying Nian Wu
- Abstract summary: Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. FastCache is a hidden-state-level caching and compression framework that accelerates DiT inference. Empirical evaluations across multiple DiT variants demonstrate substantial reductions in latency and memory usage.
- Score: 46.57781555466333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a hidden-state-level caching and compression framework that accelerates DiT inference by exploiting redundancy within the model's internal representations. FastCache introduces a dual strategy: (1) a spatial-aware token selection mechanism that adaptively filters redundant tokens based on hidden state saliency, and (2) a transformer-level cache that reuses latent activations across timesteps when changes are statistically insignificant. These modules work jointly to reduce unnecessary computation while preserving generation fidelity through learnable linear approximation. Theoretical analysis shows that FastCache maintains bounded approximation error under a hypothesis-testing-based decision rule. Empirical evaluations across multiple DiT variants demonstrate substantial reductions in latency and memory usage, with the best generation quality among the compared cache methods as measured by FID and t-FID. Code implementation of FastCache is available on GitHub at https://github.com/NoakLiu/FastCache-xDiT.
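A minimal sketch of the per-block cache decision described in the abstract, assuming PyTorch and using a simple relative-change statistic as a stand-in for the paper's hypothesis-testing rule; the names (`CachedBlock`, `approx`, `tau`) are illustrative and not taken from the FastCache codebase:

```python
# Hedged sketch only: module names (CachedBlock, approx), the relative-change
# statistic, and the threshold tau are illustrative assumptions, not the
# authors' implementation (see https://github.com/NoakLiu/FastCache-xDiT).
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps one transformer block and skips it across timesteps when the
    hidden-state change since the last fully computed step is insignificant."""

    def __init__(self, block: nn.Module, hidden_dim: int, tau: float = 0.05):
        super().__init__()
        self.block = block
        # Learnable linear approximation used in place of the skipped block.
        self.approx = nn.Linear(hidden_dim, hidden_dim)
        self.tau = tau           # decision threshold (stand-in for the hypothesis test)
        self.prev_hidden = None  # hidden state at the last fully computed timestep

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.prev_hidden is not None and self.prev_hidden.shape == hidden.shape:
            # Relative change across timesteps; the paper uses a
            # hypothesis-testing-based rule, this is a simple surrogate.
            delta = (hidden - self.prev_hidden).norm() / (self.prev_hidden.norm() + 1e-6)
            if delta.item() < self.tau:
                # Change judged insignificant: return a learnable linear
                # approximation of the block output instead of running it.
                return self.approx(hidden)
        self.prev_hidden = hidden.detach()
        return self.block(hidden)  # significant change: run the full block

# Illustrative use with a stock PyTorch layer standing in for a DiT block:
# layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
# block = CachedBlock(layer, hidden_dim=64)
# out = block(torch.randn(2, 16, 64))   # (batch, tokens, hidden)
```

Returning `approx(hidden)` in place of the block output when the change is judged insignificant mirrors the abstract's "learnable linear approximation" at per-block granularity; the exact statistic, threshold, and caching granularity in the paper may differ.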
Related papers
- MagCache: Fast Video Generation with Magnitude-Aware Cache [91.51242917160373]
We introduce a novel and robust discovery: a unified magnitude law observed across different models and prompts. We introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and adaptive caching strategy. Experimental results show that MagCache achieves 2.1x and 2.68x speedups on Open-Sora and Wan 2.1, respectively.
arXiv Detail & Related papers (2025-06-10T17:59:02Z) - dKV-Cache: The Cache for Diffusion Language Models [53.85291644298835]
Diffusion Language Models (DLMs) have been seen as a promising competitor for autoregressive language models. We propose a KV-cache-like mechanism, delayed KV-Cache, for the denoising process of DLMs. Our approach is motivated by the observation that different tokens have distinct representation dynamics throughout the diffusion process.
arXiv Detail & Related papers (2025-05-21T17:32:10Z) - QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z) - Accelerating Diffusion Transformers with Dual Feature Caching [25.36988865752475]
Diffusion Transformers (DiT) have become the dominant method in image and video generation yet still suffer substantial computational costs. As an effective approach for DiT acceleration, feature caching methods are designed to cache the features of DiT in previous timesteps. However, aggressively reusing all the features cached in previous timesteps leads to a severe drop in generation quality.
arXiv Detail & Related papers (2024-12-25T14:00:14Z) - SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers [4.7170474122879575]
Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. We introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures. Our experiments demonstrate that SmoothCache achieves 8% to 71% speedup while maintaining or even improving generation quality across diverse modalities.
arXiv Detail & Related papers (2024-11-15T16:24:02Z) - Token Caching for Diffusion Transformer Acceleration [30.437462937127773]
TokenCache is a novel post-training acceleration method for diffusion transformers.
It reduces redundant computations among tokens across inference steps.
We show that TokenCache achieves an effective trade-off between generation quality and inference speed for diffusion transformers.
arXiv Detail & Related papers (2024-09-27T08:05:34Z) - Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z) - DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
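Several of the cache-based methods listed above (DeepCache, SmoothCache, Dual Feature Caching, TokenCache, Learning-to-Cache) share a common pattern: compute expensive features at some denoising steps, cache them, and reuse the cache while the step-to-step change stays small. A minimal sketch of that shared pattern, assuming PyTorch; the threshold `eps` and the `shallow_layers`/`deep_layers`/`head` split in the commented usage are illustrative assumptions, not any single paper's exact rule:

```python
# Hedged sketch of the shared pattern only: FeatureCache, eps, and the
# shallow/deep model split in the commented loop are illustrative assumptions.
import torch

class FeatureCache:
    """Caches expensive features and refreshes them only when a cheap probe of
    the current denoising step drifts beyond a relative-change threshold."""

    def __init__(self, eps: float = 0.05):
        self.eps = eps     # relative-change threshold for refreshing the cache
        self.probe = None  # cheap (e.g. shallow-layer) feature at the last refresh
        self.deep = None   # expensive feature cached at that refresh

    def should_refresh(self, probe: torch.Tensor) -> bool:
        if self.probe is None or self.probe.shape != probe.shape:
            return True
        change = (probe - self.probe).norm() / (self.probe.norm() + 1e-6)
        return change.item() > self.eps

    def update(self, probe: torch.Tensor, deep: torch.Tensor) -> None:
        self.probe, self.deep = probe.detach(), deep.detach()

# Illustrative use inside a denoising loop (the model split is a placeholder):
# cache = FeatureCache()
# for t in timesteps:
#     shallow = model.shallow_layers(x, t)       # always computed
#     if cache.should_refresh(shallow):
#         deep = model.deep_layers(shallow, t)   # full computation; refresh cache
#         cache.update(shallow, deep)
#     else:
#         deep = cache.deep                      # reuse cached deep features
#     x = model.head(deep, t)
```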