Related papers: ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration

URL: http://arxiv.org/abs/2512.17298v1
Date: Fri, 19 Dec 2025 07:27:19 GMT
Title: ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration
Authors: Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo, Cen Chen,
Abstract summary: Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. Existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. We propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components.
Score: 14.306565517230775
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by exploiting temporal redundancy, existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. In this work, we analyze the evolution of DiT features during denoising and reveal that both feature changes and error propagation are highly time- and depth-varying. Motivated by this, we propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components: (i) a constraint-aware caching pattern search module that generates non-uniform activation schedules through offline constrained sampling, tailored to the model's temporal characteristics; and (ii) a selective computation module that selectively computes within deep blocks and high-importance tokens for cached segments to mitigate error accumulation with minimal overhead. Extensive experiments on PixArt-alpha and DiT demonstrate that ProCache achieves up to 1.96x and 2.90x acceleration with negligible quality degradation, significantly outperforming prior caching-based methods.
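The abstract's idea of a non-uniform activation schedule can be illustrated with a minimal sketch. This is not ProCache's implementation: the function `run_with_cache`, the scalar `toy_block`, and the example schedule are all hypothetical names and values chosen for illustration, and the schedule is hand-picked rather than found by the paper's constrained sampling. The sketch only shows the basic mechanism of recomputing a feature at selected steps and reusing the cached value in between.

```python
from typing import Callable, Iterable, Tuple


def run_with_cache(
    block: Callable[[float, int], float],
    x0: float,
    num_steps: int,
    compute_steps: Iterable[int],
) -> Tuple[float, int]:
    """Run a toy denoising loop, calling `block` only at `compute_steps`.

    At every other step the most recently cached block output is reused.
    Returns the final state and the number of full block evaluations.
    """
    compute = set(compute_steps)
    if 0 not in compute:
        # The cache is empty before the first step, so step 0 must be computed.
        raise ValueError("step 0 must be in compute_steps")

    x = x0
    cached = 0.0
    calls = 0
    for t in range(num_steps):
        if t in compute:
            cached = block(x, t)  # full computation, refresh the cache
            calls += 1
        # Otherwise `cached` is stale and is reused unchanged.
        x = x - cached
    return x, calls


def toy_block(x: float, t: int) -> float:
    """Hypothetical stand-in for an expensive transformer block."""
    return 0.1 * x


# Hypothetical non-uniform schedule: dense early, sparse late.
nonuniform_schedule = [0, 1, 2, 4, 7]
reference, reference_calls = run_with_cache(toy_block, 1.0, 10, range(10))
approx, approx_calls = run_with_cache(toy_block, 1.0, 10, nonuniform_schedule)
```

Comparing `approx` with `reference` gives the error introduced by reuse, and `approx_calls` against `reference_calls` gives the compute saved. A schedule search, in this simplified picture, would look for the set of compute steps that keeps that error small under a fixed call budget.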

Related papers

PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation [35.47114707080758]
High computational costs and slow inference hinder the practical application of video generation models. We propose PreciseCache, a plug-and-play framework that precisely detects and skips truly redundant computations.
arXiv Detail & Related papers (2026-03-01T08:08:49Z)
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z)
H2-Cache: A Novel Hierarchical Dual-Stage Cache for High-Performance Acceleration of Generative Diffusion Models [7.8812023976358425]
H2-cache is a novel hierarchical caching mechanism designed for modern generative diffusion model architectures. Our method is founded on the key insight that the denoising process can be functionally separated into a structure-defining stage and a detail-refining stage. Experiments on the Flux architecture demonstrate that H2-cache achieves significant acceleration (up to 5.08x) while maintaining image quality nearly identical to the baseline.
arXiv Detail & Related papers (2025-10-31T04:47:14Z)
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion [30.897215456167753]
Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process. We propose ERTACache, a principled caching framework that jointly rectifies both error types. ERTACache achieves up to 2x inference speedup while consistently preserving or even improving visual quality.
arXiv Detail & Related papers (2025-08-27T10:37:24Z)
DiCache: Let Diffusion Model Determine Its Own Cache [62.954717254728166]
DiCache is a training-free adaptive caching strategy for accelerating diffusion models at runtime. Online Probe Profiling Scheme leverages a shallow-layer online probe to obtain an on-the-fly indicator for the caching error in real time. Dynamic Cache Trajectory Alignment approximates the deep-layer feature output from multi-step historical caches.
arXiv Detail & Related papers (2025-08-24T13:30:00Z)
PromptTea: Let Prompts Tell TeaCache the Optimal Threshold [1.0665410339553834]
A common acceleration strategy involves reusing model outputs via caching mechanisms at fixed intervals. We propose Prompt-Complexity-Aware (PCA) caching, a method that automatically adjusts reuse thresholds based on scene complexity estimated directly from the input prompt.
arXiv Detail & Related papers (2025-07-09T10:53:05Z)
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [43.83288560196838]
Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. FastCache is a hidden-state-level caching and compression framework that accelerates DiT inference. Empirical evaluations across multiple DiT variants demonstrate substantial reductions in latency and memory usage.
arXiv Detail & Related papers (2025-05-26T05:58:49Z)
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
Temporal Feature Matters: A Framework for Diffusion Model Quantization [105.3033493564844]
Diffusion models rely on the time-step for the multi-round denoising. We introduce a novel quantization framework that includes three strategies. This framework preserves most of the temporal information and ensures high-quality end-to-end generation.
arXiv Detail & Related papers (2024-07-28T17:46:15Z)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters. We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture. DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models. Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.