Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers
- URL: http://arxiv.org/abs/2508.16211v1
- Date: Fri, 22 Aug 2025 08:34:03 GMT
- Title: Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers
- Authors: Shikang Zheng, Liang Feng, Xinyu Wang, Qinming Zhou, Peiliang Cai, Chang Zou, Jiacheng Liu, Yuqi Lin, Junjie Chen, Yue Ma, Linfeng Zhang
- Abstract summary: Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. Current methods often struggle to maintain generation quality at high acceleration ratios. We propose FoCa, which treats feature caching as a feature-ODE solving problem.
- Score: 19.107716099809707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. To reduce their substantial computational costs, feature caching techniques have been proposed to accelerate inference by reusing hidden representations from previous timesteps. However, current methods often struggle to maintain generation quality at high acceleration ratios, where prediction errors increase sharply due to the inherent instability of long-step forecasting. In this work, we adopt an ordinary differential equation (ODE) perspective on the hidden-feature sequence, modeling layer representations along the trajectory as a feature-ODE. We attribute the degradation of existing caching strategies to their inability to robustly integrate historical features under large skipping intervals. To address this, we propose FoCa (Forecast-then-Calibrate), which treats feature caching as a feature-ODE solving problem. Extensive experiments on image synthesis, video generation, and super-resolution tasks demonstrate the effectiveness of FoCa, especially under aggressive acceleration. Without additional training, FoCa achieves near-lossless speedups of 5.50 times on FLUX, 6.45 times on HunyuanVideo, 3.17 times on Inf-DiT, and maintains high quality with a 4.53 times speedup on DiT.
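The abstract's forecast-then-calibrate idea can be illustrated with a minimal sketch: forecast a cached hidden feature by treating its trajectory across timesteps as an ODE and extrapolating, then calibrate the forecast against a cheap partial recomputation. The function names, the convex-blending calibration rule, and the `alpha` parameter below are illustrative assumptions, not the paper's actual solver.

```python
def forecast_feature(f_prev, f_curr, dt, horizon):
    """Forecast a hidden feature `horizon` time units ahead by treating the
    cached feature trajectory as an ODE and taking one explicit (Euler-like)
    extrapolation step from a finite-difference velocity estimate."""
    velocity = [(c - p) / dt for p, c in zip(f_prev, f_curr)]
    return [c + v * horizon for c, v in zip(f_curr, velocity)]


def calibrate(forecast, refreshed, alpha=0.5):
    """Calibrate the forecast by convex blending with a (cheap) partially
    recomputed feature; alpha = 1.0 fully trusts the refreshed feature."""
    return [alpha * r + (1.0 - alpha) * f for f, r in zip(forecast, refreshed)]
```

The calibration step is what distinguishes this family of methods from pure extrapolation: under large skipping intervals the forecast alone drifts, and blending it with even a coarse recomputation bounds the error.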
Related papers
- Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration [58.19554276924402]
We propose the spectral diffusion feature forecaster (Spectrum) to enable global, long-range feature reuse with tightly controlled error. We achieve up to 4.79× speedup on FLUX.1 and 4.67× speedup on Wan2.1-14B, while maintaining much higher sample quality than the baselines.
arXiv Detail & Related papers (2026-03-02T08:59:11Z) - SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z) - Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers [10.751183015853863]
Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation. We propose PrediT, a training-free acceleration framework that formulates feature prediction as a linear multistep problem. Our method achieves up to 5.54× latency reduction across various DiT-based image and video generation models, while incurring negligible quality degradation.
arXiv Detail & Related papers (2026-02-20T09:33:59Z) - AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers [37.38708392928324]
Diffusion Transformers (DiTs) achieve state-of-the-art high-fidelity image generation but suffer from expensive inference due to their iterative denoising. AdaCorrection is an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient reuse across cache layers during diffusion inference. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration.
arXiv Detail & Related papers (2026-02-13T08:11:54Z) - Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers [10.215762814937277]
Diffusion Transformers offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. We introduce HyCa, a hybrid ODE-solver-inspired caching framework that applies dimension-wise caching strategies.
arXiv Detail & Related papers (2025-10-05T13:01:08Z) - SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching [17.724549528455317]
Diffusion models have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. We present SpeCa, a novel "forecast-then-verify" acceleration framework that effectively addresses both limitations. Our approach implements a parameter-free verification mechanism that efficiently evaluates prediction reliability, enabling real-time decisions to accept or reject each prediction.
arXiv Detail & Related papers (2025-09-15T06:46:22Z) - Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation [50.04968365065964]
Diffusion-based talking head models generate high-quality, photorealistic videos but suffer from slow inference. We introduce Lightning-fast Caching-based Parallel denoising prediction (LightningCP). We also propose Decoupled Foreground Attention (DFA) to further accelerate attention computations.
arXiv Detail & Related papers (2025-08-25T02:58:39Z) - Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition [4.0594792247165]
Diffusion transformer (DiT) models have achieved remarkable success in image generation. We propose increment-calibrated caching, a training-free method for DiT acceleration. Our method eliminates more than 45% of computation and improves IS by 12 at the cost of less than a 0.06 FID increase.
arXiv Detail & Related papers (2025-05-09T06:56:17Z) - Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models [41.11005178050448]
ProfilingDiT is a novel adaptive caching strategy that explicitly disentangles foreground- and background-focused blocks. Our framework achieves significant acceleration while maintaining visual fidelity across comprehensive quality metrics.
arXiv Detail & Related papers (2025-04-04T03:30:15Z) - CacheQuant: Comprehensively Accelerated Diffusion Models [3.78219736760145]
CacheQuant is a novel training-free paradigm that comprehensively accelerates diffusion models by jointly optimizing model caching and quantization techniques. Experimental results show that CacheQuant achieves a 5.18× speedup and 4× compression for Stable Diffusion on MS-COCO, with only a 0.02 loss in CLIP score.
arXiv Detail & Related papers (2025-03-03T09:04:51Z) - Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference. We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs), the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z) - FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality [58.80996741843102]
FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
arXiv Detail & Related papers (2024-10-25T07:24:38Z) - DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
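Several of the papers above (PrediT, HyCa, FoCa) share the idea of extrapolating cached features with ODE-solver machinery rather than naive reuse. A minimal, paper-agnostic sketch of such a forecaster uses a standard two-step Adams-Bashforth rule; the function name and the list-of-lists feature layout are illustrative assumptions, not any specific paper's implementation.

```python
def multistep_forecast(f_hist, dt):
    """Two-step linear multistep (Adams-Bashforth-style) extrapolation of the
    next feature from the last three cached features [f_{n-2}, f_{n-1}, f_n]."""
    f0, f1, f2 = f_hist[-3:]
    v_prev = [(b - a) / dt for a, b in zip(f0, f1)]  # velocity at step n-1
    v_curr = [(b - a) / dt for a, b in zip(f1, f2)]  # velocity at step n
    # AB2 update: f_{n+1} = f_n + dt * (3/2 * v_n - 1/2 * v_{n-1})
    return [f + dt * (1.5 * vc - 0.5 * vp)
            for f, vp, vc in zip(f2, v_prev, v_curr)]
```

On a linear feature trajectory this rule is exact; on curved trajectories its error grows with the skipping interval, which is precisely the regime where calibration or verification steps (as in FoCa and SpeCa) become necessary.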
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.