Related papers: Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers

URL: http://arxiv.org/abs/2603.05315v1
Date: Thu, 05 Mar 2026 15:58:06 GMT
Title: Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers
Authors: Guandong Li
Abstract summary: Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation. Existing caching methods accelerate DiTs by reusing intermediate computations across timesteps, but they share a common limitation: treating the denoising process as uniform across time, depth, and feature dimensions. We propose SpectralCache, a unified caching framework comprising Timestep-Aware Dynamic Scheduling (TADS), Cumulative Error Budgets (CEB), and Frequency-Decomposed Caching (FDC).
Score: 11.772150619675527
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation, yet their iterative denoising process incurs substantial computational cost during inference. Existing caching methods accelerate DiTs by reusing intermediate computations across timesteps, but they share a common limitation: treating the denoising process as uniform across time, depth, and feature dimensions. In this work, we identify three orthogonal axes of non-uniformity in DiT denoising: (1) temporal -- sensitivity to caching errors varies dramatically across the denoising trajectory; (2) depth -- consecutive caching decisions lead to cascading approximation errors; and (3) feature -- different components of the hidden state exhibit heterogeneous temporal dynamics. Based on these observations, we propose SpectralCache, a unified caching framework comprising Timestep-Aware Dynamic Scheduling (TADS), Cumulative Error Budgets (CEB), and Frequency-Decomposed Caching (FDC). On FLUX.1-schnell at 512x512 resolution, SpectralCache achieves 2.46x speedup with LPIPS 0.217 and SSIM 0.727, outperforming TeaCache (2.12x, LPIPS 0.215, SSIM 0.734) by 16% in speed while maintaining comparable quality (LPIPS difference < 1%). Our approach is training-free, plug-and-play, and compatible with existing DiT architectures.
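The sketch below makes the error-budget and frequency-split ideas concrete in Python (NumPy). It is not the authors' implementation. The following are all hypothetical:
- the per-step error estimates;
- the timestep weights standing in for TADS;
- the hard rFFT cutoff standing in for FDC;
- the names `plan_reuse` and `band_split`.

`plan_reuse` keeps reusing the cache while the weighted error accumulated since the last full computation stays within the budget. `band_split` separates a feature vector into low- and high-frequency parts that a cache could then treat differently.

```python
import numpy as np


def band_split(h: np.ndarray, cutoff: int):
    """Split `h` into low- and high-frequency parts along its last axis.

    Hypothetical choice: a hard rFFT mask that keeps the lowest `cutoff` bins.
    """
    spec = np.fft.rfft(h, axis=-1)
    spec[..., cutoff:] = 0  # drop every frequency bin at or above the cutoff
    low = np.fft.irfft(spec, n=h.shape[-1], axis=-1)
    return low, h - low


def plan_reuse(step_errors, weights, budget):
    """Return one flag per step: True = reuse the cache, False = recompute.

    step_errors: hypothetical estimate of the error added by reusing at that step
    weights:     hypothetical timestep sensitivity (larger = caching hurts more)
    """
    spent = 0.0
    plan = []
    for i, (err, w) in enumerate(zip(step_errors, weights)):
        cost = err * w
        if i > 0 and spent + cost <= budget:
            spent += cost   # reuse: charge the weighted error to the budget
            plan.append(True)
        else:
            spent = 0.0     # full compute: the cached state is exact again
            plan.append(False)
    return plan
```

With a uniform error of 0.1 per step and a budget of 0.25, `plan_reuse([0.1] * 6, [1] * 6, 0.25)` recomputes every third step. Raising the weights of the early steps forces recomputation there instead, which is the intuition behind timestep-aware scheduling.

The headline numbers are also self-consistent:
- 2.46 / 2.12 ≈ 1.16, which is the 16% speed gain;
- (0.217 - 0.215) / 0.215 ≈ 0.9%, which is the LPIPS gap.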

Related papers

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z)
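SenCache's abstract does not state its sensitivity criterion. The following is therefore only a generic illustration of sensitivity-gated reuse, with hypothetical names (`SensitivityGate`, `tol`). It works in three steps:
1. Estimate how strongly the model output reacts to input changes, using the last two real evaluations (a finite-difference ratio).
2. Predict the output drift for the current input.
3. Reuse the cached output when that prediction is below a tolerance.

Because the estimate comes from each sample's own trajectory, the decision is naturally per-sample.

```python
import numpy as np


class SensitivityGate:
    """Hypothetical per-sample reuse gate based on a finite-difference sensitivity."""

    def __init__(self, tol: float):
        self.tol = tol
        self.prev = None  # (input, output) of the second-to-last real evaluation
        self.last = None  # (input, output) of the most recent real evaluation

    def __call__(self, model, x):
        if self.prev is not None:
            (x0, y0), (x1, y1) = self.prev, self.last
            dx = np.linalg.norm(x1 - x0)
            sens = np.linalg.norm(y1 - y0) / max(dx, 1e-12)  # ||dy|| / ||dx||
            predicted = sens * np.linalg.norm(x - x1)        # first-order drift estimate
            if predicted < self.tol:
                return y1, True  # reuse; drift is still measured from x1
        y = model(x)
        self.prev, self.last = self.last, (x, y)
        return y, False
```

With `model = lambda x: 2 * x` and `tol=0.1`, real evaluations at 0.0 and 1.0 give an estimated sensitivity of 2. An input of 1.01 then has a predicted drift of 0.02 and reuses the cache. An input of 3.0 triggers a fresh evaluation.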
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers [6.406853903837331]
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in image and video generation, but their success comes at the cost of heavy computation. We propose dynamic tokenization, an efficient test-time strategy that varies patch sizes based on content complexity and the denoising timestep. During inference, our method dynamically reallocates patch sizes across denoising steps for image and video generation and substantially reduces cost while preserving perceptual generation quality.
arXiv Detail & Related papers (2026-02-19T00:15:20Z)
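The appeal of varying patch size is easy to quantify. Assume a 512x512 image encoded to a 64x64 latent; the VAE downsampling factor of 8 is an assumption here, not a figure from the DDiT abstract. A DiT with patch size p then sees (64/p)^2 tokens, and the self-attention term grows with the square of the token count. The hypothetical helpers below only do that arithmetic.

```python
def num_tokens(latent_hw: int, patch: int) -> int:
    """Number of patch tokens for a square latent of side `latent_hw`."""
    if latent_hw % patch != 0:
        raise ValueError("patch size must divide the latent side")
    return (latent_hw // patch) ** 2


def attention_cost_ratio(latent_hw: int, small_patch: int, large_patch: int) -> float:
    """How many times cheaper the O(N^2) attention term is with the larger patch."""
    n_small = num_tokens(latent_hw, small_patch)
    n_large = num_tokens(latent_hw, large_patch)
    return (n_small / n_large) ** 2


print(num_tokens(64, 2), num_tokens(64, 4))  # 1024 256
print(attention_cost_ratio(64, 2, 4))        # 16.0
```

Doubling the patch size from 2 to 4 cuts the token count by 4x and the attention term by 16x. The MLP and projection layers scale linearly in the token count, so the end-to-end saving is smaller than 16x.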
ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration [14.306565517230775]
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. Existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. We propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components.
arXiv Detail & Related papers (2025-12-19T07:27:19Z)
Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration [16.87269278147738]
We propose Fast3Dcache, a training-free geometry-aware caching framework for 3D diffusion inference. Our method achieves up to a 27.12% speed-up and a 54.8% reduction in FLOPs, with minimal degradation in geometric quality as measured by Chamfer Distance (2.48%) and F-Score (1.95%).
arXiv Detail & Related papers (2025-11-27T15:13:32Z)
Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation [50.04968365065964]
Diffusion-based talking head models generate high-quality, photorealistic videos but suffer from slow inference. We introduce Lightning-fast Caching-based Parallel denoising prediction (LightningCP). We also propose Decoupled Foreground Attention (DFA) to further accelerate attention computations.
arXiv Detail & Related papers (2025-08-25T02:58:39Z)
TaoCache: Structure-Maintained Video Generation Acceleration [4.594224594572109]
We present TaoCache, a training-free, plug-and-play caching strategy for video diffusion models. It adopts a fixed-point perspective to predict the model's noise output and is particularly effective in late denoising stages.
arXiv Detail & Related papers (2025-08-12T14:40:36Z)
Speedy MASt3R [68.47052557089631]
MASt3R redefines image matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal matching scheme. Fast MASt3R achieves a 54% reduction in inference time (198 ms to 91 ms per image pair) without sacrificing accuracy. This advancement enables real-time 3D understanding, benefiting applications like mixed reality navigation and large-scale 3D scene reconstruction.
arXiv Detail & Related papers (2025-03-13T03:56:22Z)
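Reciprocal matching keeps only pairs of descriptors that are each other's nearest neighbour. The brute-force NumPy sketch below (hypothetical function name `reciprocal_matches`) shows the basic mutual-nearest-neighbour check on L2 distances. It builds the full N x M distance matrix, which is the cost that fast reciprocal matching schemes are designed to avoid. So it illustrates the matching criterion, not the speed-up.

```python
import numpy as np


def reciprocal_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are mutual nearest neighbours."""
    # Squared L2 distances between every pair, shape (N, M)
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=-1)
    nn_ab = d.argmin(axis=1)  # nearest b for each a
    nn_ba = d.argmin(axis=0)  # nearest a for each b
    idx = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx  # a -> b -> a must return to the start
    return np.stack([idx[keep], nn_ab[keep]], axis=1)
```

A descriptor whose nearest neighbour prefers some other descriptor is dropped. This is why reciprocal matching yields fewer but more reliable correspondences than one-directional nearest-neighbour search.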
Accelerating Diffusion Transformer via Gradient-Optimized Cache [18.32157920050325]
Feature caching has emerged as an effective strategy to accelerate diffusion transformer (DiT) sampling through temporal feature reuse. Such reuse is challenging, however, because progressive error accumulation from cached blocks significantly degrades generation quality. We propose the Gradient-Optimized Cache (GOC) with two key innovations. GOC achieves IS 216.28 (26.3% higher) and FID 3.907 (43% lower) compared to the baseline DiT, while maintaining identical computational costs.
arXiv Detail & Related papers (2025-03-07T05:31:47Z)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: through a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed, even without updating the model parameters. We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods, at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation [52.56469577812338]
We introduce StreamDiffusion, a real-time diffusion pipeline for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. We present a novel approach that transforms the original sequential denoising into a batched denoising process.
arXiv Detail & Related papers (2023-12-19T18:18:33Z)
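A toy scheduler can illustrate batched denoising. Frames that entered the stream at different times sit at different denoising steps, and one batched model call advances all of them by one step. In the sketch below, the `denoise_batch` callback and the plain-number latents are hypothetical stand-ins for a real model and scheduler. It only demonstrates the bookkeeping.

```python
from collections import deque


def stream_batch(frames, n_steps, denoise_batch):
    """Advance frames at staggered denoising steps with one batched call per tick.

    denoise_batch: hypothetical callback taking [(latent, steps_done), ...] and
    returning the latents advanced by one step, in the same order.
    Returns (finished latents in input order, number of batched calls).
    """
    in_flight = deque()  # entries are [latent, steps_done], oldest first
    outputs, calls = [], 0
    # Pad with n_steps - 1 empty ticks so the last frame can finish.
    for frame in list(frames) + [None] * (n_steps - 1):
        if frame is not None:
            in_flight.append([frame, 0])
        if not in_flight:
            continue
        advanced = denoise_batch([(lat, s) for lat, s in in_flight])
        calls += 1
        for entry, lat in zip(in_flight, advanced):
            entry[0], entry[1] = lat, entry[1] + 1
        # The oldest frame is always the most denoised one.
        while in_flight and in_flight[0][1] == n_steps:
            outputs.append(in_flight.popleft()[0])
    return outputs, calls
```

Consider three frames, four steps, and a toy denoiser that adds 1 per step. `stream_batch([0, 10, 20], 4, lambda b: [lat + 1 for lat, _ in b])` returns `([4, 14, 24], 6)`: six batched calls instead of twelve single-frame calls. Once the pipeline is full, one finished frame comes out per call.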
Cache Me if You Can: Accelerating Diffusion Models through Block Caching [67.54820800003375]
A large image-to-image network has to be applied many times to iteratively refine an image from random noise. We investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small. We propose a technique to automatically determine caching schedules based on each block's changes over timesteps.
arXiv Detail & Related papers (2023-12-06T00:51:38Z)
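The abstract's idea of deriving a schedule from each block's change can be sketched as follows. This is a simplification under stated assumptions, not the paper's exact procedure:
- block outputs are assumed to have been recorded at every step of a calibration run;
- change is measured as the relative L1 distance between consecutive steps;
- a block is recomputed once its change, accumulated since its last recompute, exceeds a hypothetical `threshold`.

```python
import numpy as np


def block_schedule(block_outputs: np.ndarray, threshold: float) -> np.ndarray:
    """Derive a per-block recompute schedule from recorded block outputs.

    block_outputs: array (T, B, D), the output of each of B blocks at each of
                   T steps from a calibration run (hypothetical layout).
    Returns a bool array (T, B): True = recompute, False = reuse cached output.
    """
    T, B, _ = block_outputs.shape
    # Relative L1 change between consecutive steps, shape (T - 1, B)
    diff = np.abs(block_outputs[1:] - block_outputs[:-1]).sum(axis=-1)
    rel = diff / (np.abs(block_outputs[:-1]).sum(axis=-1) + 1e-8)
    schedule = np.zeros((T, B), dtype=bool)
    schedule[0] = True      # the first step always computes every block
    acc = np.zeros(B)       # change accumulated since each block's last recompute
    for t in range(1, T):
        acc += rel[t - 1]
        recompute = acc > threshold
        schedule[t] = recompute
        acc[recompute] = 0.0  # refreshing the cache resets the accumulator
    return schedule
```

A block whose output is constant is computed only at the first step. A block whose output doubles every step is recomputed at every step.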

This list is automatically generated from the titles and abstracts of the papers on this site.