Related papers: FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching

FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching

URL: http://arxiv.org/abs/2510.08669v1
Date: Thu, 09 Oct 2025 17:22:23 GMT
Title: FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching
Authors: Jiacheng Liu, Peiliang Cai, Qinming Zhou, Yuqi Lin, Deyang Kong, Benhao Huang, Yupei Pan, Haowen Xu, Chang Zou, Junshu Tang, Shikang Zheng, Linfeng Zhang,
Abstract summary: We show that different frequency bands in the features of diffusion models exhibit different dynamics across timesteps.<n>We propose Frequency-aware Caching (FreqCa) which directly reuses features of low-frequency components based on their similarity.<n>We also propose to cache Cumulative Residual Feature (CRF) instead of the features in all the layers, which reduces the memory footprint of feature caching by 99%.
Score: 13.999620910665612
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The application of diffusion transformers is suffering from their significant inference costs. Recently, feature caching has been proposed to solve this problem by reusing features from previous timesteps, thereby skipping computation in future timesteps. However, previous feature caching assumes that features in adjacent timesteps are similar or continuous, which does not always hold in all settings. To investigate this, this paper begins with an analysis from the frequency domain, which reveal that different frequency bands in the features of diffusion models exhibit different dynamics across timesteps. Concretely, low-frequency components, which decide the structure of images, exhibit higher similarity but poor continuity. In contrast, the high-frequency bands, which decode the details of images, show significant continuity but poor similarity. These interesting observations motivate us to propose Frequency-aware Caching (FreqCa) which directly reuses features of low-frequency components based on their similarity, while using a second-order Hermite interpolator to predict the volatile high-frequency ones based on its continuity. Besides, we further propose to cache Cumulative Residual Feature (CRF) instead of the features in all the layers, which reduces the memory footprint of feature caching by 99%. Extensive experiments on FLUX.1-dev, FLUX.1-Kontext-dev, Qwen-Image, and Qwen-Image-Edit demonstrate its effectiveness in both generation and editing. Codes are available in the supplementary materials and will be released on GitHub.

Related papers

Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers [11.772150619675527]
Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation.<n>Existing caching methods accelerate DiTs by reusing intermediate computations across timesteps, but they share a common limitation: treating the denoising process as uniform across time,depth, and feature dimensions.<n>We propose SpectralCache, a unified caching framework comprising Timestep-Aware Dynamic Scheduling (TADS), Cumulative Error Budgets (CEB), and Frequency-Decomposed Caching (FDC)
arXiv Detail & Related papers (2026-03-05T15:58:06Z)
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps.<n>We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis.<n>SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z)
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models [41.7269767513774]
We introduce Spectral-Evolution-Aware Cache (SeaCache), a training-free cache schedule that reuses decisions on a spectrally aligned representation.<n>Through theoretical and empirical analysis, we derive a Spectral-Evolution-Aware filter that preserves content-relevant components while suppressing noise.<n>Experiments on diverse visual generative models and the baselines show that SeaCache achieves state-of-the-art latency-quality trade-offs.
arXiv Detail & Related papers (2026-02-22T00:48:03Z)
Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching [11.75972316736487]
We introduce Cluster-Driven Feature Caching (ClusCa) to accelerate diffusion transformers.<n>ClusCa performs spatial clustering on tokens in each timestep, computes only one token in each cluster and propagates their information to all the other tokens.<n>Experiments on DiT, FLUX and HunyuanVideo demonstrate its effectiveness in both text-to-image and text-to-video generation.
arXiv Detail & Related papers (2025-09-12T14:53:45Z)
Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation [50.04968365065964]
Diffusion-based talking head models generate high-quality, photorealistic videos but suffer from slow inference.<n>We introduce Lightning-fast Caching-based Parallel denoising prediction (LightningCP)<n>We also propose Decoupled Foreground Attention (DFA) to further accelerate attention computations.
arXiv Detail & Related papers (2025-08-25T02:58:39Z)
DiCache: Let Diffusion Model Determine Its Own Cache [62.954717254728166]
DiCache is a training-free adaptive caching strategy for accelerating diffusion models at runtime.<n>Online Probe Profiling Scheme leverages a shallow-layer online probe to obtain an on-the-fly indicator for the caching error in real time.<n> Dynamic Cache Trajectory Alignment approximates the deep-layer feature output from multi-step historical caches.
arXiv Detail & Related papers (2025-08-24T13:30:00Z)
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoir'eing through targeted frequency separation.<n>Our method performs an effective frequency decomposition that explicitly splits moir'e patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions.<n>Experiments on various demoir'eing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models [41.11005178050448]
ProfilingDiT is a novel adaptive caching strategy that explicitly disentangles foreground and background-focused blocks.<n>Our framework achieves significant acceleration while maintaining visual fidelity across comprehensive quality metrics.
arXiv Detail & Related papers (2025-04-04T03:30:15Z)
FEB-Cache: Frequency-Guided Exposure Bias Reduction for Enhancing Diffusion Transformer Caching [10.760030872557374]
Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity.<n>In this paper, we first confirm that the cache greatly amplifies the exposure bias, resulting in a decline in the generation quality.<n>We introduce FEB-Cache, a joint caching strategy that aligns with the non-exposed bias diffusion process.
arXiv Detail & Related papers (2025-03-10T09:49:18Z)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters. We introduce a novel scheme, named Learningto-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-r, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
Cache Me if You Can: Accelerating Diffusion Models through Block Caching [67.54820800003375]
A large image-to-image network has to be applied many times to iteratively refine an image from random noise. We investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small. We propose a technique to automatically determine caching schedules based on each block's changes over timesteps.
arXiv Detail & Related papers (2023-12-06T00:51:38Z)
Frame Flexible Network [52.623337134518835]
Existing video recognition algorithms always conduct different training pipelines for inputs with different frame numbers. If we evaluate the model using other frames which are not used in training, we observe the performance will drop significantly. We propose a general framework, named Frame Flexible Network (FFN), which enables the model to be evaluated at different frames to adjust its computation.
arXiv Detail & Related papers (2023-03-26T20:51:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.