FEB-Cache: Frequency-Guided Exposure Bias Reduction for Enhancing Diffusion Transformer Caching
- URL: http://arxiv.org/abs/2503.07120v2
- Date: Tue, 05 Aug 2025 16:17:01 GMT
- Title: FEB-Cache: Frequency-Guided Exposure Bias Reduction for Enhancing Diffusion Transformer Caching
- Authors: Zhen Zou, Feng Zhao
- Abstract summary: Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity. In this paper, we first confirm that caching greatly amplifies exposure bias, resulting in a decline in generation quality. We introduce FEB-Cache, a joint caching strategy that aligns with the exposure-bias-free diffusion process.
- Score: 4.8677910801584385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity. To address this issue, various methods, notably feature caching, have been introduced. However, these approaches focus on aligning with non-cache diffusion without analyzing why caching damages the generation process. In this paper, we first confirm that caching greatly amplifies exposure bias, resulting in a decline in generation quality. Directly applying noise scaling to this issue is challenging, however, because exposure bias is non-smooth. We found that this non-smoothness stems from the mismatch between the frequency response characteristics of exposure bias and naive joint caching of Attention and MLP. Since these two components exhibit distinct preferences for frequency signals, caching Attention and MLP separately allows a closer fit to the exposure bias and thus reduces it. Based on this, we introduce FEB-Cache, a joint caching strategy that caches Attention and MLP according to a frequency-guided cache table and aligns with the exposure-bias-free diffusion process (which raises the achievable performance ceiling). Our approach combines a comprehensive understanding of the caching mechanism with a new perspective on leveraging caching to accelerate diffusion. Empirical results indicate that FEB-Cache improves model performance while concurrently delivering acceleration.
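To make the separated-caching idea concrete, here is a minimal PyTorch sketch in which each DiT sub-layer keeps its own cache and consults a per-timestep table for its reuse decision. The block layout, module sizes, and the hand-written cache_table are illustrative assumptions; the paper derives its actual table from the frequency analysis it describes.

```python
# Sketch: a DiT block whose Attention and MLP outputs are cached
# independently, since the two sub-layers have different frequency
# preferences. The cache_table below is a hypothetical placeholder.
import torch
import torch.nn as nn

class CachedDiTBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn_cache = None  # cached Attention output
        self.mlp_cache = None   # cached MLP output

    def forward(self, x, reuse_attn: bool, reuse_mlp: bool):
        # Attention sub-layer: reuse or recompute, independently of the MLP.
        if reuse_attn and self.attn_cache is not None:
            attn_out = self.attn_cache
        else:
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h)
            self.attn_cache = attn_out
        x = x + attn_out
        # MLP sub-layer: its own cache decision.
        if reuse_mlp and self.mlp_cache is not None:
            mlp_out = self.mlp_cache
        else:
            mlp_out = self.mlp(self.norm2(x))
            self.mlp_cache = mlp_out
        return x + mlp_out

# Hypothetical frequency-guided cache table: per timestep, independent
# reuse decisions for Attention and MLP.
cache_table = {t: {"attn": t % 2 == 1, "mlp": t % 3 != 0} for t in range(8)}

block = CachedDiTBlock()
x = torch.randn(1, 16, 64)
for t in range(8):
    x = block(x, cache_table[t]["attn"], cache_table[t]["mlp"])
```

Decoupling the two caches is the key move: a schedule that must reuse or recompute the whole block can only fit the exposure-bias curve coarsely, whereas per-component decisions can track it more closely.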
Related papers
- PromptTea: Let Prompts Tell TeaCache the Optimal Threshold [1.0665410339553834]
A common acceleration strategy involves reusing model outputs via caching mechanisms at fixed intervals. We propose Prompt-Complexity-Aware (PCA) caching, a method that automatically adjusts reuse thresholds based on scene complexity estimated directly from the input prompt.
arXiv Detail & Related papers (2025-07-09T10:53:05Z)
- MagCache: Fast Video Generation with Magnitude-Aware Cache [91.51242917160373]
We present a novel and robust discovery: a unified magnitude law observed across different models and prompts. Building on it, we introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and an adaptive caching strategy. Experimental results show that MagCache achieves 2.1x and 2.68x speedups on Open-Sora and Wan 2.1, respectively.
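A toy sketch of the magnitude-aware skipping pattern: the drift of the residual-magnitude ratio away from 1 serves as a cheap error proxy, and steps are skipped while the accumulated proxy stays under a budget. The model stub, threshold, and error model are illustrative, not MagCache's exact formulation.

```python
# Magnitude-aware skipping: reuse the cached residual while the
# accumulated, magnitude-estimated error stays under a budget.
import torch

def model(x, t):
    """Stand-in for the expensive DiT forward pass."""
    return x * 0.99 + 0.01 * torch.randn_like(x)

x = torch.randn(1, 4, 32, 32)
prev_res = None            # cached residual from the last real forward pass
prev_norm = None           # its magnitude
err_step = float("inf")    # per-skip error estimate (unknown at first)
accum_err, budget = 0.0, 0.05

for t in range(50, 0, -1):
    if prev_res is not None and accum_err + err_step < budget:
        x = x + prev_res          # skip the model call, reuse the residual
        accum_err += err_step     # track the error this skip may have added
        continue
    out = model(x, t)
    res = out - x                 # residual actually produced by the model
    if prev_norm is not None:
        # Drift of the residual-magnitude ratio from 1 approximates the
        # error a skipped step would introduce.
        err_step = abs((res.norm() / prev_norm).item() - 1.0)
    prev_res, prev_norm = res, res.norm()
    accum_err = 0.0               # reset after a real computation
    x = out
```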
arXiv Detail & Related papers (2025-06-10T17:59:02Z)
- AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse [19.13826316844611]
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference.
We provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method.
Instead of directly reusing cached results, we propose a novel caching-based acceleration approach for diffusion models.
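A minimal sketch of the second-order Adams-Bashforth idea, assuming a toy denoiser and step size: skipped steps are extrapolated from the two most recent real evaluations rather than copied verbatim from the last one.

```python
# AB2 cached reuse: extrapolate a skipped step from the two newest real
# model evaluations instead of reusing the last one unchanged.
import torch

def model(x, t):
    """Stand-in for the expensive denoiser evaluation f(x, t)."""
    return -0.5 * x

x = torch.randn(1, 4, 32, 32)
h = 0.02                      # integration step size (illustrative)
f_hist = []                   # last two real evaluations, newest first
for n in range(20):
    if len(f_hist) == 2 and n % 2 == 1:     # every other step is skipped
        # AB2 update: x += h * (3/2 f_n - 1/2 f_{n-1}), a better estimate
        # of the skipped evaluation than plain reuse of f_n.
        x = x + h * (1.5 * f_hist[0] - 0.5 * f_hist[1])
        continue
    f = model(x, n)
    x = x + h * f
    f_hist = [f] + f_hist[:1]  # keep only the two newest evaluations
```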
arXiv Detail & Related papers (2025-04-13T08:29:58Z)
- Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models [41.11005178050448]
ProfilingDiT is a novel adaptive caching strategy that explicitly disentangles foreground- and background-focused blocks.
Our framework achieves significant acceleration while maintaining visual fidelity across comprehensive quality metrics.
arXiv Detail & Related papers (2025-04-04T03:30:15Z)
- QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. However, DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Q&C: When Quantization Meets Cache in Efficient Image Generation [24.783679431414686]
We find that the combination of quantization and cache mechanisms for Diffusion Transformers (DiTs) is not straightforward. We propose a hybrid acceleration method by tackling the above challenges. Our method has accelerated DiTs by 12.7x while preserving competitive generation capability.
arXiv Detail & Related papers (2025-03-04T11:19:02Z)
- Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model [55.64316746098431]
Timestep Embedding Aware Cache (TeaCache) is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible degradation of visual quality.
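A rough sketch of this idea: the relative change of the (cheap) timestep-embedding-modulated input stands in as a proxy for the change in the (expensive) model output, and the model is only re-run when the accumulated proxy crosses a threshold. The embedding, model stub, proxy, and threshold are illustrative assumptions.

```python
# Proxy-driven caching: accumulate the relative change of the modulated
# input; recompute only when it exceeds a threshold.
import torch

def timestep_embed(t):
    """Toy timestep embedding; the real model uses a learned one."""
    return torch.tensor(t / 50.0)

def model(x, t):
    """Stand-in for the expensive DiT forward pass."""
    return x * (1.0 - 0.01 * t)

x = torch.randn(1, 4, 32, 32)
cached_res, prev_mod = None, None
accum, threshold = 0.0, 0.3
for t in range(50, 0, -1):
    mod = x * timestep_embed(t)              # modulated input (cheap to form)
    if prev_mod is not None:
        # Relative L1 change of the modulated input approximates the
        # change in model output between timesteps.
        accum += ((mod - prev_mod).abs().mean() / prev_mod.abs().mean()).item()
    prev_mod = mod
    if cached_res is not None and accum < threshold:
        x = x + cached_res                   # reuse the cached residual
        continue
    out = model(x, t)
    cached_res = out - x
    accum = 0.0                              # reset after a real forward pass
    x = out
```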
arXiv Detail & Related papers (2024-11-28T12:50:05Z)
- SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers [4.7170474122879575]
Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. We introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures. Our experiments demonstrate that SmoothCache achieves speedups of 8% to 71% while maintaining or even improving generation quality across diverse modalities.
arXiv Detail & Related papers (2024-11-15T16:24:02Z)
- FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality [58.80996741843102]
FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation.
We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
arXiv Detail & Related papers (2024-10-25T07:24:38Z)
- ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference [41.41316718220569]
ExpertFlow is designed to enhance inference efficiency by accommodating flexible routing and enabling efficient expert scheduling between CPU and GPU.
Our experiments demonstrate that ExpertFlow achieves up to 93.72% GPU memory savings and enhances inference speed by 2 to 10 times compared to baseline methods.
arXiv Detail & Related papers (2024-10-23T15:24:54Z)
- Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: through a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed, even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver++, alongside prior cache-based methods at the same inference speed.
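One way to picture learned layer caching is a trainable per-layer score that decides which layers recompute at a given step and which reuse their cached output. The sketch below is a simplified assumption of that setup (the router, layer definition, and alternate-step schedule are placeholders, not L2C's training procedure).

```python
# Learned layer caching: per-layer logits gate whether a layer's cached
# output is reused on alternate steps.
import torch
import torch.nn as nn

class CacheableLayer(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.ff = nn.Linear(dim, dim)
        self.cache = None
    def forward(self, x, reuse: bool):
        if reuse and self.cache is not None:
            return x + self.cache            # reuse the cached layer output
        out = torch.relu(self.ff(x))
        self.cache = out
        return x + out

n_layers = 8
layers = nn.ModuleList(CacheableLayer() for _ in range(n_layers))
# Learnable per-layer caching scores; training would push scores up for
# layers whose outputs change little between steps. Untrained (zero)
# logits give sigmoid = 0.5, so every layer recomputes here.
route_logits = nn.Parameter(torch.zeros(n_layers))

x = torch.randn(1, 16, 64)
for step in range(4):
    scores = torch.sigmoid(route_logits) > 0.5
    reuse_mask = scores if step % 2 == 1 else torch.zeros_like(scores)
    for layer, reuse in zip(layers, reuse_mask.tolist()):
        x = layer(x, reuse)
```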
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache achieves comparable or even marginally improved results compared with DDIM or PLMS.
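A toy sketch of this temporal-redundancy pattern: deep (high-level) features change slowly across adjacent denoising steps, so they are refreshed only every few steps while the shallow layers run every step. The module shapes, refresh interval, and update rule are illustrative.

```python
# Cache the deep U-Net features for several steps; recompute only the
# shallow layers every step.
import torch
import torch.nn as nn

shallow_in = nn.Conv2d(4, 16, 3, padding=1)   # recomputed every step
deep = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 16, 3, padding=1))  # cached features
shallow_out = nn.Conv2d(32, 4, 3, padding=1)  # consumes skip + deep features

x = torch.randn(1, 4, 32, 32)
deep_cache, interval = None, 3                # refresh deep features every 3 steps
for step in range(12):
    skip = shallow_in(x)                      # cheap shallow pass, every step
    if deep_cache is None or step % interval == 0:
        deep_cache = deep(skip)               # full pass: refresh deep features
    feats = torch.cat([skip, deep_cache], dim=1)  # U-Net-style skip connection
    x = x + 0.1 * shallow_out(feats)          # toy update of the latent
```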
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
- Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching.
While approximate cache hits alleviate DL inference workload and increase system throughput, they introduce an approximation error.
We analytically model our caching system's performance for classic LRU and ideal caches, perform a trace-driven evaluation of the expected performance, and compare the benefits of our proposed approach with state-of-the-art similarity caching.
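A minimal sketch of approximate-key caching, assuming coarse quantization as the key function: nearby feature vectors map to the same key, so similar inputs hit the same LRU entry at the cost of a bounded approximation error. The quantization step, cache size, and classifier stub are illustrative.

```python
# Approximate-key cache: quantize features into a coarse key so similar
# inputs share a cache entry; evict with LRU.
from collections import OrderedDict

def classify(features):
    """Stand-in for the expensive DL classifier."""
    return int(sum(features) > 0)

def approx_key(features, step=0.5):
    # Coarse quantization: nearby vectors map to the same key, trading a
    # bounded approximation error for extra cache hits.
    return tuple(round(f / step) for f in features)

cache, capacity = OrderedDict(), 1024

def cached_classify(features):
    key = approx_key(features)
    if key in cache:
        cache.move_to_end(key)       # LRU: mark as recently used
        return cache[key]            # approximate hit, no model call
    result = classify(features)
    cache[key] = result
    if len(cache) > capacity:
        cache.popitem(last=False)    # evict the least recently used entry
    return result

print(cached_classify([0.20, -0.10]))   # miss: runs the model
print(cached_classify([0.22, -0.12]))   # approximate hit: same coarse key
```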
arXiv Detail & Related papers (2021-12-13T13:49:11Z)