Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
- URL: http://arxiv.org/abs/2504.03140v1
- Date: Fri, 04 Apr 2025 03:30:15 GMT
- Title: Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
- Authors: Xuran Ma, Yexin Liu, Yaofu Liu, Xianfeng Wu, Mingzhe Zheng, Zihao Wang, Ser-Nam Lim, Harry Yang
- Abstract summary: ProfilingDiT is a novel adaptive caching strategy that explicitly disentangles foreground- and background-focused blocks. Our framework achieves significant acceleration while maintaining visual fidelity across comprehensive quality metrics.
- Score: 41.11005178050448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion models have demonstrated remarkable capabilities in video generation. However, the computational intensity remains a significant challenge for practical applications. While feature caching has been proposed to reduce the computational burden of diffusion models, existing methods typically overlook the heterogeneous significance of individual blocks, resulting in suboptimal reuse and degraded output quality. To address this gap, we introduce ProfilingDiT, a novel adaptive caching strategy that explicitly disentangles foreground- and background-focused blocks. Through a systematic analysis of attention distributions in diffusion models, we reveal two key observations: 1) most layers exhibit a consistent preference for either foreground or background regions; 2) predicted noise shows low inter-step similarity initially, which stabilizes as denoising progresses. These findings inspire us to formulate a selective caching strategy that preserves full computation for dynamic foreground elements while efficiently caching static background features. Our approach substantially reduces computational overhead while preserving visual fidelity. Extensive experiments demonstrate that our framework achieves significant acceleration (e.g., a 2.01x speedup for Wan2.1) while maintaining visual fidelity across comprehensive quality metrics, establishing a viable method for efficient video generation.
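The two observations in the abstract suggest a simple control loop: classify blocks once from profiled attention, then reuse background-block outputs only after the predicted noise has stabilized. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes per-block attention mass over tokens has already been profiled, that a binary foreground token mask is available, and that a cosine-similarity threshold between consecutive predicted noises decides when caching may begin. All names (`classify_blocks`, `SelectiveCache`, the 0.5 and 0.95 thresholds) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical sketch of profiling-based selective caching; not the ProfilingDiT code.


def foreground_ratio(attn_mass: Sequence[float], fg_mask: Sequence[bool]) -> float:
    """Share of a block's profiled attention mass that falls on foreground tokens."""
    total = sum(attn_mass)
    fg = sum(a for a, is_fg in zip(attn_mass, fg_mask) if is_fg)
    return fg / total if total > 0 else 0.0


def classify_blocks(profiles: Dict[int, Sequence[float]],
                    fg_mask: Sequence[bool],
                    threshold: float = 0.5) -> Dict[int, str]:
    """Label each block 'foreground' or 'background' by its attention preference."""
    return {
        idx: "foreground" if foreground_ratio(mass, fg_mask) >= threshold else "background"
        for idx, mass in profiles.items()
    }


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two flat vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


class SelectiveCache:
    """Reuse background-block outputs once predicted noise has stabilized."""

    def __init__(self, roles: Dict[int, str], sim_threshold: float = 0.95):
        self.roles = roles
        self.sim_threshold = sim_threshold  # assumed stabilization criterion
        self.cache: Dict[int, object] = {}
        self.prev_noise: Optional[List[float]] = None
        self.stable = False  # assumption: once stable, stays stable

    def observe_noise(self, noise: Sequence[float]) -> None:
        # Early steps have low inter-step similarity, so everything is recomputed.
        if self.prev_noise is not None and cosine(self.prev_noise, noise) >= self.sim_threshold:
            self.stable = True
        self.prev_noise = list(noise)

    def run_block(self, idx: int, block: Callable[[object], object], x: object) -> object:
        # Foreground blocks always run; background blocks reuse the cache once stable.
        if self.stable and self.roles.get(idx) == "background" and idx in self.cache:
            return self.cache[idx]
        out = block(x)
        self.cache[idx] = out
        return out


# Usage: block 0 attends mostly to foreground tokens, block 1 to background tokens.
roles = classify_blocks({0: [0.4, 0.4, 0.1, 0.1], 1: [0.05, 0.05, 0.45, 0.45]},
                        fg_mask=[True, True, False, False])
# roles == {0: "foreground", 1: "background"}
```

Making `stable` sticky mirrors the abstract's claim that similarity stabilizes as denoising progresses; a real scheduler might instead re-check the threshold at every step or refresh the cache periodically.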
Related papers
- AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse [19.13826316844611]
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference.
We provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method.
We propose a novel caching-based acceleration approach for diffusion models, instead of directly reusing cached results.
(2025-04-13T08:29:58Z)
- Exposure Bias Reduction for Enhancing Diffusion Transformer Feature Caching [7.393824353099595]
Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity.
We analyze the impact of caching on the SNR of the diffusion process.
We introduce EB-Cache, a joint cache strategy that aligns the Non-exposure bias.
(2025-03-10T09:49:18Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full- and no-reference metrics.
(2025-03-09T09:39:57Z)
- Towards Scalable and Deep Graph Neural Networks via Noise Masking [59.058558158296265]
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks.
However, scaling them to large graphs is challenging due to the high computational and storage costs.
We present random walk with noise masking (RMask), a plug-and-play module compatible with the existing model-simplification works.
(2024-12-19T07:48:14Z)
- Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model [55.64316746098431]
Timestep Embedding Aware Cache (TeaCache) is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps.
TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible degradation of visual quality.
(2024-11-28T12:50:05Z)
- FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality [58.80996741843102]
FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation.
We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
(2024-10-25T07:24:38Z)
- Edge-preserving noise for diffusion models [4.435514696080208]
We present a novel edge-preserving diffusion model that is a generalization of denoising diffusion probabilistic models (DDPM).
In particular, we introduce an edge-aware noise scheduler that varies between edge-preserving and isotropic Gaussian noise.
We show that our model's generative process converges faster to results that more closely match the target distribution.
(2024-10-02T13:29:52Z)
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
(2023-12-01T17:01:06Z)
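The DeepCache summary describes exploiting temporal redundancy across adjacent denoising steps. A minimal toy sketch of that general idea follows, assuming a network split into a cheap shallow branch that runs every step and an expensive deep branch whose output is cached and refreshed only every `interval` steps. The function and parameter names are hypothetical, the branches are stand-in callables rather than U-Net layers, and this is not DeepCache's implementation.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def cached_deep_loop(x: T, steps: int,
                     shallow: Callable[[T], T],
                     deep: Callable[[T], T],
                     combine: Callable[[T, T], T],
                     interval: int = 3) -> Tuple[T, int]:
    """Recompute the expensive deep branch every `interval` steps; reuse it otherwise.

    Returns the final state and how many times the deep branch actually ran.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    cached: Optional[T] = None
    deep_calls = 0
    for step in range(steps):
        h = shallow(x)  # cheap shallow features, recomputed every step
        if cached is None or step % interval == 0:
            cached = deep(h)  # "full" step: refresh the cached high-level features
            deep_calls += 1
        x = combine(h, cached)  # "partial" steps reuse the cached features
    return x, deep_calls


# Usage: 7 steps with interval 3 run the deep branch at steps 0, 3 and 6 only.
_, n_deep = cached_deep_loop(0.0, 7, lambda x: x + 1, lambda h: 2 * h,
                             lambda h, d: 0.5 * (h + d), interval=3)
# n_deep == 3
```

The `interval` knob trades quality for speed: `interval=1` recomputes everything and degenerates to the uncached loop, while larger values skip more deep-branch evaluations.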
This list is automatically generated from the titles and abstracts of the papers in this site.