Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache
- URL: http://arxiv.org/abs/2602.22654v1
- Date: Thu, 26 Feb 2026 06:13:33 GMT
- Title: Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache
- Authors: Bowen Cui, Yuanbin Wang, Huajiang Xu, Biaolong Chen, Aixi Zhang, Hao Jiang, Zhengzheng Jin, Xu Liu, Pipei Huang,
- Abstract summary: We propose DPCache, a training-free acceleration framework that formulates diffusion acceleration as a global path planning problem.<n> DPCache employs dynamic programming to select an optimal sequence of key timesteps that minimizes the total path cost while preserving trajectory fidelity.<n>Experiments on DiT, FLUX, and HunyuanVideo demonstrate that DPCache achieves strong acceleration with minimal quality loss.
- Score: 8.614492355393578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have demonstrated remarkable success in image and video generation, yet their practical deployment remains hindered by the substantial computational overhead of multi-step iterative sampling. Among acceleration strategies, caching-based methods offer a training-free and effective solution by reusing or predicting features across timesteps. However, existing approaches rely on fixed or locally adaptive schedules without considering the global structure of the denoising trajectory, often leading to error accumulation and visual artifacts. To overcome this limitation, we propose DPCache, a novel training-free acceleration framework that formulates diffusion sampling acceleration as a global path planning problem. DPCache constructs a Path-Aware Cost Tensor from a small calibration set to quantify the path-dependent error of skipping timesteps conditioned on the preceding key timestep. Leveraging this tensor, DPCache employs dynamic programming to select an optimal sequence of key timesteps that minimizes the total path cost while preserving trajectory fidelity. During inference, the model performs full computations only at these key timesteps, while intermediate outputs are efficiently predicted using cached features. Extensive experiments on DiT, FLUX, and HunyuanVideo demonstrate that DPCache achieves strong acceleration with minimal quality loss, outperforming prior acceleration methods by $+$0.031 ImageReward at 4.87$\times$ speedup and even surpassing the full-step baseline by $+$0.028 ImageReward at 3.54$\times$ speedup on FLUX, validating the effectiveness of our path-aware global scheduling framework. Code will be released at https://github.com/argsss/DPCache.
Related papers
- SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps.<n>We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis.<n>SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z) - MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference [11.934900617930774]
MeanCache is a training-free caching framework for efficient Flow Matching inference.<n>We demonstrate that MeanCache achieves 4.12X and 4.56X and 3.59X acceleration, respectively, while consistently outperforming state-of-the-art caching baselines in generation quality.
arXiv Detail & Related papers (2026-01-27T08:35:50Z) - ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion [30.897215456167753]
Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process.<n>We propose ERTACache, a principled caching framework that jointly rectifies both error types.<n>ERTACache achieves up to 2x inference speedup while consistently preserving or even improving visual quality.
arXiv Detail & Related papers (2025-08-27T10:37:24Z) - DiCache: Let Diffusion Model Determine Its Own Cache [62.954717254728166]
DiCache is a training-free adaptive caching strategy for accelerating diffusion models at runtime.<n>Online Probe Profiling Scheme leverages a shallow-layer online probe to obtain an on-the-fly indicator for the caching error in real time.<n> Dynamic Cache Trajectory Alignment approximates the deep-layer feature output from multi-step historical caches.
arXiv Detail & Related papers (2025-08-24T13:30:00Z) - TaoCache: Structure-Maintained Video Generation Acceleration [4.594224594572109]
We present TaoCache, a training-free, plug-and-play caching strategy for video diffusion models.<n>It adopts a fixed-point perspective to predict the model's noise output and is specifically effective in late denoising stages.
arXiv Detail & Related papers (2025-08-12T14:40:36Z) - Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching [57.7533917467934]
EasyCache is a training-free acceleration framework for video diffusion models.<n>We conduct comprehensive studies on various large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo.<n>Our method achieves leading acceleration performance, reducing inference time by up to 2.1-3.3$times$ compared to the original baselines.
arXiv Detail & Related papers (2025-07-03T17:59:54Z) - MagCache: Fast Video Generation with Magnitude-Aware Cache [91.2771453279713]
We introduce a novel and robust discovery: a unified magnitude law observed across different models and prompts.<n>We introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and adaptive caching strategy.<n> Experimental results show that MagCache achieves 2.10x-2.68x speedups on Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo, while preserving superior visual fidelity.
arXiv Detail & Related papers (2025-06-10T17:59:02Z) - CacheQuant: Comprehensively Accelerated Diffusion Models [3.78219736760145]
CacheQuant is a novel training-free paradigm that comprehensively accelerates diffusion models by jointly optimizing model caching and quantization techniques.<n> Experimental results show that CacheQuant achieves a 5.18 speedup and 4 compression for Stable Diffusion on MS-COCO, with only a 0.02 loss in CLIP score.
arXiv Detail & Related papers (2025-03-03T09:04:51Z) - Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model [55.64316746098431]
Timestep Embedding Aware Cache (TeaCache) is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps.<n>TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible degradation of visual quality.
arXiv Detail & Related papers (2024-11-28T12:50:05Z) - DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.