Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
- URL: http://arxiv.org/abs/2505.05829v1
- Date: Fri, 09 May 2025 06:56:17 GMT
- Title: Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
- Authors: Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma
- Abstract summary: Diffusion transformer (DiT) models have achieved remarkable success in image generation. We propose increment-calibrated caching, a training-free method for DiT acceleration. Our method eliminates more than 45% of the computation and improves IS by 12 at the cost of less than 0.06 FID increase.
- Score: 4.0594792247165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion transformer (DiT) models have achieved remarkable success in image generation, thanks to their exceptional generative capabilities and scalability. Nonetheless, the iterative nature of diffusion models (DMs) results in high computation complexity, posing challenges for deployment. Although existing cache-based acceleration methods try to utilize the inherent temporal similarity to skip redundant computations of DiT, the lack of correction may induce potential quality degradation. In this paper, we propose increment-calibrated caching, a training-free method for DiT acceleration, where the calibration parameters are generated from the pre-trained model itself with low-rank approximation. To deal with the possible correction failure arising from outlier activations, we introduce channel-aware Singular Value Decomposition (SVD), which further strengthens the calibration effect. Experimental results show that our method always achieves better performance than existing naive caching methods with a similar computational resource budget. When compared with 35-step DDIM, our method eliminates more than 45% of the computation and improves IS by 12 at the cost of less than a 0.06 FID increase. Code is available at https://github.com/ccccczzy/icc.
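To make the two ingredients of the abstract concrete, below is a minimal PyTorch sketch of the idea: a skipped block reuses its cached output and adds a low-rank correction for the change in its input, with the low-rank factors obtained from the pre-trained weight via an activation-channel-aware SVD. The function names, the single-linear-layer setting, and the diagonal rescaling by act_scale are illustrative assumptions rather than the authors' released implementation (the repository linked in the abstract has the actual code).

```python
import torch

def channel_aware_svd(W: torch.Tensor, act_scale: torch.Tensor, rank: int):
    """Rank-r factors of a weight matrix W (out_dim x in_dim).

    Input channels are rescaled by per-channel activation magnitudes before
    the SVD, so channels carrying outlier activations are approximated more
    faithfully (a hypothetical reading of "channel-aware SVD").
    """
    U, sigma, Vh = torch.linalg.svd(W * act_scale, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]   # (out_dim, rank)
    B = Vh[:rank] / act_scale        # undo the channel scaling: (rank, in_dim)
    return A, B                      # A @ B approximates W

def increment_calibrated_output(x, x_ref, y_ref, A, B):
    """Approximate a skipped linear block y = x @ W.T at the current step.

    Reuse the cached output y_ref from a reference timestep and calibrate it
    with a low-rank estimate of the increment W @ (x - x_ref).
    """
    delta = x - x_ref                # activation change since the cached step
    return y_ref + (delta @ B.T) @ A.T

# Toy usage: cache at one timestep, calibrate the reuse at a nearby one.
W = torch.randn(512, 512)
act_scale = torch.rand(512) + 0.5    # e.g. mean |activation| per input channel
A, B = channel_aware_svd(W, act_scale, rank=32)
x_ref = torch.randn(4, 512)
y_ref = x_ref @ W.T                  # full computation, cached
x_new = x_ref + 0.05 * torch.randn(4, 512)   # adjacent step: small increment
y_hat = increment_calibrated_output(x_new, x_ref, y_ref, A, B)
```

The rank trades calibration accuracy against the compute saved by skipping the block, and the per-channel rescaling plays the role described for channel-aware SVD: without it, a plain SVD tends to spend its rank budget away from the few outlier channels that dominate the correction error.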
Related papers
- DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization [29.066284789131494]
Recent post-training quantization methods overlook outliers, leading to degraded performance at low bit-widths. We propose DMQ, which combines Learned Equivalent Scaling and channel-wise Power-of-Two Scaling. Our method significantly outperforms existing works, especially at low bit-widths.
arXiv Detail & Related papers (2025-07-17T09:15:29Z) - Flow-GRPO: Training Flow Matching Models via Online RL [75.70017261794422]
We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps; and (2) a Denoising Reduction strategy that reduces the number of training denoising steps while retaining the original inference timestep count.
arXiv Detail & Related papers (2025-05-08T17:58:45Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Accelerating Diffusion Transformer via Gradient-Optimized Cache [18.32157920050325]
Progressive error accumulation from cached blocks significantly degrades generation quality. Current error compensation approaches neglect dynamic patterns during the caching process, leading to suboptimal error correction. We propose the Gradient-Optimized Cache (GOC) with two key innovations. GOC achieves IS 216.28 (26.3% higher) and FID 3.907 (43% lower) compared to baseline DiT, while maintaining identical computational costs.
arXiv Detail & Related papers (2025-03-07T05:31:47Z) - Q&C: When Quantization Meets Cache in Efficient Image Generation [24.783679431414686]
We find that the combination of quantization and cache mechanisms for Diffusion Transformers (DiTs) is not straightforward. We propose a hybrid acceleration method by tackling the above challenges. Our method has accelerated DiTs by 12.7x while preserving competitive generation capability.
arXiv Detail & Related papers (2025-03-04T11:19:02Z) - LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers [79.07412045476872]
Diffusion Transformers have emerged as the preeminent models for a wide array of generative tasks. We show that performing the full computation of the model at each diffusion step is unnecessary, as some computations can be skipped by lazily reusing the results of previous steps. We propose a lazy learning framework that efficiently leverages cached results from earlier steps to skip redundant computations.
arXiv Detail & Related papers (2024-12-17T01:12:35Z) - DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization [22.546989373687655]
We propose a novel pruning method that derives an efficient diffusion model via a more intelligent and differentiable pruner.
Our approach achieves a 4.4x speedup for SD-1.5 without any loss of accuracy, significantly outperforming previous state-of-the-art methods.
arXiv Detail & Related papers (2024-10-22T12:18:24Z) - Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer can, through a caching mechanism, be readily removed even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods, at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z) - The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling [78.6155095947769]
Skip-Tuning is a simple yet surprisingly effective training-free tuning method on the skip connections.
Our method can achieve a 100% FID improvement for pretrained EDM on ImageNet 64 with only 19 NFEs (FID 1.75).
While Skip-Tuning increases the score-matching losses in the pixel space, the losses in the feature space are reduced.
arXiv Detail & Related papers (2024-02-23T08:05:23Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of extrapolation variants can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)