Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration
- URL: http://arxiv.org/abs/2603.01623v1
- Date: Mon, 02 Mar 2026 08:59:11 GMT
- Title: Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration
- Authors: Jiaqi Han, Juntong Shi, Puheng Li, Haotian Ye, Qiushan Guo, Stefano Ermon
- Abstract summary: We propose the spectral diffusion feature forecaster (Spectrum) to enable global, long-range feature reuse with tightly controlled error. We achieve up to 4.79× speedup on FLUX.1 and 4.67× speedup on Wan2.1-14B, while maintaining much higher sample quality than the baselines.
- Score: 58.19554276924402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have become the dominant tool for high-fidelity image and video generation, yet they are critically bottlenecked by inference speed due to the many iterative passes through Diffusion Transformers. To reduce this exhaustive compute, recent works resort to feature caching-and-reuse schemes that skip network evaluations at selected diffusion steps by substituting features cached at previous steps. However, these designs rely solely on local approximation, so errors grow rapidly with large skips, degrading sample quality at high speedups. In this work, we propose the spectral diffusion feature forecaster (Spectrum), a training-free approach that enables global, long-range feature reuse with tightly controlled error. In particular, we view the latent features of the denoiser as functions over time and approximate them with Chebyshev polynomials: we fit the coefficient of each basis function via ridge regression and then use the fitted expansion to forecast features at multiple future diffusion steps. We theoretically show that our approach admits more favorable long-horizon behavior and yields an error bound that does not compound with the step size. Extensive experiments on various state-of-the-art image and video diffusion models consistently verify the superiority of our approach. Notably, we achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality than the baselines.
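The core fitting step described in the abstract — approximating each feature trajectory with Chebyshev polynomials whose coefficients come from ridge regression, then extrapolating to future steps — can be sketched in NumPy. The function name, degree, and regularization strength below are illustrative choices of ours, not details taken from the paper:

```python
import numpy as np

def chebyshev_ridge_forecast(ts, feats, t_future, degree=4, lam=1e-3):
    """Fit Chebyshev polynomials to cached features over time via ridge
    regression, then extrapolate to future diffusion steps.

    ts       : (n,) observed (past) timesteps
    feats    : (n, d) cached feature vectors at those timesteps
    t_future : (m,) timesteps to forecast
    """
    # Map observed timesteps into [-1, 1], the natural Chebyshev domain.
    lo, hi = ts.min(), ts.max()
    scale = lambda t: 2.0 * (t - lo) / (hi - lo) - 1.0

    # Design matrix of basis values T_0..T_degree at the observed times.
    V = np.polynomial.chebyshev.chebvander(scale(ts), degree)  # (n, degree+1)

    # Ridge regression: one coefficient vector per feature dimension.
    A = V.T @ V + lam * np.eye(degree + 1)
    coeffs = np.linalg.solve(A, V.T @ feats)                   # (degree+1, d)

    # Evaluate the fitted expansion at future steps (extrapolation allowed).
    Vf = np.polynomial.chebyshev.chebvander(scale(np.asarray(t_future)), degree)
    return Vf @ coeffs                                         # (m, d)
```

Because the fit is global over all cached steps, a forecast several steps ahead does not chain local errors the way step-by-step extrapolation does, which matches the abstract's claim of an error bound that does not compound with step size.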
Related papers
- Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers [10.751183015853863]
Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation. We propose PrediT, a training-free acceleration framework that formulates feature prediction as a linear multistep problem. Our method achieves up to 5.54× latency reduction across various DiT-based image and video generation models, while incurring negligible quality degradation.
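The summary only names PrediT's linear multistep formulation; as a rough illustration, the next feature can be extrapolated as a fixed linear combination of the last few cached features, analogous to explicit multistep ODE solvers. The weights below are standard polynomial-extrapolation coefficients for uniform steps, not the paper's:

```python
import numpy as np

def multistep_forecast(history, order=2):
    """Linear multistep extrapolation of the next feature from cached ones.

    history: list of (d,) feature arrays, oldest first. The weights are
    polynomial-extrapolation coefficients for uniformly spaced steps
    (order 1: copy last; order 2: linear; order 3: quadratic).
    """
    weights = {
        1: np.array([1.0]),
        2: np.array([-1.0, 2.0]),
        3: np.array([1.0, -3.0, 3.0]),
    }[order]
    recent = np.stack(history[-order:])  # (order, d)
    return weights @ recent              # (d,)
```

An order-2 forecast reproduces any linearly evolving feature exactly, and order 3 any quadratic one; errors from such local extrapolation, however, grow with the skip length, which is the weakness the Spectrum paper above targets.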
arXiv Detail & Related papers (2026-02-20T09:33:59Z)
- Forecast the Principal, Stabilize the Residual: Subspace-Aware Feature Caching for Efficient Diffusion Transformers [9.698781486878206]
Diffusion Transformer (DiT) models have achieved unprecedented quality in image and video generation, yet their iterative sampling process remains computationally prohibitive. We propose SVD-Cache, a subspace-aware caching framework that decomposes diffusion features via Singular Value Decomposition (SVD). Our code is in the supplementary material and will be released on GitHub.
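The summary only names the SVD decomposition; following the paper's title, one plausible reading is to split each cached feature map into a low-rank principal subspace (to forecast) and a residual (to stabilize). A minimal sketch, with names of our own invention rather than SVD-Cache's API:

```python
import numpy as np

def split_principal_residual(F, rank=4):
    """Split a feature matrix into a low-rank principal part and a residual
    via SVD. F: (tokens, channels) feature matrix from one cached step.
    """
    U, S, Vt = np.linalg.svd(F, full_matrices=False)
    # Reconstruct from the top `rank` singular directions: the smooth,
    # high-energy component that is easiest to forecast across steps.
    principal = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    # Everything outside the subspace: small, noisy, kept/stabilized as-is.
    residual = F - principal
    return principal, residual
```

By construction the two parts sum back to the original features, so any forecasting error is confined to the principal component.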
arXiv Detail & Related papers (2026-01-12T10:30:12Z)
- Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers [19.107716099809707]
Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. Current methods often struggle to maintain generation quality at high acceleration ratios. We propose FoCa, which treats feature caching as a feature-ODE solving problem.
arXiv Detail & Related papers (2025-08-22T08:34:03Z)
- CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1× speedup with four cores (a 50% improvement over baselines) and 2.9× speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z)
- TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z)
- Constrained Diffusion with Trust Sampling [11.354281911272864]
We rethink training-free loss-guided diffusion from an optimization perspective.
Trust sampling effectively balances following the unconditional diffusion model and adhering to the loss guidance.
We demonstrate the efficacy of our method through extensive experiments on complex tasks, and in drastically different domains of images and 3D motion generation.
arXiv Detail & Related papers (2024-11-17T01:34:57Z)
- Solving Video Inverse Problems Using Image Diffusion Models [58.464465016269614]
We introduce an innovative video inverse solver that leverages only image diffusion models. Our method treats the time dimension of a video as the batch dimension of image diffusion models. We also introduce a batch-consistent sampling strategy that encourages consistency across batches.
arXiv Detail & Related papers (2024-09-04T09:48:27Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner [112.99126045081046]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from slow inference speed. We propose a timestep tuner that helps find a more accurate integral direction for a particular interval at minimal cost. Experiments show that our plug-in design can be trained efficiently and boosts the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
- Wavelet Diffusion Models are fast and scalable Image Generators [3.222802562733787]
Diffusion models are a powerful solution for high-fidelity image generation, which exceeds GANs in quality in many circumstances.
The recent DiffusionGAN method significantly decreases running time by reducing the number of sampling steps from thousands to several, but its speed still largely lags behind its GAN counterparts.
This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme.
We extract low- and high-frequency components from both the image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality.
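As a concrete illustration of the wavelet split described above, a single-level Haar transform separates an image into one low-frequency approximation band and three high-frequency detail bands. This is a generic sketch, not the paper's implementation:

```python
import numpy as np

def haar_split(x):
    """One-level 2D Haar wavelet split of an image, separating the
    low-frequency approximation from the high-frequency detail bands.

    x: (H, W) array with even H and W.
    Returns (LL, LH, HL, HH), each of shape (H//2, W//2).
    """
    # Rows: sum (low-pass) and difference (high-pass) of adjacent pixels.
    lo_r = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi_r = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Columns: the same split applied to each row-filtered half.
    LL = (lo_r[0::2] + lo_r[1::2]) / np.sqrt(2)  # approximation
    LH = (lo_r[0::2] - lo_r[1::2]) / np.sqrt(2)  # horizontal detail
    HL = (hi_r[0::2] + hi_r[1::2]) / np.sqrt(2)  # vertical detail
    HH = (hi_r[0::2] - hi_r[1::2]) / np.sqrt(2)  # diagonal detail
    return LL, LH, HL, HH
```

The transform is orthonormal, so the four bands jointly preserve the image's energy, and each band has a quarter of the original pixels — which is what makes processing them separately cheaper than processing the full-resolution input.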
arXiv Detail & Related papers (2022-11-29T12:25:25Z)
- Improving Diffusion Models for Inverse Problems using Manifold Constraints [55.91148172752894]
We show that current solvers throw the sample path off the data manifold, and hence the error accumulates.
To address this, we propose an additional correction term inspired by the manifold constraint.
We show that our method is superior to the previous methods both theoretically and empirically.
arXiv Detail & Related papers (2022-06-02T09:06:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.