Related papers: Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

URL: http://arxiv.org/abs/2508.02240v2
Date: Tue, 05 Aug 2025 02:13:39 GMT
Title: Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor
Authors: Xiaoliu Guan, Lielin Jiang, Hanqi Chen, Xu Zhang, Jiaxing Yan, Guanzhong Wang, Yi Liu, Zetao Zhang, Yu Wu,
Abstract summary: Diffusion Transformers (DiTs) have demonstrated remarkable performance in visual generation tasks. Recent training-free approaches exploit the redundancy of features across timesteps by caching and reusing past representations to accelerate inference. TaylorSeer instead uses cached features to predict future ones via Taylor expansion. We propose a novel approach to better leverage Taylor-based acceleration.
Score: 10.899451333703437
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Transformers (DiTs) have demonstrated remarkable performance in visual generation tasks. However, their low inference speed limits their deployment in low-resource applications. Recent training-free approaches exploit the redundancy of features across timesteps by caching and reusing past representations to accelerate inference. Building on this idea, TaylorSeer instead uses cached features to predict future ones via Taylor expansion. However, its module-level prediction across all transformer blocks (e.g., attention or feedforward modules) requires storing fine-grained intermediate features, leading to notable memory and computation overhead. Moreover, it adopts a fixed caching schedule without considering the varying accuracy of predictions across timesteps, which can lead to degraded outputs when prediction fails. To address these limitations, we propose a novel approach to better leverage Taylor-based acceleration. First, we shift the Taylor prediction target from the module level to the last block level, significantly reducing the number of cached features. Furthermore, observing strong sequential dependencies among Transformer blocks, we propose to use the error between the Taylor-estimated and actual outputs of the first block as an indicator of prediction reliability. If the error is small, we trust the Taylor prediction for the last block; otherwise, we fall back to full computation, thereby enabling a dynamic caching mechanism. Empirical results show that our method achieves a better balance between speed and quality, with a 3.17x acceleration on FLUX, 2.36x on DiT, and 4.14x on Wan Video and a negligible quality drop. Project page: https://cg-taylor-acce.github.io/CG-Taylor/
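The gating idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`taylor_predict`, `relative_error`, `gated_step`, `run`), the first-order finite-difference extrapolation, the relative L2 error, the `threshold` value, and the choice to refresh both caches only at fully computed steps are all assumptions made for illustration. Features are plain Python lists, and the "blocks" are stand-in callables.

```python
from math import sqrt


def taylor_predict(cache, t):
    """First-order Taylor extrapolation to time t from the two latest (time, feature) entries."""
    (t0, f0), (t1, f1) = cache[-2], cache[-1]
    scale = (t - t1) / (t1 - t0)  # finite-difference slope times the step to t
    return [b + scale * (b - a) for a, b in zip(f0, f1)]


def relative_error(pred, actual):
    """L2 distance between pred and actual, normalised by the norm of actual."""
    num = sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)))
    den = sqrt(sum(a * a for a in actual))
    return num / den if den > 0 else num


def gated_step(t, first_block, remaining_blocks, x, first_cache, last_cache, threshold):
    """Run one step; return (last_block_output, used_forecast)."""
    first_out = first_block(x, t)  # always computed: it serves as the reliability probe
    if len(first_cache) >= 2:
        err = relative_error(taylor_predict(first_cache, t), first_out)
        if err < threshold:
            # Forecast for the first block was accurate, so trust the
            # forecast for the last block and skip the remaining blocks.
            return taylor_predict(last_cache, t), True
    # Fall back to full computation and refresh both caches (assumption).
    last_out = remaining_blocks(first_out, t)
    first_cache.append((t, first_out))
    last_cache.append((t, last_out))
    return last_out, False


def run(first_block, remaining_blocks, x, timesteps, threshold=0.05):
    """Toy driver with a fixed input x; returns per-step outputs and forecast flags."""
    first_cache, last_cache, outs, flags = [], [], [], []
    for t in timesteps:
        out, used = gated_step(t, first_block, remaining_blocks, x,
                               first_cache, last_cache, threshold)
        outs.append(out)
        flags.append(used)
    return outs, flags


if __name__ == "__main__":
    # Features that evolve linearly in t are forecast exactly, so steps 2 and 3 are skipped.
    linear_first = lambda x, t: [xi * (1 + t) for xi in x]
    double_rest = lambda h, t: [2 * hi for hi in h]
    print(run(linear_first, double_rest, [1.0, 2.0], [0, 1, 2, 3])[1])
```

In this toy setting only the first block runs on skipped steps, which is where the saving would come from; with features that change non-linearly the first-block error exceeds the threshold and the step is computed in full.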

Related papers

Temporal Difference Flows [82.24174052059352]
Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states. Existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods.
arXiv Detail & Related papers (2025-03-12T20:30:07Z)
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers [14.402483491830138]
Diffusion Transformers (DiT) have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. Feature caching has been proposed to accelerate diffusion models by caching the features in the previous timesteps and then reusing them in the following timesteps. We propose TaylorSeer, which firstly shows that features of diffusion models at future timesteps can be predicted based on their values at previous timesteps.
arXiv Detail & Related papers (2025-03-10T05:09:42Z)
Video Prediction Transformers without Recurrence or Convolution [65.93130697098658]
We propose PredFormer, a framework entirely based on Gated Transformers. We provide a comprehensive analysis of 3D Attention in the context of video prediction. The significant improvements in both accuracy and efficiency highlight the potential of PredFormer.
arXiv Detail & Related papers (2024-10-07T03:52:06Z)
Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning [33.28797183140384]
Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. We propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing $\lambda$-return targets.
arXiv Detail & Related papers (2024-05-06T21:49:29Z)
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance. Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting. First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals. Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions. Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep. We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception. We build a simple and effective framework for streaming perception. Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
Taylor saves for later: disentanglement for video prediction using Taylor representation [5.658571172210811]
We propose a two-branch seq-to-seq deep model to disentangle the Taylor feature and the residual feature in video frames. TaylorCell can expand the video frames' high-dimensional features into the finite Taylor series to describe the latent laws. MCU distills all past frames' information to correct the predicted Taylor feature from TPU.
arXiv Detail & Related papers (2021-05-24T01:59:21Z)
Learnable and Instance-Robust Predictions for Online Matching, Flows and Load Balancing [12.961453245099044]
We propose a new model for augmenting algorithms with predictions by requiring that they are formally learnable and instance robust. We design online algorithms with predictions for a network flow allocation problem and restricted assignment makespan minimization.
arXiv Detail & Related papers (2020-11-23T21:38:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.