Related papers: Forecast the Principal, Stabilize the Residual: Subspace-Aware Feature Caching for Efficient Diffusion Transformers

URL: http://arxiv.org/abs/2601.07396v1
Date: Mon, 12 Jan 2026 10:30:12 GMT
Title: Forecast the Principal, Stabilize the Residual: Subspace-Aware Feature Caching for Efficient Diffusion Transformers
Authors: Guantao Chen, Shikang Zheng, Yuqi Lin, Linfeng Zhang
Abstract summary: Diffusion Transformer (DiT) models have achieved unprecedented quality in image and video generation, yet their iterative sampling process remains computationally prohibitive. We propose SVD-Cache, a subspace-aware caching framework that decomposes diffusion features via Singular Value Decomposition (SVD). Our code is in supplementary material and will be released on GitHub.
Score: 9.698781486878206
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Diffusion Transformer (DiT) models have achieved unprecedented quality in image and video generation, yet their iterative sampling process remains computationally prohibitive. To accelerate inference, feature caching methods have emerged by reusing intermediate representations across timesteps. However, existing caching approaches treat all feature components uniformly. We reveal that DiT feature spaces contain distinct principal and residual subspaces with divergent temporal behavior: the principal subspace evolves smoothly and predictably, while the residual subspace exhibits volatile, low-energy oscillations that resist accurate prediction. Building on this insight, we propose SVD-Cache, a subspace-aware caching framework that decomposes diffusion features via Singular Value Decomposition (SVD), applies exponential moving average (EMA) prediction to the dominant low-rank components, and directly reuses the residual subspace. Extensive experiments demonstrate that SVD-Cache achieves near-lossless acceleration across diverse models and methods, including 5.55$\times$ speedup on FLUX and HunyuanVideo, and compatibility with model acceleration techniques including distillation, quantization and sparse attention. Our code is in supplementary material and will be released on GitHub.
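
The abstract's three steps (SVD decomposition, EMA prediction on the dominant low-rank components, direct reuse of the residual) can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names, the `rank` and `beta` parameters, and the specific EMA-of-differences predictor are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def split_subspaces(feature, rank):
    """Split a (tokens, channels) feature into principal and residual parts.

    The basis holds the top-`rank` right singular vectors, so the principal
    part is the projection onto the dominant low-rank subspace.
    """
    _, _, vt = np.linalg.svd(feature, full_matrices=False)
    basis = vt[:rank]                      # (rank, channels), orthonormal rows
    principal = feature @ basis.T @ basis  # projection onto the subspace
    residual = feature - principal
    return basis, principal, residual


def ema_forecast(coeff_history, beta=0.5):
    """Forecast the next coefficients from an EMA of step-to-step changes.

    Hypothetical predictor: the abstract only states that EMA prediction
    is applied to the principal components.
    """
    trend = np.zeros_like(coeff_history[-1])
    for prev, curr in zip(coeff_history[:-1], coeff_history[1:]):
        trend = beta * trend + (1.0 - beta) * (curr - prev)
    return coeff_history[-1] + trend


def predict_skipped_step(computed_features, rank=2, beta=0.5):
    """Approximate the next feature without running the model."""
    basis, _, residual = split_subspaces(computed_features[-1], rank)
    # Coefficients of every computed feature in the latest principal basis.
    coeffs = [f @ basis.T for f in computed_features]
    next_coeffs = ema_forecast(coeffs, beta)
    # Forecast principal part plus the directly reused residual.
    return next_coeffs @ basis + residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for DiT features at three fully computed timesteps.
    history = [rng.standard_normal((16, 8)) for _ in range(3)]
    print(predict_skipped_step(history, rank=2).shape)
```

With a single cached feature the trend is zero, so the sketch degrades to plain feature reuse; the forecast only differs from reuse once at least two computed timesteps are available.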

Related papers

Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration [58.19554276924402]
We propose spectral diffusion feature forecaster (Spectrum) to enable global, long-range feature reuse with tightly controlled error. We achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality compared with the baselines.
arXiv Detail & Related papers (2026-03-02T08:59:11Z)
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models [41.7269767513774]
We introduce Spectral-Evolution-Aware Cache (SeaCache), a training-free cache schedule that reuses decisions on a spectrally aligned representation. Through theoretical and empirical analysis, we derive a Spectral-Evolution-Aware filter that preserves content-relevant components while suppressing noise. Experiments on diverse visual generative models and the baselines show that SeaCache achieves state-of-the-art latency-quality trade-offs.
arXiv Detail & Related papers (2026-02-22T00:48:03Z)
ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration [14.306565517230775]
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. Existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. We propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components.
arXiv Detail & Related papers (2025-12-19T07:27:19Z)
HiCache: Training-free Acceleration of Diffusion Models via Hermite Polynomial-based Feature Caching [19.107716099809707]
HiCache is a training-free acceleration framework that improves feature prediction. We introduce a dual-scaling mechanism that ensures numerical stability while preserving predictive accuracy.
arXiv Detail & Related papers (2025-08-23T10:35:16Z)
FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems [51.15230303652732]
FLEX (FLow EXpert) is a backbone architecture for generative modeling of spatio-temporal physical systems. It reduces the variance of the velocity field in the diffusion model, which helps stabilize training. It achieves accurate predictions for super-resolution and forecasting tasks using as few as two reverse diffusion steps.
arXiv Detail & Related papers (2025-05-23T00:07:59Z)
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models [41.11005178050448]
ProfilingDiT is a novel adaptive caching strategy that explicitly disentangles foreground and background-focused blocks. Our framework achieves significant acceleration while maintaining visual fidelity across comprehensive quality metrics.
arXiv Detail & Related papers (2025-04-04T03:30:15Z)
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference. We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters. We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
The key success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information. Inaccurate alignment usually leads to aligned features with significant artifacts. Propagation modules only propagate the same timestep features forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z)
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space. We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process. Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z)
Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework. It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression. Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.