InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
- URL: http://arxiv.org/abs/2512.05134v1
- Date: Sat, 29 Nov 2025 02:34:23 GMT
- Title: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
- Authors: Zihao Wu,
- Abstract summary: InvarDiff is a training-free acceleration method that exploits the relative temporal invariance across timestep-scale and layer-scale. Experiments show that InvarDiff achieves $2$-$3\times$ end-to-end speed-ups with minimal impact on standard quality metrics.
- Score: 2.6735992385049663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present InvarDiff, a training-free acceleration method that exploits the relative temporal invariance across timestep-scale and layer-scale. From a few deterministic runs, we compute a per-timestep, per-layer, per-module binary cache plan matrix and use a re-sampling correction to avoid drift when consecutive caches occur. Using quantile-based change metrics, this matrix specifies which module at which step is reused rather than recomputed. The same invariance criterion is applied at the step scale to enable cross-timestep caching, deciding whether an entire step can reuse cached results. During inference, InvarDiff performs step-first and layer-wise caching guided by this matrix. When applied to DiT and FLUX, our approach reduces redundant compute while preserving fidelity. Experiments show that InvarDiff achieves $2$-$3\times$ end-to-end speed-ups with minimal impact on standard quality metrics. Qualitatively, we observe almost no degradation in visual quality compared with full computations.
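The abstract describes the cache plan only at a high level, so the following is a minimal sketch of how a quantile-based binary plan matrix could be derived from a few deterministic runs, not the authors' implementation. The function names, the relative L2 change metric, the quantile `q`, and the threshold `tau` are all assumptions made for illustration; the re-sampling correction and the step-scale decision mentioned in the abstract are not modeled here.

```python
import numpy as np


def relative_change(features: np.ndarray) -> np.ndarray:
    """Relative L2 change of each module output between consecutive steps.

    features: array of shape (runs, steps, layers, modules, dim), recorded
    from a few deterministic sampling runs (hypothetical layout).
    Returns an array of shape (runs, steps - 1, layers, modules).
    """
    diff = np.linalg.norm(features[:, 1:] - features[:, :-1], axis=-1)
    base = np.linalg.norm(features[:, :-1], axis=-1) + 1e-8  # avoid divide-by-zero
    return diff / base


def cache_plan(features: np.ndarray, q: float = 0.9, tau: float = 0.05) -> np.ndarray:
    """Build a boolean plan of shape (steps, layers, modules).

    plan[t, l, m] is True when module m of layer l at step t is marked for
    reuse instead of recomputation. A module is marked when the q-quantile
    (taken over runs) of its relative change stays below tau. Both q and tau
    are assumed hyperparameters, not values from the paper.
    """
    change = relative_change(features)
    score = np.quantile(change, q, axis=0)  # shape (steps - 1, layers, modules)
    plan = np.zeros(features.shape[1:4], dtype=bool)
    plan[1:] = score < tau  # step 0 has nothing to reuse, so it is always computed
    return plan
```

At inference time, a sampler following such a plan would consult `plan[t, l, m]` before running a module and return the stored output from the previous step whenever the entry is `True`.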
Related papers
- SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z) - Multiscale replay: A robust algorithm for stochastic variational inequalities with a Markovian buffer [10.836971562948042]
We introduce the Multiscale Experience Replay (MER) algorithm for solving a class of variational inequalities (VIs). Rather than uniformly sampling from the buffer, MER utilizes a multi-scale sampling scheme to emulate the behavior of VI algorithms designed for independent and identically distributed samples.
arXiv Detail & Related papers (2026-01-04T12:05:48Z) - ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration [14.306565517230775]
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. Existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. We propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components.
arXiv Detail & Related papers (2025-12-19T07:27:19Z) - Predictive Feature Caching for Training-free Acceleration of Molecular Geometry Generation [67.20779609022108]
Flow matching models generate high-fidelity molecular geometries but incur significant computational costs during inference. This work discusses a training-free caching strategy that accelerates molecular geometry generation. Experiments on the GEOM-Drugs dataset demonstrate that caching achieves a twofold reduction in wall-clock inference time.
arXiv Detail & Related papers (2025-10-06T09:49:14Z) - ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion [30.897215456167753]
Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process. We propose ERTACache, a principled caching framework that jointly rectifies both error types. ERTACache achieves up to 2x inference speedup while consistently preserving or even improving visual quality.
arXiv Detail & Related papers (2025-08-27T10:37:24Z) - ODE$_t$(ODE$_l$): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling [33.87434194582367]
In this work, we explore a complementary direction in which the quality-complexity tradeoff can be dynamically controlled. We employ time- and length-wise consistency terms during flow matching training, and as a result, the sampling can be performed with an arbitrary number of time steps. Compared to the previous state of the art, image generation experiments on CelebA-HQ and ImageNet show a latency reduction of up to 3$\times$ in the most efficient sampling mode.
arXiv Detail & Related papers (2025-06-26T18:59:59Z) - MagCache: Fast Video Generation with Magnitude-Aware Cache [91.2771453279713]
We introduce a novel and robust discovery: a unified magnitude law observed across different models and prompts. We introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and adaptive caching strategy. Experimental results show that MagCache achieves 2.10x-2.68x speedups on Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo, while preserving superior visual fidelity.
arXiv Detail & Related papers (2025-06-10T17:59:02Z) - Neural Flow Samplers with Shortcut Models [19.81513273510523]
Continuous flow-based neural samplers offer a promising approach to generate samples from unnormalized densities. We introduce an improved estimator for these challenging quantities, employing a velocity-driven Sequential Monte Carlo method. Our proposed Neural Flow Shortcut Sampler empirically outperforms existing flow-based neural samplers on both synthetic datasets and complex n-body system targets.
arXiv Detail & Related papers (2025-02-11T07:55:41Z) - Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations [53.180374639531145]
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute. We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
arXiv Detail & Related papers (2024-12-11T11:08:09Z) - Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z) - Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in approaches based on deep learning have achieved sub-second runtimes for DiffIR.
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.