Related papers: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

URL: http://arxiv.org/abs/2512.05134v1
Date: Sat, 29 Nov 2025 02:34:23 GMT
Title: InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
Authors: Zihao Wu,
Abstract summary: InvarDiff is a training-free acceleration method that exploits the relative temporal invariance across timestep-scale and layer-scale.<n> Experiments show that InvarDiff achieves $2$-$3times$ end-to-end speed-ups with minimal impact on standard quality metrics.
Score: 2.6735992385049663
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present InvarDiff, a training-free acceleration method that exploits the relative temporal invariance across timestep-scale and layer-scale. From a few deterministic runs, we compute a per-timestep, per-layer, per-module binary cache plan matrix and use a re-sampling correction to avoid drift when consecutive caches occur. Using quantile-based change metrics, this matrix specifies which module at which step is reused rather than recomputed. The same invariance criterion is applied at the step scale to enable cross-timestep caching, deciding whether an entire step can reuse cached results. During inference, InvarDiff performs step-first and layer-wise caching guided by this matrix. When applied to DiT and FLUX, our approach reduces redundant compute while preserving fidelity. Experiments show that InvarDiff achieves $2$-$3\times$ end-to-end speed-ups with minimal impact on standard quality metrics. Qualitatively, we observe almost no degradation in visual quality compared with full computations.

Related papers

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps.<n>We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis.<n>SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z)
Multiscale replay: A robust algorithm for stochastic variational inequalities with a Markovian buffer [10.836971562948042]
We introduce the Multiscale Experience Replay (MER) algorithm for solving a class of variational inequalities (VIs)<n>Rather than uniformly sampling from the buffer, MER utilizes a multi-scale sampling scheme to emulate the behavior of VI algorithms designed for independent and identically distributed samples.
arXiv Detail & Related papers (2026-01-04T12:05:48Z)
ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration [14.306565517230775]
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment.<n>Existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation.<n>We propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components.
arXiv Detail & Related papers (2025-12-19T07:27:19Z)
Predictive Feature Caching for Training-free Acceleration of Molecular Geometry Generation [67.20779609022108]
Flow matching models generate high-fidelity molecular geometries but incur significant computational costs during inference.<n>This work discusses a training-free caching strategy that accelerates molecular geometry generation.<n> Experiments on the GEOM-Drugs dataset demonstrate that caching achieves a twofold reduction in wall-clock inference time.
arXiv Detail & Related papers (2025-10-06T09:49:14Z)
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion [30.897215456167753]
Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process.<n>We propose ERTACache, a principled caching framework that jointly rectifies both error types.<n>ERTACache achieves up to 2x inference speedup while consistently preserving or even improving visual quality.
arXiv Detail & Related papers (2025-08-27T10:37:24Z)
ODE$_t$(ODE$_l$): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling [33.87434194582367]
In this work, we explore a complementary direction in which the quality-complexity tradeoff can be dynamically controlled.<n>We employ time- and length-wise consistency terms during flow matching training, and as a result, the sampling can be performed with an arbitrary number of time steps.<n>Compared to the previous state of the art, image generation experiments on CelebA-HQ and ImageNet show a latency reduction of up to 3$times$ in the most efficient sampling mode.
arXiv Detail & Related papers (2025-06-26T18:59:59Z)
MagCache: Fast Video Generation with Magnitude-Aware Cache [91.2771453279713]
We introduce a novel and robust discovery: a unified magnitude law observed across different models and prompts.<n>We introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and adaptive caching strategy.<n> Experimental results show that MagCache achieves 2.10x-2.68x speedups on Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo, while preserving superior visual fidelity.
arXiv Detail & Related papers (2025-06-10T17:59:02Z)
Neural Flow Samplers with Shortcut Models [19.81513273510523]
Continuous flow-based neural samplers offer a promising approach to generate samples from unnormalized densities.<n>We introduce an improved estimator for these challenging quantities, employing a velocity-driven Sequential Monte Carlo method.<n>Our proposed Neural Flow Shortcut Sampler empirically outperforms existing flow-based neural samplers on both synthetic datasets and complex n-body system targets.
arXiv Detail & Related papers (2025-02-11T07:55:41Z)
Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations [53.180374639531145]
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute.<n>We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
arXiv Detail & Related papers (2024-12-11T11:08:09Z)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters. We introduce a novel scheme, named Learningto-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-r, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix. In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in approaches based on deep learning have achieved sub-second runtimes for DiffIR. We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields. We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.