AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
- URL: http://arxiv.org/abs/2602.13357v1
- Date: Fri, 13 Feb 2026 08:11:54 GMT
- Title: AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
- Authors: Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu
- Abstract summary: Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. AdaCorrection is an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse across Transformer layers during diffusion inference. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration.
- Score: 37.38708392928324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse across Transformer layers during diffusion inference. At each timestep, AdaCorrection estimates cache validity with lightweight spatio-temporal signals and adaptively blends cached and fresh activations. This correction is computed on-the-fly without additional supervision or retraining. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration. Experiments on image and video diffusion benchmarks show that AdaCorrection consistently improves generation performance.
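The abstract describes estimating cache validity at each timestep and blending cached with fresh activations. The following is a minimal illustrative sketch of that general idea, not the authors' method: the function name `blend_with_cache`, the threshold `tau`, and the choice of per-token relative input drift as the validity signal are all assumptions made for this example, and only a temporal signal is used (the spatial component mentioned in the abstract is omitted).

```python
import numpy as np

def blend_with_cache(x_now, x_cached_in, y_cached, layer_fn, tau=0.1, eps=1e-8):
    """Hypothetical cache-correction step for one layer.

    x_now:       (tokens, dim) layer input at the current timestep
    x_cached_in: (tokens, dim) layer input when the cache was filled
    y_cached:    (tokens, dim) cached layer output
    layer_fn:    callable computing the fresh layer output
    tau:         assumed drift level at which the cache is fully distrusted
    Returns (output, per-token blend weights).
    """
    # Per-token relative drift of the input since the cache was filled.
    drift = np.linalg.norm(x_now - x_cached_in, axis=-1) / (
        np.linalg.norm(x_cached_in, axis=-1) + eps
    )
    # Map drift to a weight in [0, 1]: 0 trusts the cache, 1 trusts fresh output.
    w = np.clip(drift / tau, 0.0, 1.0)
    if not w.any():
        # Every token is judged valid, so the layer call is skipped entirely.
        return y_cached, w
    y_fresh = layer_fn(x_now)
    # Convex blend of fresh and cached activations, token by token.
    out = w[:, None] * y_fresh + (1.0 - w[:, None]) * y_cached
    return out, w
```

In this sketch the layer is skipped only when no token has drifted at all; a practical scheme would need a cheaper decision rule to obtain real speedups, which this example does not attempt to model.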
Related papers
- SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z)
- BADiff: Bandwidth Adaptive Diffusion Model [55.10134744772338]
Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations. In practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation. We introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth.
arXiv Detail & Related papers (2025-10-24T11:50:03Z)
- ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion [30.897215456167753]
Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process. We propose ERTACache, a principled caching framework that jointly rectifies both error types. ERTACache achieves up to 2x inference speedup while consistently preserving or even improving visual quality.
arXiv Detail & Related papers (2025-08-27T10:37:24Z)
- TaoCache: Structure-Maintained Video Generation Acceleration [4.594224594572109]
We present TaoCache, a training-free, plug-and-play caching strategy for video diffusion models. It adopts a fixed-point perspective to predict the model's noise output and is specifically effective in late denoising stages.
arXiv Detail & Related papers (2025-08-12T14:40:36Z)
- DiTVR: Zero-Shot Diffusion Transformer for Video Restoration [48.97196894658511]
DiTVR is a zero-shot video restoration framework that couples a diffusion transformer with trajectory-aware attention and a flow-consistent sampler. Our attention mechanism aligns tokens along optical flow trajectories, with particular emphasis on vital layers that exhibit the highest sensitivity to temporal dynamics. The flow-guided sampler injects data consistency only into low-frequency bands, preserving high-frequency priors while accelerating cache.
arXiv Detail & Related papers (2025-08-11T09:54:45Z)
- Sortblock: Similarity-Aware Feature Reuse for Diffusion Model [9.749736545966694]
Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities. DiTs' sequential denoising process results in high inference latency. We propose Sortblock, a training-free inference acceleration framework.
arXiv Detail & Related papers (2025-08-01T08:10:54Z)
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z)
- FEB-Cache: Frequency-Guided Exposure Bias Reduction for Enhancing Diffusion Transformer Caching [10.760030872557374]
Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity. In this paper, we first confirm that the cache greatly amplifies the exposure bias, resulting in a decline in the generation quality. We introduce FEB-Cache, a joint caching strategy that aligns with the non-exposed bias diffusion process.
arXiv Detail & Related papers (2025-03-10T09:49:18Z)
- QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.