Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
- URL: http://arxiv.org/abs/2512.23258v1
- Date: Mon, 29 Dec 2025 07:36:36 GMT
- Title: Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
- Authors: Tong Shao, Yusen Fu, Guoying Sun, Jingde Kong, Zhuotao Tian, Jingyong Su
- Abstract summary: Caching-based methods achieve training-free acceleration but suffer from considerable computational error. Existing methods typically incorporate error correction strategies such as pruning or prediction to mitigate it. We propose CEM, a novel fidelity-optimization plugin for existing error correction methods based on cumulative error minimization.
- Score: 26.687056294842083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the Diffusion Transformer (DiT) has emerged as a predominant architecture for image and video generation, its iterative denoising process makes inference slow, which hinders broader applicability and development. Caching-based methods achieve training-free acceleration but suffer from considerable computational error. Existing methods typically incorporate error correction strategies such as pruning or prediction to mitigate it. However, their fixed caching strategies fail to adapt to the complex error variations during denoising, which limits the full potential of error correction. To tackle this challenge, we propose CEM, a novel fidelity-optimization plugin for existing error correction methods based on cumulative error minimization. CEM predefines an error measure that characterizes the model's sensitivity to acceleration as jointly influenced by timesteps and cache intervals. Guided by this prior, we formulate a dynamic programming algorithm with a cumulative-error approximation for strategy optimization, which minimizes the caching error and yields a substantial improvement in generation fidelity. CEM is model-agnostic, generalizes well, and adapts to arbitrary acceleration budgets. It can be seamlessly integrated into existing error correction frameworks and quantized models without introducing any additional computational overhead. Extensive experiments on nine generation models and quantization methods across three tasks demonstrate that CEM significantly improves the generation fidelity of existing acceleration models and outperforms the original generation performance on FLUX.1-dev, PixArt-$\alpha$, Stable Diffusion 1.5, and Hunyuan. The code will be made publicly available.
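The abstract's dynamic-programming step can be made concrete with a small sketch. The following Python is a hypothetical rendering under stated assumptions, not the paper's implementation: `err[t, k]` stands in for CEM's predefined error prior (the error of reusing a k-step-old cached output at timestep t), the budget counts full model evaluations, the DP state tracks evaluations used plus current cache age, and backtracking recovers which steps recompute.

```python
import numpy as np

def plan_cache_schedule(err, budget):
    """Pick which of T denoising steps run the full model so that an
    approximate cumulative caching error is minimized.

    err:    (T, T) array; err[t, k] = error of reusing a k-step-old
            cached output at step t (a fresh forward pass has error 0)
    budget: number of full model evaluations allowed (1 <= budget <= T)
    Returns a boolean list: True = compute, False = reuse the cache.
    """
    T = err.shape[0]
    INF = float("inf")
    dp = {(1, 0): 0.0}  # (evals used, cache age) -> min cumulative error
    parent = {}
    for t in range(1, T):
        nxt = {}
        for (b, k), e in dp.items():
            # Option 1: reuse the cache; its age grows to k + 1.
            cand = e + float(err[t, k + 1])
            if cand < nxt.get((b, k + 1), INF):
                nxt[(b, k + 1)] = cand
                parent[(t, b, k + 1)] = (b, k, False)
            # Option 2: spend one evaluation and refresh the cache.
            if b < budget and e < nxt.get((b + 1, 0), INF):
                nxt[(b + 1, 0)] = e
                parent[(t, b + 1, 0)] = (b, k, True)
        dp = nxt
    (b, k), _ = min(dp.items(), key=lambda kv: kv[1])
    compute = [True] + [False] * (T - 1)  # step 0 must compute: no cache yet
    for t in range(T - 1, 0, -1):
        pb, pk, did = parent[(t, b, k)]
        compute[t] = did
        b, k = pb, pk
    return compute

# Toy prior: error grows with cache age and is higher early in denoising.
T = 12
age = np.arange(T, dtype=float)[None, :]
step = np.arange(T, dtype=float)[:, None]
toy_err = age ** 2 * (1.0 + (T - step) / T)
print(plan_cache_schedule(toy_err, budget=5))
```

Under this toy prior the schedule naturally spends evaluations where sensitivity is high (early steps) and tolerates longer cache intervals later, which is the qualitative behavior the abstract describes.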
Related papers
- LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models [48.68246945083386]
Likelihood-Free Policy Optimization (LFPO) is a native framework that maps the concept of vector-field flow matching to the discrete token space. LFPO formulates alignment as geometric velocity rectification, which directly optimizes denoising logits via contrastive updates. Experiments demonstrate that LFPO not only outperforms state-of-the-art baselines on code and reasoning benchmarks but also accelerates inference by approximately 20% through reduced diffusion steps.
arXiv Detail & Related papers (2026-03-02T07:42:55Z) - Free Lunch for Stabilizing Rectified Flow Inversion [11.80912018629953]
Rectified-Flow (RF)-based generative models have emerged as strong alternatives to traditional diffusion models. We propose Proximal-Mean Inversion (PMI), a training-free gradient correction method. We also introduce mimic-CFG, a lightweight velocity correction scheme for editing tasks.
arXiv Detail & Related papers (2026-02-12T11:42:36Z) - Test-Time Iterative Error Correction for Efficient Diffusion Models [16.300409397814192]
Iterative Error Correction is a test-time method that mitigates inference-time errors by iteratively refining the model's output. It consistently improves generation quality across various datasets, efficiency techniques, and model architectures.
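As a rough, hypothetical illustration of the general idea (not this paper's exact scheme, and every name below is illustrative), a damped fixed-point iteration can refine a cheap initial output toward the fixed point of a more accurate update:

```python
import numpy as np

def iterative_error_correction(accurate_update, x_approx, n_iters=5, damping=0.5):
    """Refine a cheap initial output by repeatedly pulling it toward the
    accurate update's fixed point. Illustrative sketch only."""
    x = x_approx
    for _ in range(n_iters):
        x = (1 - damping) * x + damping * accurate_update(x)
    return x

# Toy usage: refine a rough guess toward the fixed point of cos(x) ~ 0.739.
print(iterative_error_correction(np.cos, 0.5, n_iters=20))
```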
arXiv Detail & Related papers (2025-11-09T06:29:22Z) - ETC: training-free diffusion models acceleration with Error-aware Trend Consistency [46.40478218579471]
Recent training-free methods accelerate the diffusion process by reusing model outputs. However, these methods ignore denoising trends and lack error control for model-specific tolerance. We introduce Error-aware Trend Consistency (ETC), a framework that leverages the smooth continuity of diffusion trajectories. ETC achieves a 2.65x acceleration over FLUX with negligible degradation of consistency.
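A hedged sketch of what error-aware output reuse might look like (the loop, update rule, and drift proxy below are stand-ins, not ETC's actual algorithm): extrapolate along the latest denoising trend while an accumulated error proxy stays under a tolerance, and recompute otherwise.

```python
import numpy as np

def trend_consistent_denoise(model, x, T, tol=0.05):
    """Illustrative loop only: extrapolate along the latest denoising
    trend while an accumulated error proxy stays under `tol`."""
    prev_out, trend, drift = None, None, 0.0
    for t in range(T):
        if trend is None or drift > tol:
            out = model(x, t)                 # full forward pass
            if prev_out is not None:
                trend = out - prev_out        # latest denoising trend
            prev_out, drift = out, 0.0
        else:
            out = prev_out + trend            # reuse: extrapolate the trend
            drift += np.linalg.norm(trend) / (np.linalg.norm(out) + 1e-8)
            prev_out = out
        x = x - out / T                       # toy update rule, not a solver
    return x

print(trend_consistent_denoise(lambda x, t: 0.1 * x, np.ones(4), T=20))
```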
arXiv Detail & Related papers (2025-10-28T07:08:09Z) - Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models [57.49136894315871]
The new paradigm of test-time scaling has yielded remarkable breakthroughs in reasoning models and generative vision models. We propose one solution to the problem of integrating test-time-scaling knowledge into a model during post-training: we replace reward-guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates the initial input noise.
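As a loose illustration of the amortization idea (all shapes, the two-layer map, and the scale-and-shift modulation are assumptions, not the paper's architecture), a tiny hypernetwork can map a conditioning embedding to a modulation of the initial noise in a single forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
d_cond, d_noise, d_hidden = 8, 16, 32

# Tiny stand-in "hypernetwork": conditioning embedding -> noise modulation.
W1 = 0.1 * rng.standard_normal((d_cond, d_hidden))
W2 = 0.1 * rng.standard_normal((d_hidden, 2 * d_noise))

def noise_hypernet(cond_emb):
    h = np.tanh(cond_emb @ W1)
    scale, shift = np.split(h @ W2, 2)
    return scale, shift

z = rng.standard_normal(d_noise)            # base initial noise
scale, shift = noise_hypernet(rng.standard_normal(d_cond))
z_mod = (1.0 + scale) * z + shift           # modulated noise fed to the model
print(z_mod.shape)
```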
arXiv Detail & Related papers (2025-08-13T17:33:37Z) - Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition [4.0594792247165]
Diffusion transformer (DiT) models have achieved remarkable success in image generation. We propose increment-calibrated caching, a training-free method for DiT acceleration. Our method eliminates more than 45% of computation and improves IS by 12 at the cost of a less than 0.06 FID increase.
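A simplified sketch of the increment-calibration idea for a single linear layer (plain truncated SVD; the paper's channel-aware weighting is omitted, and all shapes are illustrative): a skipped step reuses the cached output plus a cheap rank-r response to the input increment.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))          # one linear layer's weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 16                                        # kept rank
Ur, Sr, Vtr = U[:, :r], S[:r], Vt[:r]

x_cached = rng.standard_normal(256)
y_cached = W @ x_cached                       # full pass at the cached step

x_new = x_cached + 0.05 * rng.standard_normal(256)   # nearby skipped step
# Cheap calibration: cached output + rank-r response to the input increment.
y_calib = y_cached + Ur @ (Sr * (Vtr @ (x_new - x_cached)))
print(np.linalg.norm(W @ x_new - y_calib) / np.linalg.norm(W @ x_new))
```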
arXiv Detail & Related papers (2025-05-09T06:56:17Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Gradient Correction in Federated Learning with Adaptive Optimization [19.93709245766609]
We propose FAdamGC, the first algorithm to integrate client-drift compensation into adaptive optimization. We show that FAdamGC consistently outperforms existing methods in total communication and computation cost across varying levels of data heterogeneity.
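As a hedged sketch of client-drift compensation in general (SCAFFOLD-style control variates; this is not FAdamGC's algorithm, and every name below is illustrative): each client corrects its local gradients with the gap between the global and its local control variate, and the server can then feed the averaged model delta into an Adam-style update.

```python
import numpy as np

def local_epoch(w, grad_fn, c_local, c_global, lr=0.05, steps=5):
    """One client's drift-corrected local training (illustrative only)."""
    w = w.copy()
    for _ in range(steps):
        g = grad_fn(w) - c_local + c_global   # compensate client drift
        w -= lr * g
    return w

# Toy usage: two clients with shifted quadratic losses pulling to -1 and +1.
grad_a = lambda w: w + 1.0
grad_b = lambda w: w - 1.0
w = np.zeros(3)
c_a, c_b, c_g = grad_a(w), grad_b(w), 0.5 * (grad_a(w) + grad_b(w))
delta = 0.5 * ((local_epoch(w, grad_a, c_a, c_g) - w) +
               (local_epoch(w, grad_b, c_b, c_g) - w))
w = w + delta    # a real server would feed `delta` into an Adam-style update
print(w)         # stays at the consensus optimum 0 despite client drift
```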
arXiv Detail & Related papers (2025-02-04T21:21:30Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have notable drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose ADSGD, a novel accelerated doubly stochastic gradient descent method for sparsity-regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)