Sortblock: Similarity-Aware Feature Reuse for Diffusion Model
- URL: http://arxiv.org/abs/2508.00412v1
- Date: Fri, 01 Aug 2025 08:10:54 GMT
- Title: Sortblock: Similarity-Aware Feature Reuse for Diffusion Model
- Authors: Hanqi Chen, Xu Zhang, Xiaoliu Guan, Lielin Jiang, Guanzhong Wang, Zeyu Chen, Yi Liu,
- Abstract summary: Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities. DiTs' sequential denoising process results in high inference latency. We propose Sortblock, a training-free inference acceleration framework.
- Score: 9.749736545966694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities, particularly benefiting from Transformer architectures that enhance visual and artistic fidelity. However, their inherently sequential denoising process results in high inference latency, limiting their deployment in real-time scenarios. Existing training-free acceleration approaches typically reuse intermediate features at fixed timesteps or layers, overlooking the evolving semantic focus across denoising stages and Transformer blocks. To address this, we propose Sortblock, a training-free inference acceleration framework that dynamically caches block-wise features based on their similarity across adjacent timesteps. By ranking the evolution of residuals, Sortblock adaptively determines a recomputation ratio, selectively skipping redundant computations while preserving generation quality. Furthermore, we incorporate a lightweight linear prediction mechanism to reduce accumulated errors in skipped blocks. Extensive experiments across various tasks and DiT architectures demonstrate that Sortblock achieves over 2$\times$ inference speedup with minimal degradation in output quality, offering an effective and generalizable solution for accelerating diffusion-based generative models.
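The abstract describes two mechanisms: a similarity test on block residuals across adjacent timesteps to decide reuse, and a lightweight linear prediction for skipped blocks. The following is a minimal sketch of those two ideas, not the paper's implementation; the function names, the cosine-similarity metric, and the threshold value are illustrative assumptions.

```python
import numpy as np

def should_reuse(prev_residual: np.ndarray, curr_residual: np.ndarray,
                 threshold: float = 0.95) -> bool:
    """Decide whether a block's cached output can be reused, based on the
    cosine similarity of its residuals at adjacent timesteps.
    The metric and threshold are illustrative, not from the paper."""
    a, b = prev_residual.ravel(), curr_residual.ravel()
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return cos >= threshold

def linear_extrapolate(residual_t: np.ndarray,
                       residual_t_minus_1: np.ndarray) -> np.ndarray:
    """Lightweight linear prediction for a skipped block: extrapolate the
    next residual from the last two computed residuals."""
    return residual_t + (residual_t - residual_t_minus_1)
```

In a full pipeline, blocks would be ranked by how fast their residuals evolve, and only the top fraction (the recomputation ratio) would be recomputed each step, with the rest served from cache via the extrapolation above.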
Related papers
- Block-wise Adaptive Caching for Accelerating Diffusion Policy [10.641633189595302]
Block-wise Adaptive Caching (BAC) is a method to accelerate Diffusion Policy by caching intermediate action features. BAC achieves up to 3x inference speedup for free on robotic benchmarks.
arXiv Detail & Related papers (2025-06-16T13:14:58Z) - Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [70.4360995984905]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z) - ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation [40.68265817413368]
We introduce ALTER: All-in-One Layer Pruning and Temporal Expert Routing, a unified framework that transforms diffusion models into a mixture of efficient temporal experts. A single-stage optimization unifies layer pruning, expert routing, and model fine-tuning by employing a trainable hypernetwork.
arXiv Detail & Related papers (2025-05-27T22:59:44Z) - Fully Spiking Neural Networks for Unified Frame-Event Object Tracking [11.727693745877486]
A Spiking Frame-Event Tracking framework is proposed to fuse frame and event data. RPM eliminates positional bias via randomized spatial reorganization and learnable type encoding. An STR strategy enforces temporal consistency among template features in latent space.
arXiv Detail & Related papers (2025-05-27T07:53:50Z) - Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration [24.85624444212476]
This study proposes a novel concept: attend to prune feature redundancies in areas not attended by the diffusion process. We analyze the location and degree of feature redundancies based on the structure-then-detail denoising priors. We introduce SDTM, a structure-then-detail token merging approach that dynamically compresses feature redundancies.
arXiv Detail & Related papers (2025-05-16T21:27:38Z) - BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers [39.08730113749482]
Diffusion Transformers (DiTs) continue to encounter challenges related to low inference speed. We propose BlockDance, a training-free approach that explores feature similarities at adjacent time steps to accelerate DiTs. We also introduce BlockDance-Ada, a lightweight decision-making network tailored for instance-specific acceleration.
arXiv Detail & Related papers (2025-03-20T08:07:31Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3$\times$ inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference. We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs), the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z) - Temporal Feature Matters: A Framework for Diffusion Model Quantization [105.3033493564844]
Diffusion models rely on the time-step for multi-round denoising. We introduce a novel quantization framework that includes three strategies. This framework preserves most of the temporal information and ensures high-quality end-to-end generation.
arXiv Detail & Related papers (2024-07-28T17:46:15Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.