LiteAttention: A Temporal Sparse Attention for Diffusion Transformers
- URL: http://arxiv.org/abs/2511.11062v1
- Date: Fri, 14 Nov 2025 08:26:55 GMT
- Title: LiteAttention: A Temporal Sparse Attention for Diffusion Transformers
- Authors: Dor Shmilovich, Tony Wu, Aviad Dahan, Yuval Domb
- Abstract summary: LiteAttention exploits temporal coherence to enable evolutionary computation skips across the denoising sequence. We implement a highly optimized LiteAttention kernel on top of FlashAttention and demonstrate substantial speedups on production video diffusion models.
- Score: 1.3471268811218626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Transformers, particularly for video generation, achieve remarkable quality but suffer from quadratic attention complexity, leading to prohibitive latency. Existing acceleration methods face a fundamental trade-off: dynamically estimating sparse attention patterns at each denoising step incurs high computational overhead and estimation errors, while static sparsity patterns remain fixed and often suboptimal throughout denoising. We identify a key structural property of diffusion attention, namely, its sparsity patterns exhibit strong temporal coherence across denoising steps. Tiles deemed non-essential at step $t$ typically remain so at step $t+\delta$. Leveraging this observation, we introduce LiteAttention, a method that exploits temporal coherence to enable evolutionary computation skips across the denoising sequence. By marking non-essential tiles early and propagating skip decisions forward, LiteAttention eliminates redundant attention computations without repeated profiling overheads, combining the adaptivity of dynamic methods with the efficiency of static ones. We implement a highly optimized LiteAttention kernel on top of FlashAttention and demonstrate substantial speedups on production video diffusion models, with no degradation in quality. The code and implementation details will be publicly released.
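The skip-propagation idea in the abstract can be sketched in a few lines: a cheap per-tile proxy score marks non-essential attention tiles, and once a tile is marked at step $t$ the skip decision is carried forward to later steps instead of being re-profiled. This is an illustrative NumPy sketch under assumed details, not the released kernel: the max-logit proxy, the tile size, and the threshold are all assumptions, and a real kernel would avoid computing the tiles it has already skipped.

```python
import numpy as np

def tile_scores(q, k, tile=4):
    """Max attention logit per (query-tile, key-tile) block: a cheap
    proxy for how much a tile can contribute to the softmax."""
    s = q @ k.T                                   # (n, n) attention logits
    t = s.shape[0] // tile
    return s.reshape(t, tile, t, tile).max(axis=(1, 3))   # (t, t)

def propagate_skip_mask(qk_per_step, tile=4, thresh=0.0):
    """Evolutionary skip propagation: a tile deemed non-essential at one
    denoising step stays skipped at all later steps (temporal coherence)."""
    mask = None
    masks = []
    for q, k in qk_per_step:
        scores = tile_scores(q, k, tile)
        if mask is None:
            mask = np.zeros_like(scores, dtype=bool)
        mask = mask | (scores < thresh)           # once skipped, stays skipped
        masks.append(mask.copy())
    return masks
```

By construction the per-step masks only grow, so profiling cost is paid once per tile rather than once per step; the adaptivity comes from letting each step add new skips on top of the inherited ones.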
Related papers
- Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers [10.751183015853863]
Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation. We propose PrediT, a training-free acceleration framework that formulates feature prediction as a linear multistep problem. Our method achieves up to $5.54\times$ latency reduction across various DiT-based image and video generation models, while incurring negligible quality degradation.
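As a hedged illustration of the linear multistep idea (not PrediT's actual formulation, which the summary does not give), the next step's features can be extrapolated from recent steps with fixed coefficients, e.g. the second-order rule $f_{t+1} \approx 2 f_t - f_{t-1}$:

```python
import numpy as np

def multistep_forecast(history, coeffs=(2.0, -1.0)):
    """Extrapolate the next feature tensor from the k most recent ones.

    coeffs pair with history newest-first; (2, -1) is the second-order
    linear rule f_{t+1} ~= 2*f_t - f_{t-1}, exact when features evolve
    linearly, so a step's block output can be predicted, not computed.
    """
    k = len(coeffs)
    assert len(history) >= k, "need at least k past steps"
    recent = history[::-1][:k]                    # newest first
    return sum(c * h for c, h in zip(coeffs, recent))
```

A cache-based accelerator in this style would presumably alternate fully computed steps (to refresh the history) with forecast steps, trading a small prediction error for skipped transformer blocks.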
arXiv Detail & Related papers (2026-02-20T09:33:59Z) - SparseD: Sparse Attention for Diffusion Language Models [98.05780626106555]
Diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs). Existing open-source DLMs suffer from high inference latency. We propose SparseD, a novel sparse attention method for DLMs.
arXiv Detail & Related papers (2025-09-28T18:10:10Z) - Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation [50.04968365065964]
Diffusion-based talking head models generate high-quality, photorealistic videos but suffer from slow inference. We introduce Lightning-fast Caching-based Parallel denoising prediction (LightningCP). We also propose Decoupled Foreground Attention (DFA) to further accelerate attention computations.
arXiv Detail & Related papers (2025-08-25T02:58:39Z) - Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment [14.097906894386066]
PostDiff is a training-free framework for accelerating pre-trained diffusion models. We show that PostDiff can significantly improve the fidelity-efficiency trade-off of state-of-the-art diffusion models.
arXiv Detail & Related papers (2025-08-08T09:29:37Z) - Sortblock: Similarity-Aware Feature Reuse for Diffusion Model [9.749736545966694]
Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities. DiTs' sequential denoising process results in high inference latency. We propose Sortblock, a training-free inference acceleration framework.
arXiv Detail & Related papers (2025-08-01T08:10:54Z) - Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Training-free and Adaptive Sparse Attention for Efficient Long Video Generation [31.615453637053793]
Generating high-fidelity long videos with Diffusion Transformers (DiTs) is often hindered by significant latency. We propose AdaSpa, the first Dynamic Pattern and Online Precise Search sparse attention method. AdaSpa is implemented as an adaptive, plug-and-play solution and can be integrated seamlessly with existing DiTs.
arXiv Detail & Related papers (2025-02-28T14:11:20Z) - Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail.
Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
arXiv Detail & Related papers (2024-08-11T07:01:39Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner [112.99126045081046]
A diffusion model, which is formulated to produce an image through thousands of denoising steps, usually suffers from slow inference. We propose a timestep tuner that helps find a more accurate integral direction for a particular interval at minimum cost. Experiments show that our plug-in design can be trained efficiently and boosts the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.