Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free
- URL: http://arxiv.org/abs/2501.00375v1
- Date: Tue, 31 Dec 2024 09:56:40 GMT
- Title: Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free
- Authors: Evelyn Zhang, Bang Xiao, Jiayi Tang, Qianli Ma, Chang Zou, Xuefei Ning, Xuming Hu, Linfeng Zhang
- Abstract summary: We introduce a dynamics-aware token pruning (DaTo) approach that addresses the limitations of feature caching.
DaTo combines feature caching with token pruning in a training-free manner, achieving both temporal and token-wise information reuse.
- Score: 36.86246063181059
- Abstract: Stable Diffusion has achieved remarkable success in the field of text-to-image generation, with its powerful generative capabilities and diverse generation results making a lasting impact. However, its iterative denoising introduces high computational costs and slows generation speed, limiting broader adoption. The community has made numerous efforts to reduce this computational burden, with methods like feature caching attracting attention due to their effectiveness and simplicity. Nonetheless, simply reusing features computed at previous timesteps causes the features across adjacent timesteps to become similar, reducing the dynamics of features over time and ultimately compromising the quality of generated images. In this paper, we introduce a dynamics-aware token pruning (DaTo) approach that addresses the limitations of feature caching. DaTo selectively prunes tokens with lower dynamics, allowing only high-dynamic tokens to participate in self-attention layers, thereby extending feature dynamics across timesteps. DaTo combines feature caching with token pruning in a training-free manner, achieving both temporal and token-wise information reuse. Applied to Stable Diffusion on ImageNet, our approach delivers a 9$\times$ speedup while reducing FID by 0.33, indicating enhanced image quality. On COCO-30k, we observe a 7$\times$ acceleration coupled with a notable FID reduction of 2.17.
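The following is a minimal, hedged sketch of the DaTo idea as summarized in the abstract: per-token "dynamics" are estimated from how much each token changed relative to the cached features, only the most dynamic tokens pass through self-attention, and the remaining tokens reuse the cached features. The function name, the L2-based dynamics metric, and the keep ratio are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch only: high-dynamic tokens are recomputed through self-attention,
# low-dynamic tokens reuse the cached features (temporal + token-wise reuse).
import torch


def dato_attention_step(tokens, cached, attention, keep_ratio=0.3):
    """tokens, cached: (B, N, C) current features and features cached at a
    previous timestep; attention: a module applied only to the kept tokens."""
    B, N, C = tokens.shape
    k = max(1, int(N * keep_ratio))

    # Per-token "dynamics": how much each token moved since the cached step
    # (an assumed L2 metric, used here purely for illustration).
    dynamics = (tokens - cached).norm(dim=-1)            # (B, N)
    keep_idx = dynamics.topk(k, dim=1).indices           # (B, k) high-dynamic tokens

    # Run self-attention only on the high-dynamic tokens.
    gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, C)
    kept = torch.gather(tokens, 1, gather_idx)           # (B, k, C)
    kept = attention(kept)

    # Pruned (low-dynamic) tokens fall back to the cached features.
    out = cached.clone()
    out.scatter_(1, gather_idx, kept)
    return out


if __name__ == "__main__":
    attn = torch.nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
    wrapped = lambda x: attn(x, x, x, need_weights=False)[0]
    x_t = torch.randn(2, 256, 64)      # current tokens (toy shapes)
    x_cache = torch.randn(2, 256, 64)  # features cached at an earlier timestep
    print(dato_attention_step(x_t, x_cache, wrapped).shape)  # torch.Size([2, 256, 64])
```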
Related papers
- Accelerating Vision Diffusion Transformers with Skip Branches [47.07564477125228]
Diffusion Transformers (DiT) are an emerging image and video generation model architecture.
DiT's practical deployment is constrained by computational complexity and redundancy in the sequential denoising process.
We introduce Skip-DiT, which adds skip branches to the standard DiT to enhance feature smoothness.
We also introduce Skip-Cache, which uses the skip branches to cache DiT features across timesteps at inference time.
arXiv Detail & Related papers (2024-11-26T17:28:10Z)
- Accelerating Diffusion Transformers with Token-wise Feature Caching [19.140800616594294]
Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs.
We introduce token-wise feature caching, allowing us to adaptively select the most suitable tokens for caching.
Experiments on PixArt-$\alpha$, OpenSora, and DiT demonstrate our effectiveness in both image and video generation, with no training required.
arXiv Detail & Related papers (2024-10-05T03:47:06Z)
- HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration [31.982294870690925]
We propose a novel learning-based caching framework dubbed HarmoniCa.
It incorporates Step-Wise Denoising Training (SDT) to ensure the continuity of the denoising process.
It also incorporates an Image Error Proxy-Guided Objective (IEPO) to balance image quality against cache utilization.
arXiv Detail & Related papers (2024-10-02T16:34:29Z)
- Token Caching for Diffusion Transformer Acceleration [30.437462937127773]
TokenCache is a novel post-training acceleration method for diffusion transformers.
It reduces redundant computations among tokens across inference steps.
We show that TokenCache achieves an effective trade-off between generation quality and inference speed for diffusion transformers.
arXiv Detail & Related papers (2024-09-27T08:05:34Z)
- Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method removes the need for post-acceleration during inference, thereby sidestepping the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z)
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache achieves comparable or even marginally improved results with DDIM or PLMS (a minimal sketch of this caching idea appears after this list).
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
- No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling [55.203866875294516]
Vision Transformers (ViTs) have demonstrated outstanding performance in computer vision tasks.
Various token pruning techniques have been introduced to alleviate the high computational burden of ViTs.
We propose IdleViT, a dynamic token-idle-based method that achieves an excellent trade-off between performance and efficiency.
arXiv Detail & Related papers (2023-10-09T12:10:41Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvements when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
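As referenced in the DeepCache entry above, the following is a minimal, generic sketch of the feature-caching idea that DeepCache and several papers in this list build on: an expensive sub-network is recomputed only every few denoising steps, and its cached output is reused in between. The `CachedBlock` wrapper and the fixed refresh interval are illustrative assumptions, not any specific paper's caching schedule.

```python
# Generic sketch of feature caching across denoising timesteps: the wrapped
# block is recomputed only every `interval` steps; otherwise the cached
# output is reused.
import torch


class CachedBlock(torch.nn.Module):
    def __init__(self, block, interval=3):
        super().__init__()
        self.block = block        # the expensive sub-network to cache
        self.interval = interval  # recompute every `interval` timesteps
        self._cache = None

    def forward(self, x, step):
        if self._cache is None or step % self.interval == 0:
            self._cache = self.block(x)  # full computation at refresh steps
        return self._cache               # reuse cached features otherwise


if __name__ == "__main__":
    block = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU())
    cached = CachedBlock(block, interval=3)
    x = torch.randn(1, 64)
    for t in range(6):
        y = cached(x, step=t)  # recomputed at t = 0 and 3, reused elsewhere
    print(y.shape)
```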