Ditto: Accelerating Diffusion Model via Temporal Value Similarity
- URL: http://arxiv.org/abs/2501.11211v1
- Date: Mon, 20 Jan 2025 01:03:50 GMT
- Title: Ditto: Accelerating Diffusion Model via Temporal Value Similarity
- Authors: Sungbin Kim, Hyunwuk Lee, Wonho Cho, Mincheol Park, Won Woo Ro
- Abstract summary: We propose a difference processing algorithm that leverages temporal similarity with quantization to enhance the efficiency of diffusion models.
We also design the Ditto hardware, a specialized hardware accelerator, which achieves up to 1.5x speedup and 17.74% energy saving.
- Score: 4.5280087047319535
- Abstract: Diffusion models achieve superior performance in image generation tasks. However, they incur significant computation overheads due to their iterative structure. To address these overheads, we analyze this iterative structure and observe that adjacent time steps in diffusion models exhibit high value similarity, leading to narrower differences between consecutive time steps. We adapt these characteristics to a quantized diffusion model and reveal that the majority of these differences can be represented with reduced bit-width, or even as zero. Based on our observations, we propose the Ditto algorithm, a difference processing algorithm that leverages temporal similarity with quantization to enhance the efficiency of diffusion models. By exploiting the narrower differences and the distributive property of layer operations, it performs full bit-width operations for the initial time step and processes subsequent steps with temporal differences. In addition, Ditto execution flow optimization is designed to mitigate the memory overhead of temporal difference processing, further boosting the efficiency of the Ditto algorithm. We also design the Ditto hardware, a specialized hardware accelerator, fully exploiting the dynamic characteristics of the proposed algorithm. As a result, the Ditto hardware achieves up to 1.5x speedup and 17.74% energy saving compared to other accelerators.
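The difference-processing idea described in the abstract can be sketched in a few lines (a toy illustration under stated assumptions, not the Ditto implementation): because a linear layer distributes over addition, W @ x_t = W @ x_{t-1} + W @ (x_t - x_{t-1}), so after one full bit-width step, each subsequent step only needs to process the narrow temporal difference.

```python
# Toy sketch of temporal difference processing for a linear layer.
# The helper `matvec` and the example values are illustrative only.

def matvec(W, x):
    """Plain matrix-vector product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W = [[0.5, -1.0], [2.0, 0.25]]   # weights of a linear layer
x_prev = [1.0, 2.0]              # activations at time step t-1
x_curr = [1.01, 1.98]            # similar adjacent time step t

y_prev = matvec(W, x_prev)       # full bit-width compute, first step only

# Subsequent step: operate on the narrow difference. In Ditto this
# difference would additionally be quantized to a reduced bit-width.
delta = [c - p for c, p in zip(x_curr, x_prev)]
y_curr = [y + d for y, d in zip(y_prev, matvec(W, delta))]

# Identical (up to float rounding) to recomputing from scratch.
y_full = matvec(W, x_curr)
assert all(abs(a - b) < 1e-12 for a, b in zip(y_curr, y_full))
```

The same distributivity argument applies to convolutions and other linear operators, which is what lets the difference path replace the full-activation path at every step after the first.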
Related papers
- CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models [5.406829638216823]
Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis.
However, their iterative denoising process demands substantial computational resources.
We present a novel acceleration strategy that integrates token-level pruning with caching techniques to tackle this computational challenge.
arXiv Detail & Related papers (2025-02-01T13:46:02Z) - EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models [12.931893842093718]
We present EXION, the first SW-HW co-designed diffusion accelerator.
It exploits the unique inter- and intra-iteration output sparsity in diffusion models.
It achieves dramatic improvements in performance and energy efficiency by 3.2-379.3x and 45.1-3067.6x compared to a server GPU and by 42.6-1090.9x and 196.9-4668.2x compared to an edge GPU.
arXiv Detail & Related papers (2025-01-10T03:07:28Z) - LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers [79.07412045476872]
Diffusion Transformers have emerged as the preeminent models for a wide array of generative tasks.
We show that performing the full computation of the model at each diffusion step is unnecessary, as some computations can be skipped by lazily reusing the results of previous steps.
We propose a lazy learning framework that efficiently leverages cached results from earlier steps to skip redundant computations.
arXiv Detail & Related papers (2024-12-17T01:12:35Z) - Accelerating AI Performance using Anderson Extrapolation on GPUs [2.114333871769023]
We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation.
By identifying the crossover point where a mixing penalty is incurred, the method focuses on reducing iterations to convergence.
We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing.
arXiv Detail & Related papers (2024-10-25T10:45:17Z) - Dynamic Diffusion Transformer [67.13876021157887]
Diffusion Transformer (DiT) has demonstrated superior performance but suffers from substantial computational costs.
We propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation.
With 3% additional fine-tuning, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73x, and achieves a competitive FID score of 2.07 on ImageNet.
arXiv Detail & Related papers (2024-10-04T14:14:28Z) - Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
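The layer-caching idea shared by L2C and the lazy-reuse methods above can be illustrated with a toy sketch (the `cached_layer` helper is hypothetical, not the L2C implementation): a per-layer cache returns the previous step's output whenever the schedule marks that layer as reusable, skipping its computation entirely.

```python
import math

def cached_layer(x, cache, layer_fn, reuse):
    """Apply layer_fn, or reuse the cached output from an earlier step."""
    if reuse and "out" in cache:
        return cache["out"]          # computation skipped entirely
    out = layer_fn(x)
    cache["out"] = out
    return out

# Stand-in for a diffusion-transformer layer (illustrative only).
layer = lambda xs: [math.tanh(v) for v in xs]

cache = {}
y0 = cached_layer([1.0, -1.0], cache, layer, reuse=False)     # step t: compute
y1 = cached_layer([1.001, -0.999], cache, layer, reuse=True)  # step t+1: reuse
assert y0 == y1   # the reused output is the cached step-t result
```

The methods differ mainly in how the reuse decision is made: L2C learns a per-layer caching schedule, while lazy-reuse approaches decide dynamically from the inputs.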
arXiv Detail & Related papers (2024-06-03T18:49:57Z) - ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z) - Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z) - Clockwork Diffusion: Efficient Generation With Model-Step Distillation [42.01130983628078]
Clockwork Diffusion is a method that periodically reuses computation from preceding denoising steps to approximate low-res feature maps at one or more subsequent steps.
For both text-to-image generation and image editing, we demonstrate that Clockwork leads to comparable or improved perceptual scores with drastically reduced computational complexity.
arXiv Detail & Related papers (2023-12-13T13:30:27Z) - The Missing U for Efficient Diffusion Models [3.712196074875643]
Diffusion Probabilistic Models yield record-breaking performance in tasks such as image synthesis, video generation, and molecule design.
Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs.
We introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models.
arXiv Detail & Related papers (2023-10-31T00:12:14Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.