DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing
- URL: http://arxiv.org/abs/2409.07756v1
- Date: Thu, 12 Sep 2024 05:18:57 GMT
- Title: DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing
- Authors: Zhenyuan Dong, Sai Qian Zhang,
- Abstract summary: We propose a data-free post-training quantization (PTQ) method for efficient Diffusion Transformers (DiTs)
DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of the channel-wise outliers within the input activations.
We show that our approach enables 4-bit weight, 8-bit activation (W4A8) quantization for DiTs while maintaining comparable performance as the full-precision model.
- Score: 5.174900115018253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Transformers (DiTs) have recently attracted significant interest from both industry and academia due to their enhanced capabilities in visual generation, surpassing the performance of traditional diffusion models that employ U-Net. However, the improved performance of DiTs comes at the expense of higher parameter counts and implementation costs, which significantly limits their deployment on resource-constrained devices like mobile phones. We propose DiTAS, a data-free post-training quantization (PTQ) method for efficient DiT inference. DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of the channel-wise outliers within the input activations, leading to much lower quantization error under extremely low bitwidth. To further enhance the performance of the quantized DiT, we adopt the layer-wise grid search strategy to optimize the smoothing factor. Experimental results demonstrate that our approach enables 4-bit weight, 8-bit activation (W4A8) quantization for DiTs while maintaining comparable performance as the full-precision model.
Related papers
- Dynamic Diffusion Transformer [67.13876021157887]
Diffusion Transformer (DiT) has demonstrated superior performance but suffers from substantial computational costs.
We propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation.
With 3% additional fine-tuning, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73, and achieves a competitive FID score of 2.07 on ImageNet.
arXiv Detail & Related papers (2024-10-04T14:14:28Z) - Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers [45.762142897697366]
Post-training Quantization (PTQ) offers a promising solution by compressing model sizes and speeding up inference for the pretrained models while eliminating model retraining.
We have observed the existing PTQ frameworks exclusively designed for both ViT and conventional Diffusion models fall into biased quantization and result in remarkable performance degradation.
We devise Q-DiT, which seamlessly integrates three techniques: fine-grained quantization to manage substantial variance across input channels of weights and activations, an automatic search strategy to optimize the quantization granularity and mitigate redundancies, and dynamic activation quantization to capture the activation changes across timesteps.
arXiv Detail & Related papers (2024-06-25T07:57:27Z) - An Analysis on Quantizing Diffusion Transformers [19.520194468481655]
Post Training Quantization (PTQ) offers an immediate remedy for a smaller storage size and more memory-efficient computation during inferencing.
We propose a single-step sampling calibration on activations and adapt group-wise quantization on weights for low-bit quantization.
arXiv Detail & Related papers (2024-06-16T23:18:35Z) - HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization [10.307268005739202]
Diffusion Transformers (DiTs) have recently gained substantial attention for their superior visual generation capabilities.
DiTs also come with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones.
We introduce the Hybrid Floating-point Quantization for DiT(HQ-DiT), an efficient post-training quantization method that utilizes 4-bit floating-point (FP) precision on both weights and activations for DiT inference.
arXiv Detail & Related papers (2024-05-30T06:56:11Z) - PTQ4DiT: Post-training Quantization for Diffusion Transformers [52.902071948957186]
Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint.
We propose PTQ4DiT, a specifically designed PTQ method for DiTs.
We demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision while preserving comparable generation ability.
arXiv Detail & Related papers (2024-05-25T02:02:08Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods.
We identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width.
Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings.
arXiv Detail & Related papers (2024-02-06T03:39:44Z) - EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.