PTQ4DiT: Post-training Quantization for Diffusion Transformers
- URL: http://arxiv.org/abs/2405.16005v3
- Date: Thu, 17 Oct 2024 15:28:15 GMT
- Title: PTQ4DiT: Post-training Quantization for Diffusion Transformers
- Authors: Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan,
- Abstract summary: Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint.
We propose PTQ4DiT, a specifically designed PTQ method for DiTs.
We demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision while preserving comparable generation ability.
- Score: 52.902071948957186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computational demands at the inference stage. Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint by using low-bit weights and activations. However, its applicability to DiTs has not yet been explored and faces non-trivial difficulties due to the unique design of DiTs. In this paper, we propose PTQ4DiT, a specifically designed PTQ method for DiTs. We discover two primary quantization challenges inherent in DiTs, notably the presence of salient channels with extreme magnitudes and the temporal variability in distributions of salient activation over multiple timesteps. To tackle these challenges, we propose Channel-wise Salience Balancing (CSB) and Spearmen's $\rho$-guided Salience Calibration (SSC). CSB leverages the complementarity property of channel magnitudes to redistribute the extremes, alleviating quantization errors for both activations and weights. SSC extends this approach by dynamically adjusting the balanced salience to capture the temporal variations in activation. Additionally, to eliminate extra computational costs caused by PTQ4DiT during inference, we design an offline re-parameterization strategy for DiTs. Experiments demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision (W8A8) while preserving comparable generation ability and further enables effective quantization to 4-bit weight precision (W4A8) for the first time.
Related papers
- FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers [15.324769026957641]
Post-training quantization is a lightweight method to alleviate burdens without the need for training or fine-tuning.
We introduce FP4DiT, a PTQ method that leverages Floating-Point Quantization to achieve W4A6 quantization.
Experimental results demonstrate that FP4DiT outperforms integer-based PTQ at W4A6 and W4A8 precision.
arXiv Detail & Related papers (2025-03-19T17:44:21Z) - Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping [36.012101195384744]
Diffusion Transformer (DiT) has now become the preferred choice for building image generation models.
DiT is purely composed of a stack of transformer blocks, which renders DiT excellent in scalability like large language models.
We propose a post-training quantization framework tailored for Diffusion Transforms to tackle these challenges.
arXiv Detail & Related papers (2025-03-10T05:21:04Z) - TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM)
The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z) - Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability.
DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference.
We propose Skip-DiT, a novel DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z) - DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing [5.174900115018253]
We propose a data-free post-training quantization (PTQ) method for efficient Diffusion Transformers (DiTs)
DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of the channel-wise outliers within the input activations.
We show that our approach enables 4-bit weight, 8-bit activation (W4A8) quantization for DiTs while maintaining comparable performance as the full-precision model.
arXiv Detail & Related papers (2024-09-12T05:18:57Z) - Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers [45.762142897697366]
Post-Training Quantization (PTQ) emerges as a promising solution, enabling model compression and accelerated inference for pretrained models.
Research on DiT quantization remains sparse, and existing PTQ frameworks tend to suffer from biased quantization, leading to notable performance degradation.
We propose Q-DiT, a novel approach that seamlessly integrates two key techniques: automatic quantization granularity allocation to handle the significant variance of weights and activations across input channels, and sample-wise dynamic activation quantization to adaptively capture activation changes across both timesteps and samples.
arXiv Detail & Related papers (2024-06-25T07:57:27Z) - 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
It is notorious that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z) - ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation [23.00085349135532]
Post-Training Quantization (PTQ) is an effective method for reducing memory costs and computational complexity.
We find that applying existing diffusion quantization methods for U-Net faces challenges in preserving quality.
We improve ViDiT-Q with a novel metric-decoupled mixed-precision quantization method (ViDiT-Q-MP)
arXiv Detail & Related papers (2024-06-04T17:57:10Z) - HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization [10.307268005739202]
Diffusion Transformers (DiTs) have recently gained substantial attention for their superior visual generation capabilities.
DiTs also come with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones.
We introduce the Hybrid Floating-point Quantization for DiT(HQ-DiT), an efficient post-training quantization method that utilizes 4-bit floating-point (FP) precision on both weights and activations for DiT inference.
arXiv Detail & Related papers (2024-05-30T06:56:11Z) - Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization [62.15918574997175]
It is known that language models contain outlier channels whose values on average are orders of magnitude higher than other channels.
We propose a strategy which regularizes a layer's inputs via quantization-aware training (QAT) and its outputs via activation kurtosis regularization.
We show that regularizing both the inputs and outputs is crucial for preventing a model's "migrating" the difficulty in input quantization to the weights.
arXiv Detail & Related papers (2024-04-04T17:25:30Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z) - RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs)
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z) - PAMS: Quantized Super-Resolution via Parameterized Max Scale [84.55675222525608]
Deep convolutional neural networks (DCNNs) have shown dominant performance in the task of super-resolution (SR)
We propose a new quantization scheme termed PArameterized Max Scale (PAMS), which applies the trainable truncated parameter to explore the upper bound of the quantization range adaptively.
Experiments demonstrate that the proposed PAMS scheme can well compress and accelerate the existing SR models such as EDSR and RDN.
arXiv Detail & Related papers (2020-11-09T06:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.