Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
- URL: http://arxiv.org/abs/2406.17343v2
- Date: Tue, 19 Nov 2024 09:58:07 GMT
- Title: Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
- Authors: Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu,
- Abstract summary: Post-Training Quantization (PTQ) emerges as a promising solution, enabling model compression and accelerated inference for pretrained models.
Research on DiT quantization remains sparse, and existing PTQ frameworks tend to suffer from biased quantization, leading to notable performance degradation.
We propose Q-DiT, a novel approach that seamlessly integrates two key techniques: automatic quantization granularity allocation to handle the significant variance of weights and activations across input channels, and sample-wise dynamic activation quantization to adaptively capture activation changes across both timesteps and samples.
- Score: 45.762142897697366
- License:
- Abstract: Recent advancements in diffusion models, particularly the architectural transformation from UNet-based models to Diffusion Transformers (DiTs), significantly improve the quality and scalability of image and video generation. However, despite their impressive capabilities, the substantial computational costs of these large-scale models pose significant challenges for real-world deployment. Post-Training Quantization (PTQ) emerges as a promising solution, enabling model compression and accelerated inference for pretrained models, without the costly retraining. However, research on DiT quantization remains sparse, and existing PTQ frameworks, primarily designed for traditional diffusion models, tend to suffer from biased quantization, leading to notable performance degradation. In this work, we identify that DiTs typically exhibit significant spatial variance in both weights and activations, along with temporal variance in activations. To address these issues, we propose Q-DiT, a novel approach that seamlessly integrates two key techniques: automatic quantization granularity allocation to handle the significant variance of weights and activations across input channels, and sample-wise dynamic activation quantization to adaptively capture activation changes across both timesteps and samples. Extensive experiments conducted on ImageNet and VBench demonstrate the effectiveness of the proposed Q-DiT. Specifically, when quantizing DiT-XL/2 to W6A8 on ImageNet ($256 \times 256$), Q-DiT achieves a remarkable reduction in FID by 1.09 compared to the baseline. Under the more challenging W4A8 setting, it maintains high fidelity in image and video generation, establishing a new benchmark for efficient, high-quality quantization in DiTs. Code is available at \href{https://github.com/Juanerx/Q-DiT}{https://github.com/Juanerx/Q-DiT}.
Related papers
- PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [87.89013794655207]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps.
We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR.
Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z) - DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers [2.0862654518798034]
We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers.
DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ.
DopQ-ViT has been extensively validated and significantly improves the performance of quantization models.
arXiv Detail & Related papers (2024-08-06T16:40:04Z) - PTQ4DiT: Post-training Quantization for Diffusion Transformers [52.902071948957186]
Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint.
We propose PTQ4DiT, a specifically designed PTQ method for DiTs.
We demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision while preserving comparable generation ability.
arXiv Detail & Related papers (2024-05-25T02:02:08Z) - Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z) - EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z) - Temporal Dynamic Quantization for Diffusion Models [18.184163233551292]
We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information.
Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference.
Our experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
arXiv Detail & Related papers (2023-06-04T09:49:43Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z) - RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs)
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous
Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.