PQCAD-DM: Progressive Quantization and Calibration-Assisted Distillation for Extremely Efficient Diffusion Model
- URL: http://arxiv.org/abs/2506.16776v1
- Date: Fri, 20 Jun 2025 06:43:27 GMT
- Title: PQCAD-DM: Progressive Quantization and Calibration-Assisted Distillation for Extremely Efficient Diffusion Model
- Authors: Beomseok Ko, Hyeryung Jang
- Abstract summary: Diffusion models excel in image generation but are computationally and resource-intensive. We propose PQCAD-DM, a novel hybrid compression framework combining Progressive Quantization (PQ) and Calibration-Assisted Distillation (CAD). PQ employs a two-stage quantization with adaptive bit-width transitions guided by a momentum-based mechanism, reducing excessive weight perturbations at low precision.
- Score: 8.195126516665914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models excel in image generation but are computationally and resource-intensive due to their reliance on iterative Markov chain processes, which lead to error accumulation and limit the effectiveness of naive compression techniques. In this paper, we propose PQCAD-DM, a novel hybrid compression framework combining Progressive Quantization (PQ) and Calibration-Assisted Distillation (CAD) to address these challenges. PQ employs a two-stage quantization with adaptive bit-width transitions guided by a momentum-based mechanism, reducing excessive weight perturbations at low precision. CAD leverages full-precision calibration datasets during distillation, enabling the student to match full-precision performance even with a quantized teacher. As a result, PQCAD-DM balances computational efficiency and generative quality, halving inference time while maintaining competitive performance. Extensive experiments validate PQCAD-DM's superior generative capability and efficiency across diverse datasets, outperforming fixed-bit quantization methods.
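The abstract describes PQ and CAD only at a high level. As a rough illustrative sketch of the two ideas (not the authors' implementation: the bit-width schedule, momentum update, threshold, quantizer, and loss weighting below are all assumptions), a momentum-guided bit-width transition and a calibration-anchored distillation loss might look like this:

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of a weight list (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        scale = 1.0
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]

class ProgressiveQuantizer:
    """Hypothetical PQ sketch: step the bit-width down through a schedule,
    advancing a stage only once a momentum-smoothed estimate of the
    quantization perturbation settles below a threshold."""

    def __init__(self, bits_schedule=(8, 6, 4), beta=0.9, threshold=1e-3):
        self.bits_schedule = list(bits_schedule)  # high -> low precision
        self.beta = beta            # momentum coefficient (assumed form)
        self.threshold = threshold  # transition criterion (assumed)
        self.stage = 0
        self.momentum = None        # running estimate of weight perturbation

    def step(self, weights):
        bits = self.bits_schedule[self.stage]
        w_q = quantize(weights, bits)
        # mean-squared perturbation introduced by quantizing at this bit-width
        perturb = sum((w - q) ** 2 for w, q in zip(weights, w_q)) / len(weights)
        self.momentum = (perturb if self.momentum is None
                         else self.beta * self.momentum + (1 - self.beta) * perturb)
        # transition to the next (lower) bit-width once perturbation settles
        if self.momentum < self.threshold and self.stage < len(self.bits_schedule) - 1:
            self.stage += 1
        return w_q

def cad_distill_loss(student_out, teacher_out, calib_ref, alpha=0.5):
    """Hypothetical CAD-style loss: match the quantized teacher while also
    anchoring the student to outputs from a full-precision calibration set."""
    mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return alpha * mse(student_out, teacher_out) + (1 - alpha) * mse(student_out, calib_ref)
```

In this sketch the calibration term is what lets the student track full-precision behavior even though its distillation target is a quantized teacher; how the paper actually weights or schedules the two terms is not stated in the abstract.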
Related papers
- CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization. CAGE significantly improves upon state-of-the-art methods in terms of accuracy, for similar computational cost. For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4 bits (W4A4) by the prior best method.
arXiv Detail & Related papers (2025-10-21T16:33:57Z) - Compute-Optimal Quantization-Aware Training [50.98555000360485]
Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior accuracy. We investigate how different QAT durations impact final performance.
arXiv Detail & Related papers (2025-09-26T21:09:54Z) - Punching Above Precision: Small Quantized Model Distillation with Learnable Regularizer [9.85847764731154]
Game of Regularizer (GoR) is a learnable regularization method that adaptively balances task-specific (TS) and distillation losses. GoR consistently outperforms state-of-the-art QAT-KD methods on low-power edge devices. We also introduce QAT-EKD-GoR, an ensemble distillation framework that uses multiple heterogeneous teacher models.
arXiv Detail & Related papers (2025-09-25T07:43:13Z) - LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Text-to-Image Generation [34.14174796390669]
Post-training quantization (PTQ) is a promising solution to reduce memory usage and accelerate inference. Existing PTQ methods suffer from severe performance degradation under extreme low-bit settings. We propose LRQ-DiT, an efficient and accurate PTQ framework.
arXiv Detail & Related papers (2025-08-05T14:16:11Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - Q&C: When Quantization Meets Cache in Efficient Image Generation [24.783679431414686]
We find that the combination of quantization and cache mechanisms for Diffusion Transformers (DiTs) is not straightforward. We propose a hybrid acceleration method by tackling the above challenges. Our method has accelerated DiTs by 12.7x while preserving competitive generation capability.
arXiv Detail & Related papers (2025-03-04T11:19:02Z) - Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion [9.402892455344677]
We propose an efficient quantization framework for Stable Diffusion models (SDM). Our framework simultaneously maintains training-inference consistency and ensures optimization stability. Our method demonstrates superior performance over state-of-the-art approaches with shorter training times.
arXiv Detail & Related papers (2024-12-09T17:00:20Z) - 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
Low-bit quantization is known to degrade the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z) - EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models [8.742501879586309]
Quantization can effectively reduce model complexity, and post-training quantization (PTQ) is highly promising for compressing and accelerating diffusion models. Existing PTQ methods suffer from distribution mismatch issues at both the calibration sample level and the reconstruction output level. We propose EDA-DM, a standardized PTQ method that efficiently addresses the above issues.
arXiv Detail & Related papers (2024-01-09T14:42:49Z) - CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) at ultra-low cost. We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ employs cross-block dependencies via a reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z) - Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z) - EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z) - Weight Re-Mapping for Variational Quantum Algorithms [54.854986762287126]
We introduce the concept of weight re-mapping for variational quantum circuits (VQCs).
We employ seven distinct weight re-mapping functions to assess their impact on eight classification datasets.
Our results indicate that weight re-mapping can enhance the convergence speed of the VQC.
arXiv Detail & Related papers (2023-06-09T09:42:21Z) - Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers [17.445202457319517]
Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption.
This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast converging QAT of ultra-low precision pre-trained Transformers.
arXiv Detail & Related papers (2023-02-23T06:48:24Z) - CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification [51.81850995661478]
Mixed-precision quantization has been widely applied to deep neural networks (DNNs).
Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence.
We propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability.
arXiv Detail & Related papers (2022-12-06T05:44:21Z) - DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization [75.72231742114951]
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks.
These models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency.
We propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model.
arXiv Detail & Related papers (2022-03-21T18:04:25Z) - Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
arXiv Detail & Related papers (2022-03-21T02:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.