Related papers: Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

URL: http://arxiv.org/abs/2412.06661v1
Date: Mon, 09 Dec 2024 17:00:20 GMT
Title: Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion
Authors: Shuaiting Li, Juncan Deng, Zeyu Wang, Hong Gu, Kedong Xu, Haibin Shen, Kejie Huang,
Abstract summary: We propose an efficient quantization framework for Stable Diffusion models.<n>Our approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes.<n>Under W4A8 quantization settings, our approach enhances both distribution similarity and visual similarity by 45%-60%.
Score: 9.8078769718432
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image generation of Stable Diffusion models has achieved notable success due to its remarkable generation ability. However, the repetitive denoising process is computationally intensive during inference, which renders Diffusion models less suitable for real-world applications that require low latency and scalability. Recent studies have employed post-training quantization (PTQ) and quantization-aware training (QAT) methods to compress Diffusion models. Nevertheless, prior research has often neglected to examine the consistency between results generated by quantized models and those from floating-point models. This consistency is crucial in fields such as content creation, design, and edge deployment, as it can significantly enhance both efficiency and system stability for practitioners. To ensure that quantized models generate high-quality and consistent images, we propose an efficient quantization framework for Stable Diffusion models. Our approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes, as well as ensuring training stability. Based on this pipeline, we further introduce a mix-precision quantization strategy, multi-timestep activation quantization, and time information precalculation techniques to ensure high-fidelity generation in comparison to floating-point models. Through extensive experiments with Stable Diffusion v1-4, v2-1, and XL 1.0, we have demonstrated that our method outperforms the current state-of-the-art techniques when tested on prompts from the COCO validation dataset and the Stable-Diffusion-Prompts dataset. Under W4A8 quantization settings, our approach enhances both distribution similarity and visual similarity by 45%-60%.

Related papers

MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved textbfMixed textbfPrecision textbfQuantization framework for extremely low-bit textbfDiffusion textbfModels.
arXiv Detail & Related papers (2025-07-06T08:16:50Z)
PQD: Post-training Quantization for Efficient Diffusion Models [4.809939957401427]
We propose a novel post-training quantization for diffusion models (PQD) We show that our proposed method is able to directly quantize full-precision diffusion models into 8-bit or 4-bit models while maintaining comparable performance in a training-free manner.
arXiv Detail & Related papers (2024-12-30T19:55:59Z)
TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM) The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z)
MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models [37.061975191553]
This paper presents MPQ-DM, a Mixed-Precision Quantization method for Diffusion Models. To mitigate the quantization error caused by outlier severe weight channels, we propose an Outlier-Driven Mixed Quantization technique. To robustly learn representations crossing time steps, we construct a Time-Smoothed Relation Distillation scheme.
arXiv Detail & Related papers (2024-12-16T08:31:55Z)
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [95.98801201266099]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps.<n>We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR.<n>Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z)
Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved. This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods. We identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width. Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings.
arXiv Detail & Related papers (2024-02-06T03:39:44Z)
Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models. We are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models. We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z)
Temporal Dynamic Quantization for Diffusion Models [18.184163233551292]
We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference. Our experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
arXiv Detail & Related papers (2023-06-04T09:49:43Z)
Towards Accurate Post-training Quantization for Diffusion Models [73.19871905102545]
We propose an accurate data-free post-training quantization framework of diffusion models (ADP-DM) for efficient image generation. Our method outperforms the state-of-the-art post-training quantization of diffusion model by a sizable margin with similar computational cost.
arXiv Detail & Related papers (2023-05-30T04:00:35Z)
Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks. We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture. We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.