QuEST: Low-bit Diffusion Model Quantization via Efficient Selective
Finetuning
- URL: http://arxiv.org/abs/2402.03666v2
- Date: Tue, 13 Feb 2024 05:22:34 GMT
- Title: QuEST: Low-bit Diffusion Model Quantization via Efficient Selective
Finetuning
- Authors: Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan
- Abstract summary: Diffusion models have achieved remarkable success in image generation tasks, yet their practical deployment is constrained by high memory and time consumption.
We propose finetuning the quantized model to better adapt to the activation distribution.
Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings.
- Score: 14.295049174485902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved remarkable success in image generation tasks,
yet their practical deployment is constrained by high memory and time
consumption. While quantization paves the way for diffusion model compression and
acceleration, existing methods fail entirely when the models are quantized to
low bit-widths. In this paper, we unravel three properties in quantized diffusion
models that compromise the efficacy of current methods: imbalanced activation
distributions, imprecise temporal information, and vulnerability to
perturbations of specific modules. To alleviate the intensified low-bit
quantization difficulty stemming from the distribution imbalance, we propose
finetuning the quantized model to better adapt to the activation distribution.
Building on this idea, we identify two critical types of quantized layers:
those holding vital temporal information and those sensitive to reduced
bit-width, and finetune them to mitigate performance degradation with
efficiency. We empirically verify that our approach modifies the activation
distribution and provides meaningful temporal information, facilitating easier
and more accurate quantization. Our method is evaluated over three
high-resolution image generation tasks and achieves state-of-the-art
performance under various bit-width settings, as well as being the first method
to generate readable images on full 4-bit (i.e., W4A4) Stable Diffusion. Code
has been made publicly available.
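As background for the W4A4 setting mentioned above, weight and activation quantization is often simulated in floating point ("fake quantization"). The following is a minimal generic sketch of uniform symmetric quantization, not the QuEST method itself; the per-tensor max-abs scale is an assumed choice for illustration.

```python
import numpy as np

def fake_quantize(x, num_bits=4):
    """Simulate uniform symmetric quantization at a given bit-width.

    Generic illustration (not the paper's method): values are scaled,
    rounded to integers in the signed range, then dequantized back.
    """
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / qmax         # per-tensor max-abs scale (an assumption)
    if scale == 0:
        return x                             # all-zero input: nothing to quantize
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer grid, e.g. [-8, 7]
    return q * scale                         # dequantize for simulation

# "W4": weights simulated at 4 bits; the same op applied to
# activations would give the "A4" half of W4A4.
weights = np.random.randn(256)
w4 = fake_quantize(weights, num_bits=4)
```

At 4 bits the tensor collapses onto at most 16 distinct levels, which is why activation-distribution imbalance becomes so damaging at this precision: a few outliers stretch the scale and leave almost no levels for the bulk of the values.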
Related papers
- Timestep-Aware Correction for Quantized Diffusion Models [28.265582848911574]
We propose a timestep-aware correction method for quantized diffusion models, which dynamically corrects the quantization error.
By leveraging the proposed method in low-precision diffusion models, substantial enhancement of output quality can be achieved with only negligible overhead.
arXiv Detail & Related papers (2024-07-04T13:22:31Z)
- Lossy Image Compression with Foundation Diffusion Models [10.407650300093923]
In this work we formulate the removal of quantization error as a denoising task, using diffusion to recover lost information in the transmitted image latent.
Our approach allows us to perform less than 10% of the full diffusion generative process and requires no architectural changes to the backbone.
arXiv Detail & Related papers (2024-04-12T16:23:42Z)
- Memory-Efficient Fine-Tuning for Quantized Diffusion Model [12.875837358532422]
We introduce TuneQDM, a memory-efficient fine-tuning method for quantized diffusion models.
Our method consistently outperforms the baseline in both single- and multi-subject generation.
arXiv Detail & Related papers (2024-01-09T03:42:08Z)
- Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from long runtimes, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Towards Accurate Post-training Quantization for Diffusion Models [73.19871905102545]
We propose an accurate data-free post-training quantization framework of diffusion models (ADP-DM) for efficient image generation.
Our method outperforms the state-of-the-art post-training quantization of diffusion model by a sizable margin with similar computational cost.
arXiv Detail & Related papers (2023-05-30T04:00:35Z)
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer from a severe performance drop at ultra-low precision (4 bits or lower), or require a heavy fine-tuning process to recover performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
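The distribution-aware idea behind the DAQ entry above can be illustrated with a small, hedged sketch (not the DAQ algorithm itself): clip activation magnitudes at a high percentile before uniform quantization so that rare outliers do not inflate the quantization scale. The 99.9th-percentile clip and per-tensor scaling are assumed choices for illustration.

```python
import numpy as np

def percentile_quantize(x, num_bits=4, pct=99.9):
    """Uniformly quantize after clipping |x| at a high percentile.

    Generic sketch of distribution-aware scaling, not taken from any
    of the papers above. With heavy-tailed activations, a max-abs
    scale wastes most quantization levels on rare outliers; clipping
    keeps resolution for the bulk of the distribution.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4-bit signed
    clip_val = np.percentile(np.abs(x), pct)  # ignore extreme outliers
    if clip_val == 0:
        return np.zeros_like(x)
    scale = clip_val / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # outliers saturate
    return q * scale

# Heavy-tailed input: most mass near zero plus a few large outliers,
# mimicking the imbalanced activation distributions discussed above.
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0, 0.1, 10000), [50.0, -60.0]])
aq = percentile_quantize(acts, num_bits=4)
```

With a max-abs scale the two outliers would force a scale of roughly 60/7, leaving the near-zero bulk rounded almost entirely to zero; the percentile clip instead saturates the outliers and preserves resolution where most of the values lie.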
This list is automatically generated from the titles and abstracts of the papers in this site.