QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration
- URL: http://arxiv.org/abs/2506.00820v1
- Date: Sun, 01 Jun 2025 03:52:59 GMT
- Title: QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration
- Authors: Jiatong Li, Libo Zhu, Haotong Qin, Jingkai Wang, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang
- Abstract summary: Diffusion models have been achieving remarkable performance in face restoration. However, the heavy computations of diffusion models make it difficult to deploy them on devices like smartphones. We propose QuantFace, a novel low-bit quantization for one-step diffusion face restoration models.
- Score: 109.89807858620242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have been achieving remarkable performance in face restoration. However, the heavy computations of diffusion models make it difficult to deploy them on devices like smartphones. In this work, we propose QuantFace, a novel low-bit quantization for one-step diffusion face restoration models, where the full-precision (i.e., 32-bit) weights and activations are quantized to 4$\sim$6-bit. We first analyze the data distribution within activations and find that it is highly variant. To preserve the original data information, we employ rotation-scaling channel balancing. Furthermore, we propose Quantization-Distillation Low-Rank Adaptation (QD-LoRA), which jointly optimizes for quantization and distillation performance. Finally, we propose an adaptive bit-width allocation strategy. We formulate this strategy as an integer programming problem that combines quantization error and perceptual metrics to find a satisfactory resource allocation. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of QuantFace under 6-bit and 4-bit quantization. QuantFace achieves significant advantages over recent leading low-bit quantization methods for face restoration. The code is available at https://github.com/jiatongli2024/QuantFace.
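For intuition, here is a minimal NumPy sketch of the rotation-scaling channel balancing idea described above, assuming a random orthogonal rotation and a SmoothQuant-style per-channel scale; the function names and the specific scale rule are illustrative assumptions, not QuantFace's actual implementation.

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Symmetric uniform quantization (illustrative)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def rotation_scaling_balance(x, w, n_bits=4, seed=0):
    """Toy rotation-scaling channel balancing for one linear layer.

    x: (batch, in_features) activations, w: (in_features, out_features) weights.
    The random orthogonal rotation and the SmoothQuant-style scale rule are
    stand-ins; QuantFace's actual procedure may differ.
    """
    d = x.shape[1]
    # An orthogonal rotation mixes channels so activation outliers get spread out;
    # it cancels inside the matmul: (x R)(R^T w) = x w.
    R, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((d, d)))
    x_rot, w_rot = x @ R, R.T @ w
    # Per-channel scaling balances activation and weight ranges before quantization.
    s = np.sqrt(np.abs(x_rot).max(axis=0) / (np.abs(w_rot).max(axis=1) + 1e-8))
    x_bal, w_bal = x_rot / s, w_rot * s[:, None]
    y_fp = x @ w
    y_q = uniform_quantize(x_bal, n_bits) @ uniform_quantize(w_bal, n_bits)
    return np.abs(y_fp - y_q).mean()

print(rotation_scaling_balance(np.random.randn(64, 128), np.random.randn(128, 32)))
```

The adaptive bit-width allocation can likewise be read, roughly, as an integer program of the form $\min_{\{b_l\}} \sum_l e_l(b_l)$ subject to $\sum_l n_l b_l \le B$ and $b_l \in \{4, 6\}$, where $e_l$ combines quantization error and perceptual metrics for layer $l$, $n_l$ is its parameter count, and $B$ is the bit budget; the exact objective is defined in the paper.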
Related papers
- MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - Low-bit Model Quantization for Deep Neural Networks: A Survey [123.89598730307208]
This article surveys the recent five years of progress towards low-bit quantization of deep neural networks (DNNs). We discuss and compare the state-of-the-art quantization methods and classify them into 8 main categories and 24 sub-categories according to their core techniques. We shed light on potential research opportunities in the field of model quantization.
arXiv Detail & Related papers (2025-05-08T13:26:19Z) - ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals [10.860081994662645]
Post-training quantization of large language models (LLMs) holds the promise of reducing the prohibitive computational cost at inference time. We propose ResQ, a PTQ method that pushes the state of the art further. We demonstrate that ResQ outperforms recent uniform and mixed-precision PTQ methods on a variety of benchmarks.
arXiv Detail & Related papers (2024-12-18T22:01:55Z) - MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models [37.061975191553]
This paper presents MPQ-DM, a Mixed-Precision Quantization method for Diffusion Models. To mitigate the quantization error caused by weight channels with severe outliers, we propose an Outlier-Driven Mixed Quantization technique. To robustly learn representations across time steps, we construct a Time-Smoothed Relation Distillation scheme.
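As a hypothetical illustration of the outlier-driven idea, the toy allocator below scores each layer's weights by kurtosis and hands heavier-tailed layers more bits; the scoring function and the bit menu are assumptions, not MPQ-DM's actual criterion.

```python
import numpy as np

def assign_bits_by_outliers(layer_weights, bit_choices=(2, 3, 4)):
    """Toy outlier-driven bit allocation: layers whose weights are more
    heavy-tailed (higher kurtosis) receive wider bit-widths."""
    def kurtosis(w):
        w = w.ravel()
        return ((w - w.mean()) ** 4).mean() / (w.var() ** 2 + 1e-12)

    scores = [kurtosis(w) for w in layer_weights]
    order = np.argsort(scores)                         # ascending outlier severity
    bits = {}
    for rank, idx in enumerate(order):
        bits[int(idx)] = bit_choices[rank * len(bit_choices) // len(order)]
    return bits

layers = [np.random.randn(64, 64) for _ in range(4)]
layers[2][0, :5] *= 20                                 # inject outliers into layer 2
print(assign_bits_by_outliers(layers))                 # layer 2 should get the most bits
```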
arXiv Detail & Related papers (2024-12-16T08:31:55Z) - SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models [58.5019443418822]
Diffusion models can generate high-quality images, but as they scale, rising memory demands and higher latency pose deployment challenges. We propose SVDQuant, a new 4-bit quantization paradigm to overcome this limitation. We reduce the memory usage for the 12B FLUX.1 models by 3.5$\times$, achieving a 3.0$\times$ speedup over the 4-bit weight-only quantization (W4A16) baseline.
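A rough sketch of the low-rank absorption idea follows (SVDQuant additionally migrates activation outliers into the weights first, which is omitted here); the rank, quantizer, and variable names are illustrative.

```python
import numpy as np

def uniform_quantize(x, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def low_rank_plus_quant(w, rank=16, n_bits=4):
    """Keep a rank-`rank` SVD component in high precision, quantize only the residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    lowrank = (u[:, :rank] * s[:rank]) @ vt[:rank]      # absorbs the outlier-heavy energy
    return lowrank + uniform_quantize(w - lowrank, n_bits)

w = np.random.randn(256, 256)
w[:, :4] *= 25                                          # a few outlier columns blow up the range
err_plain = np.abs(w - uniform_quantize(w)).mean()
err_svd = np.abs(w - low_rank_plus_quant(w)).mean()
print(f"plain 4-bit error: {err_plain:.4f}   low-rank + 4-bit residual error: {err_svd:.4f}")
```

Because the low-rank branch soaks up the outlier columns, the residual has a much smaller dynamic range and quantizes with far less error than the raw matrix.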
arXiv Detail & Related papers (2024-11-07T18:59:58Z) - P4Q: Learning to Prompt for Quantization in Visual-language Models [38.87018242616165]
We propose a method that balances fine-tuning and quantization, named "Prompt for Quantization" (P4Q).
Our method can effectively reduce the gap between image features and text features caused by low-bit quantization.
Our 8-bit P4Q can theoretically compress the CLIP-ViT/B-32 by 4$\times$ while achieving 66.94% Top-1 accuracy.
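The headline 4$\times$ figure follows directly from the bit-width ratio: replacing 32-bit floating-point weights with 8-bit values gives $32 / 8 = 4\times$ smaller weight storage, ignoring quantization parameters and any layers left unquantized.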
arXiv Detail & Related papers (2024-09-26T08:31:27Z) - 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
Low-bit quantization is known to degrade the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z) - QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods.
We identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width.
Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings.
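A toy illustration of picking bit-width-sensitive layers for selective finetuning is sketched below, assuming sensitivity is measured by quantizing one layer at a time on a small ReLU-MLP stand-in; QuEST's actual criteria (including the layers holding temporal information) are not reproduced here.

```python
import numpy as np

def rank_sensitive_layers(layer_weights, calib_x, n_bits=4, top_k=2):
    """Toy sensitivity probe: quantize one layer at a time, measure how much the
    network output moves, and return the top-k most affected layers, which a
    selective-finetuning scheme would then tune."""
    def quant(w):
        qmax = 2 ** (n_bits - 1) - 1
        s = np.abs(w).max() / qmax + 1e-12
        return np.clip(np.round(w / s), -qmax - 1, qmax) * s

    def forward(ws, x):
        for w in ws:
            x = np.maximum(x @ w, 0.0)                 # ReLU MLP stand-in for the real model
        return x

    ref = forward(layer_weights, calib_x)
    errs = [np.abs(forward([quant(v) if j == i else v for j, v in enumerate(layer_weights)],
                           calib_x) - ref).mean()
            for i in range(len(layer_weights))]
    return np.argsort(errs)[::-1][:top_k]              # indices of the most sensitive layers

layers = [np.random.randn(32, 32) / np.sqrt(32) for _ in range(5)]
print(rank_sensitive_layers(layers, np.random.randn(16, 32)))
```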
arXiv Detail & Related papers (2024-02-06T03:39:44Z) - Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
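For reference, a generic KL-based activation-scale calibration looks roughly like the sketch below; the binning, the search grid, and the final scale formula are assumptions, and the paper's improved KL metric may compare the distributions differently.

```python
import numpy as np

def kl_calibrate_scale(acts, n_bits=8, n_bins=2048):
    """Choose an activation clipping threshold by minimizing the KL divergence
    between the clipped reference histogram and its quantized (coarsened) version."""
    hist, edges = np.histogram(np.abs(acts), bins=n_bins)
    n_levels = 2 ** (n_bits - 1)                       # levels available for |activation|
    best_t, best_kl = edges[-1], np.inf
    for i in range(n_levels, n_bins + 1, 16):          # candidate cut points
        ref = hist[:i].astype(float)
        ref[-1] += hist[i:].sum()                      # fold the clipped tail into the last bin
        p = ref / ref.sum()
        groups = np.array_split(ref, n_levels)         # coarsen to n_levels quantization bins
        q = np.concatenate([np.full(len(g), g.sum() / len(g)) for g in groups])
        q /= q.sum()
        mask = p > 0
        kl = np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], 1e-12)))
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t / (n_levels - 1)                     # step size used as the activation scale

acts = np.random.laplace(scale=1.0, size=100_000)
print(f"calibrated scale: {kl_calibrate_scale(acts):.5f}")
```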
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
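The general recipe of pairing a frozen low-bit weight with a trainable low-rank branch, distilled from the full-precision model without real data, can be sketched as follows; the module, its hyperparameters, and the noise-input teacher loop are hypothetical illustrations, not EfficientDM's code.

```python
import torch
import torch.nn as nn

class QuantLoRALinear(nn.Module):
    """Frozen low-bit weight plus a trainable LoRA branch (illustrative of the
    'QAT-level quality at PTQ-like cost' idea)."""
    def __init__(self, w_fp, n_bits=4, rank=8):
        super().__init__()
        qmax = 2 ** (n_bits - 1) - 1
        scale = w_fp.abs().max() / qmax
        self.register_buffer("w_q", (w_fp / scale).round().clamp(-qmax - 1, qmax) * scale)
        self.a = nn.Parameter(torch.zeros(w_fp.shape[0], rank))
        self.b = nn.Parameter(torch.randn(rank, w_fp.shape[1]) * 0.01)

    def forward(self, x):
        return x @ (self.w_q + self.a @ self.b)        # only a and b receive gradients

# Data-free distillation: the full-precision layer acts as teacher and random
# noise serves as input (diffusion models consume noise, so no dataset is needed).
w_fp = torch.randn(64, 64) / 8
teacher = lambda x: x @ w_fp
student = QuantLoRALinear(w_fp)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(200):
    x = torch.randn(32, 64)
    loss = (student(x) - teacher(x)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"distillation loss after tuning: {loss.item():.6f}")
```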
arXiv Detail & Related papers (2023-10-05T02:51:53Z) - Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition [65.7040645560855]
We propose Q-ASR, an integer-only, zero-shot quantization scheme for ASR models.
We show a negligible WER change compared to the full-precision baseline models.
Q-ASR exhibits a large compression rate of more than 4x with small WER degradation.
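A minimal sketch of what "integer-only" inference means in practice is shown below, assuming symmetric int8 quantization and a single output rescale; Q-ASR's actual kernels and calibration are not shown.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization; returns integer values and scale."""
    scale = np.abs(x).max() / 127 + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def int_only_matmul(x, w):
    """Integer-only linear layer: int8 x int8 accumulated in int32, with one
    float rescale at the end."""
    xq, sx = quantize_int8(x)
    wq, sw = quantize_int8(w)
    acc = xq.astype(np.int32) @ wq.astype(np.int32)    # all arithmetic on integers
    return acc * (sx * sw)                             # dequantize once at the output

x, w = np.random.randn(8, 256), np.random.randn(256, 64)
print(np.abs(int_only_matmul(x, w) - x @ w).max())
```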
arXiv Detail & Related papers (2021-03-31T06:05:40Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.