QuantNAS for super resolution: searching for efficient
quantization-friendly architectures against quantization noise
- URL: http://arxiv.org/abs/2208.14839v4
- Date: Wed, 10 Jan 2024 15:52:03 GMT
- Title: QuantNAS for super resolution: searching for efficient
quantization-friendly architectures against quantization noise
- Authors: Egor Shvetsov, Dmitry Osin, Alexey Zaytsev, Ivan Koryakovskiy,
Valentin Buchnev, Ilya Trofimov, Evgeny Burnaev
- Abstract summary: We propose a novel quantization-aware procedure, QuantNAS.
We use entropy regularization, quantization noise, and an Adaptive Deviation for Quantization (ADQ) module to enhance the search procedure.
The proposed procedure is 30% faster than direct weight quantization and is more stable.
- Score: 19.897685398009912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a constant need for high-performing and computationally efficient
neural network models for image super-resolution: computationally efficient
models can run on low-capacity devices and reduce carbon footprints. One way to
obtain such models is model compression, e.g. quantization. Another way is
neural architecture search, which automatically discovers new, more efficient
solutions. We propose a novel quantization-aware procedure, QuantNAS, that
combines the strengths of these two approaches: it searches for
quantization-friendly super-resolution models. The approach utilizes entropy
regularization, quantization noise, and an Adaptive Deviation for Quantization
(ADQ) module to enhance the search procedure. The entropy regularization
technique prioritizes a single operation within each block of
the search space. Adding quantization noise to parameters and activations
approximates model degradation after quantization, resulting in more
quantization-friendly architectures. ADQ helps to alleviate problems caused by
Batch Norm blocks in super-resolution models. Our experimental results show
that the proposed approximations serve the search procedure better than direct
model quantization. QuantNAS discovers architectures with a better PSNR/BitOps
trade-off than uniform or mixed-precision quantization of fixed architectures.
We showcase the effectiveness of our method by applying it to two search spaces
inspired by state-of-the-art SR models and RFDN. Thus, anyone can design a
suitable search space based on an existing architecture and apply our method to
obtain better quality and efficiency.
The proposed procedure is 30% faster than direct weight quantization and is
more stable.
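The abstract does not spell out how quantization noise or entropy regularization are wired into the supernet, so the following is only a minimal PyTorch-style sketch of the two ingredients as described: uniform noise of one quantization step added to weights to approximate post-quantization degradation, and an entropy penalty on a block's architecture weights that pushes the search toward a single operation per block. All function and variable names are illustrative, not taken from the authors' code.

```python
# Illustrative sketch only: not the authors' implementation.
# (a) quantization-noise injection approximating uniform quantization
# degradation, (b) an entropy penalty on architecture weights so that
# each searchable block favours a single operation.
import torch
import torch.nn.functional as F


def add_quant_noise(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Approximate uniform quantization by adding noise of one quantization
    step; gradients flow through the clean weights."""
    scale = (w.max() - w.min()).clamp(min=1e-8) / (2 ** bits - 1)
    noise = (torch.rand_like(w) - 0.5) * scale      # U(-step/2, +step/2)
    return w + noise.detach()


def entropy_penalty(alpha: torch.Tensor) -> torch.Tensor:
    """Entropy of the softmax over candidate operations in one block.
    Minimizing it concentrates mass on a single operation."""
    p = F.softmax(alpha, dim=-1)
    return -(p * p.clamp(min=1e-12).log()).sum()


# Toy search step for one block with three candidate convolutions.
alpha = torch.zeros(3, requires_grad=True)           # architecture weights
ops = [torch.nn.Conv2d(8, 8, k, padding=k // 2) for k in (1, 3, 5)]
x = torch.randn(1, 8, 16, 16)
target = torch.randn(1, 8, 16, 16)

p = F.softmax(alpha, dim=-1)
out = sum(pi * F.conv2d(x, add_quant_noise(op.weight), op.bias,
                        padding=op.padding)
          for pi, op in zip(p, ops))
loss = F.l1_loss(out, target) + 1e-2 * entropy_penalty(alpha)
loss.backward()
```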
Related papers
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z) - Trainability maximization using estimation of distribution algorithms assisted by surrogate modelling for quantum architecture search [8.226785409557598]
Quantum architecture search (QAS) involves optimizing both the quantum parametric circuit configuration and its parameters for a variational quantum algorithm.
In this paper, we aim to achieve two improvements in QAS: (1) to reduce the number of measurements by an online surrogate model of the evaluation process that aggressively discards architectures of poor performance; (2) to avoid training circuits when barren plateaus (BPs) are present.
We experimentally validate our proposal for the variational quantum eigensolver and showcase that our algorithm is able to find solutions that have been previously proposed in the literature for the considered Hamiltonians, and also to outperform the state of the art.
arXiv Detail & Related papers (2024-07-29T15:22:39Z) - 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
It is notorious that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-20T07:09:56Z) - QuantEase: Optimization-based Quantization for Language Models [17.333778751252392]
This work introduces QuantEase, a post-training quantization (PTQ) method that quantizes the layers of recent Large Language Models (LLMs).
Our CD-based approach features straightforward updates, relying solely on vector operations.
We also explore an outlier-aware variant, allowing significant weights (outliers) to be retained with complete precision.
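The summary above only names the outlier idea; the hedged sketch below shows the general technique of keeping the largest-magnitude weights in full precision and quantizing the rest. It is not QuantEase's actual algorithm, and `quantize_with_outliers` is a hypothetical helper.

```python
# Generic outlier-retention sketch, not QuantEase's algorithm: keep the
# largest-magnitude weights in full precision and quantize the rest.
import torch


def quantize_with_outliers(w: torch.Tensor, bits: int = 4,
                           outlier_frac: float = 0.01) -> torch.Tensor:
    k = max(1, int(outlier_frac * w.numel()))
    thresh = w.abs().flatten().topk(k).values.min()
    outlier_mask = w.abs() >= thresh                  # kept in full precision

    dense = w[~outlier_mask]
    scale = dense.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    q = (dense / scale).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

    out = w.clone()
    out[~outlier_mask] = q * scale                    # dequantized values
    return out
```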
arXiv Detail & Related papers (2023-09-05T01:39:09Z) - Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization [45.22093693422085]
Mixed-precision quantization can assign different bit-precisions to different layers according to their sensitivity, achieving strong performance.
Quickly determining the quantization bit-precision of each layer of a deep neural network under given constraints is a difficult problem.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
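As a rough illustration of the problem this entry describes (not the SSPS algorithm itself), the greedy sketch below assigns a bit-width per layer from precomputed sensitivities under a crude BitOps-style budget; all names and numbers are made up for the example.

```python
# Greedy bit-width assignment under a budget: illustration only.
from typing import Dict, List


def assign_bits(sensitivity: Dict[str, float], macs: Dict[str, float],
                candidates: List[int], budget: float) -> Dict[str, int]:
    # Start at the highest precision, then repeatedly lower the least
    # sensitive layer until the BitOps-style budget is met.
    bits = {name: max(candidates) for name in sensitivity}

    def cost() -> float:
        return sum(macs[n] * bits[n] for n in bits)   # crude BitOps proxy

    while cost() > budget:
        reducible = [n for n in bits if bits[n] > min(candidates)]
        if not reducible:
            break
        victim = min(reducible, key=lambda n: sensitivity[n])
        bits[victim] = max(b for b in candidates if b < bits[victim])
        # In practice sensitivities would be re-estimated after each change;
        # kept fixed here for brevity.
    return bits


# Example: three layers, 8/6/4-bit candidates, budget in MAC*bit units.
print(assign_bits({"conv1": 0.9, "conv2": 0.2, "conv3": 0.5},
                  {"conv1": 1e6, "conv2": 2e6, "conv3": 1e6},
                  [4, 6, 8], budget=2.4e7))
```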
arXiv Detail & Related papers (2021-03-04T09:15:08Z) - BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network
Quantization [32.770842274996774]
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle mixed-precision quantization from the new angle of inducing bit-level sparsity.
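To make the bit-level view concrete, the sketch below decomposes quantized weight codes into binary bit planes and measures how many bits are active per plane; an L1-style penalty on these planes is one way to induce bit-level sparsity. This illustrates the representation only, not BSQ's training scheme.

```python
# Bit-plane decomposition of quantized weights: illustration of the
# bit-level sparsity viewpoint, not BSQ's actual training scheme.
import torch


def bit_planes(q: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Decompose non-negative integer codes into `bits` binary planes."""
    planes = [(q >> b) & 1 for b in range(bits)]       # LSB first
    return torch.stack(planes).float()


w = torch.randn(64)
scale = w.abs().max() / 255
q = (w.abs() / scale).round().clamp(0, 255).long()     # unsigned 8-bit codes
planes = bit_planes(q, bits=8)

sparsity_penalty = planes.mean(dim=1)                   # fraction of 1s per plane
print(sparsity_penalty)   # planes driven fully to zero could be dropped
```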
arXiv Detail & Related papers (2021-02-20T22:37:41Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR with low-bit quantization achieves performance on par with its full-precision counterparts on five benchmark datasets.
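For readers unfamiliar with the basic mechanism, the snippet below shows generic simulated (fake) quantization of both weights and activations with a straight-through estimator, the standard building block that fully quantized SR networks rely on; it is not FQSR's specific quantization scheme.

```python
# Generic fake quantization of weights and activations with a
# straight-through estimator (STE); not FQSR's exact scheme.
import torch


def fake_quant(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zero = x.min()
    q = ((x - zero) / scale).round().clamp(0, qmax)
    deq = q * scale + zero
    return x + (deq - x).detach()                      # STE: identity gradient


conv = torch.nn.Conv2d(3, 16, 3, padding=1)
x = torch.rand(1, 3, 32, 32)
y = torch.nn.functional.conv2d(fake_quant(x),
                               fake_quant(conv.weight),
                               conv.bias, padding=1)
```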
arXiv Detail & Related papers (2020-11-29T03:53:49Z) - Once Quantization-Aware Training: High Performance Extremely Low-bit
Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
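One plausible reading of the bit-inheritance idea is to keep the quantizer's clipping range fixed when moving to a lower bit-width, so the lower-bit model starts from the higher-bit solution instead of from scratch; the tiny sketch below illustrates that reading and may differ from the paper's exact scheme.

```python
# Hedged sketch of bit inheritance: preserve the clipping range
# step * (2**bits - 1) when moving from bits_from to bits_to.
def inherit_step_size(step: float, bits_from: int, bits_to: int) -> float:
    return step * (2 ** bits_from - 1) / (2 ** bits_to - 1)


step_8bit = 0.02
step_4bit = inherit_step_size(step_8bit, bits_from=8, bits_to=4)  # ~0.34
```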
arXiv Detail & Related papers (2020-10-09T03:52:16Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose an adaptive pruning strategy for the ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - VecQ: Minimal Loss DNN Model Compression With Vectorized Weight
Quantization [19.66522714831141]
We develop a new quantization solution called VecQ, which can guarantee minimal direct quantization loss and better model accuracy.
In addition, in order to speed up the proposed quantization process during training, we accelerate it with a parameterized estimation and probability-based calculation.
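As a stand-in for the "minimal direct quantization loss" idea (the paper's own approach relies on parameterized estimation and probability-based calculation rather than brute force), the sketch below simply searches for the scale whose rounded quantization minimizes the L2 error to the original weights; `best_scale` is a hypothetical helper for intuition only.

```python
# Brute-force illustration of choosing the quantization scale that
# minimizes direct quantization loss; not VecQ's actual solution.
import torch


def best_scale(w: torch.Tensor, bits: int = 4, steps: int = 100) -> float:
    qmax = 2 ** (bits - 1) - 1
    wmax = w.abs().max().item()
    best_s, best_err = wmax / qmax, float("inf")
    for t in torch.linspace(0.5, 1.0, steps).tolist():   # candidate scales
        s = wmax * t / qmax
        q = (w / s).round().clamp(-qmax - 1, qmax) * s
        err = (q - w).pow(2).sum().item()
        if err < best_err:
            best_s, best_err = s, err
    return best_s


w = torch.randn(256)
print(best_scale(w))
```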
arXiv Detail & Related papers (2020-05-18T07:38:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.