QuantNAS for super resolution: searching for efficient
quantization-friendly architectures against quantization noise
- URL: http://arxiv.org/abs/2208.14839v4
- Date: Wed, 10 Jan 2024 15:52:03 GMT
- Title: QuantNAS for super resolution: searching for efficient
quantization-friendly architectures against quantization noise
- Authors: Egor Shvetsov, Dmitry Osin, Alexey Zaytsev, Ivan Koryakovskiy,
Valentin Buchnev, Ilya Trofimov, Evgeny Burnaev
- Abstract summary: We propose a novel quantization-aware procedure, the QuantNAS.
We use entropy regularization, quantization noise, and Adaptive Deviation for Quantization (ADQ) module to enhance the search procedure.
The proposed procedure is 30% faster than direct weight quantization and is more stable.
- Score: 19.897685398009912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a constant need for high-performing and computationally efficient
neural network models for image super-resolution: computationally efficient
models can be used via low-capacity devices and reduce carbon footprints. One
way to obtain such models is to compress models, e.g. quantization. Another way
is a neural architecture search that automatically discovers new, more
efficient solutions. We propose a novel quantization-aware procedure, the
QuantNAS that combines pros of these two approaches. To make QuantNAS work, the
procedure looks for quantization-friendly super-resolution models. The approach
utilizes entropy regularization, quantization noise, and Adaptive Deviation for
Quantization (ADQ) module to enhance the search procedure. The entropy
regularization technique prioritizes a single operation within each block of
the search space. Adding quantization noise to parameters and activations
approximates model degradation after quantization, resulting in a more
quantization-friendly architectures. ADQ helps to alleviate problems caused by
Batch Norm blocks in super-resolution models. Our experimental results show
that the proposed approximations are better for search procedure than direct
model quantization. QuantNAS discovers architectures with better PSNR/BitOps
trade-off than uniform or mixed precision quantization of fixed architectures.
We showcase the effectiveness of our method through its application to two
search spaces inspired by the state-of-the-art SR models and RFDN. Thus, anyone
can design a proper search space based on an existing architecture and apply
our method to obtain better quality and efficiency.
The proposed procedure is 30\% faster than direct weight quantization and is
more stable.
Related papers
- 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
It is notorious that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z) - MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model
Effectiveness and Efficiency [10.641875933652647]
We introduce multi-granularity architecture search (MGAS) to discover both effective and efficient neural networks.
We learn discretization functions specific to each granularity level to adaptively determine the unit remaining ratio according to the evolving architecture.
Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MGAS outperforms other state-of-the-art methods in achieving a better trade-off between model performance and model size.
arXiv Detail & Related papers (2023-10-23T16:32:18Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z) - QuantEase: Optimization-based Quantization for Language Models [17.333778751252392]
This work introduces Quantization (PTQ) of various quantization layers from recent advances of Large Language Models (LLMs)
Our CD-based approach features straightforward updates, relying solely on vector operations.
We also explore an outlier approach, allowing for retaining significant weights (outoutliers) with complete precision.
arXiv Detail & Related papers (2023-09-05T01:39:09Z) - Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization [45.22093693422085]
Mixed-precision quantization model can match different quantization bit-precisions according to the sensitivity of different layers to achieve great performance.
It is a difficult problem to quickly determine the quantization bit-precision of each layer in deep neural networks according to some constraints.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
arXiv Detail & Related papers (2021-03-04T09:15:08Z) - BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network
Quantization [32.770842274996774]
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle the mixed-precision quantization from a new angle of inducing bit-level sparsity.
arXiv Detail & Related papers (2021-02-20T22:37:41Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z) - Once Quantization-Aware Training: High Performance Extremely Low-bit
Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variisy hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate Scale Quantum devices.
We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Efficient Circuit Training" (PECT)
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - VecQ: Minimal Loss DNN Model Compression With Vectorized Weight
Quantization [19.66522714831141]
We develop a new quantization solution called VecQ, which can guarantee minimal direct quantization loss and better model accuracy.
In addition, in order to up the proposed quantization process during training, we accelerate the quantization process with a parameterized estimation and probability-based calculation.
arXiv Detail & Related papers (2020-05-18T07:38:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.