Error-aware Quantization through Noise Tempering
- URL: http://arxiv.org/abs/2212.05603v1
- Date: Sun, 11 Dec 2022 20:37:50 GMT
- Title: Error-aware Quantization through Noise Tempering
- Authors: Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell
- Abstract summary: Quantization-aware training (QAT) optimizes model parameters with respect to the end task while simulating quantization error.
In this work, we incorporate exponentially decaying quantization-error-aware noise together with a learnable scale of task loss gradient to approximate the effect of a quantization operator.
Our method obtains state-of-the-art top-1 classification accuracy for uniform (non mixed-precision) quantization, out-performing previous methods by 0.5-1.2% absolute.
- Score: 43.049102196902844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization has become a predominant approach for model compression,
enabling deployment of large models trained on GPUs onto smaller form-factor
devices for inference. Quantization-aware training (QAT) optimizes model
parameters with respect to the end task while simulating quantization error,
leading to better performance than post-training quantization. Approximation of
gradients through the non-differentiable quantization operator is typically
achieved using the straight-through estimator (STE) or additive noise. However,
STE-based methods suffer from instability due to biased gradients, whereas
existing noise-based methods cannot reduce the resulting variance. In this
work, we incorporate exponentially decaying quantization-error-aware noise
together with a learnable scale of task loss gradient to approximate the effect
of a quantization operator. We show this method combines gradient scale and
quantization noise in a better optimized way, providing finer-grained
estimation of gradients at each weight and activation layer's quantizer bin
size. Our controlled noise also contains an implicit curvature term that could
encourage flatter minima, which we show is indeed the case in our experiments.
Experiments training ResNet architectures on the CIFAR-10, CIFAR-100 and
ImageNet benchmarks show that our method obtains state-of-the-art top-1
classification accuracy for uniform (non mixed-precision) quantization,
out-performing previous methods by 0.5-1.2% absolute.
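The abstract names three ingredients: a uniform quantizer, noise tied to the quantization error, and an exponential decay of that noise. The NumPy sketch below illustrates only those ingredients. It is not the authors' algorithm: the function names, the `decay` rate, the Gaussian noise shape, and the choice to scale the noise by each weight's absolute rounding error are all assumptions made for illustration, and the learnable gradient scale mentioned in the abstract is not modeled.

```python
import numpy as np


def uniform_quantize(w, num_bits=4):
    """Symmetric uniform quantizer; returns quantized weights and the bin size."""
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 7 integer levels per side for 4 bits
    # Bin size chosen so the largest weight maps to the outermost level.
    step = max(float(np.max(np.abs(w))), 1e-12) / qmax
    w_q = np.clip(np.round(w / step), -qmax, qmax) * step
    return w_q, step


def tempered_weights(w, step_idx, rng, num_bits=4, decay=0.01):
    """Quantized weights plus error-scaled noise that decays with training step.

    Assumed form (hypothetical, for illustration only):
        w_q + exp(-decay * step_idx) * |w_q - w| * eps,  eps ~ N(0, 1)
    """
    w_q, _ = uniform_quantize(w, num_bits)
    err = np.abs(w_q - w)                    # per-weight quantization error
    temperature = np.exp(-decay * step_idx)  # exponentially decaying factor
    noise = temperature * err * rng.standard_normal(w.shape)
    return w_q + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)
    w_q, _ = uniform_quantize(w)
    for t in (0, 100, 1000):
        spread = np.std(tempered_weights(w, t, rng) - w_q)
        print(f"step {t}: noise std around quantized weights = {spread:.6f}")
```

Under these assumptions the perturbation is largest for weights that sit far from their quantization level and early in training, and it vanishes as the step index grows, so the tempered weights converge to the plain quantized weights.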
Related papers
- PTQD: Accurate Post-Training Quantization for Diffusion Models [22.567863065523902]
Post-training quantization of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training.
Applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples.
We propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process.
arXiv Detail & Related papers (2023-05-18T02:28:42Z)
- SQuAT: Sharpness- and Quantization-Aware Training for BERT [43.049102196902844]
We propose sharpness- and quantization-aware training (SQuAT).
Our method can consistently outperform state-of-the-art quantized BERT models under 2, 3, and 4-bit settings by 1%.
Our experiments on empirical measurement of sharpness also suggest that our method would lead to flatter minima compared to other quantization methods.
arXiv Detail & Related papers (2022-10-13T16:52:19Z)
- Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss [4.877532217193618]
Existing quantization techniques rely heavily on experience and "fine-tuning" skills.
This study provides a methodology for acquiring a mixed-precise quantization model with a lower loss than the full precision model.
In particular, we will demonstrate that neural networks with massive identity mappings are resistant to the quantization method.
arXiv Detail & Related papers (2022-07-20T10:55:34Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Quasiprobability decompositions with reduced sampling overhead [4.38301148531795]
Quantum error mitigation techniques can reduce noise on current quantum hardware without the need for fault-tolerant quantum error correction.
We present a new algorithm based on mathematical optimization that aims to choose the quasiprobability decomposition in a noise-aware manner.
arXiv Detail & Related papers (2021-01-22T19:00:06Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer from a severe performance drop in ultra-low precision of 4 or lower bit-widths, or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework [32.465949985191635]
This paper proposes a meta-based quantizer named QuantNet, which utilizes a differentiable sub-network to directly binarize the full-precision weights.
Our method not only solves the problem of gradient mismatching, but also reduces the impact of discretization errors, caused by the binarizing operation in the deployment.
arXiv Detail & Related papers (2020-09-10T01:41:05Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.