Gradient $\ell_1$ Regularization for Quantization Robustness
- URL: http://arxiv.org/abs/2002.07520v1
- Date: Tue, 18 Feb 2020 12:31:34 GMT
- Title: Gradient $\ell_1$ Regularization for Quantization Robustness
- Authors: Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos,
Tijmen Blankevoort, Max Welling
- Abstract summary: We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
- Score: 70.39776106458858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze the effect of quantizing weights and activations of neural
networks on their loss and derive a simple regularization scheme that improves
robustness against post-training quantization. By training quantization-ready
networks, our approach enables storing a single set of weights that can be
quantized on-demand to different bit-widths as energy and memory requirements
of the application change. Unlike quantization-aware training using the
straight-through estimator that only targets a specific bit-width and requires
access to training data and pipeline, our regularization-based method paves the
way for "on the fly'' post-training quantization to various bit-widths. We show
that by modeling quantization as a $\ell_\infty$-bounded perturbation, the
first-order term in the loss expansion can be regularized using the
$\ell_1$-norm of gradients. We experimentally validate the effectiveness of our
regularization scheme on different architectures on CIFAR-10 and ImageNet
datasets.
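
The first-order argument can be made explicit. For a quantization perturbation $\Delta$ with $\|\Delta\|_\infty \leq \delta$, a Taylor expansion gives $L(w+\Delta) \approx L(w) + \Delta^\top \nabla_w L(w)$, and by Hölder's inequality the worst case of the linear term is $\sup_{\|\Delta\|_\infty \leq \delta} \Delta^\top \nabla_w L(w) = \delta \, \|\nabla_w L(w)\|_1$, attained at $\Delta = \delta \, \mathrm{sign}(\nabla_w L(w))$. Penalizing the $\ell_1$-norm of the gradient therefore bounds the first-order loss increase for any bit-width whose rounding error stays within $\delta$.

Below is a minimal PyTorch sketch of such a double-backpropagation regularizer, shown for weight gradients only (the abstract also covers activations). The model, dummy data, and regularization strength are illustrative assumptions, not the authors' reference implementation:

```python
import torch
import torch.nn as nn

def gradient_l1_penalty(loss: torch.Tensor, params) -> torch.Tensor:
    # Differentiate the task loss w.r.t. the weights while keeping the graph,
    # so that ||dL/dw||_1 can itself be backpropagated (double backprop).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return sum(g.abs().sum() for g in grads)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 0.05  # hypothetical regularization strength

x = torch.randn(8, 32)                  # dummy batch
y = torch.randint(0, 10, (8,))

task_loss = criterion(model(x), y)
penalty = gradient_l1_penalty(task_loss, list(model.parameters()))

optimizer.zero_grad()
(task_loss + lam * penalty).backward()  # second backward pass through grads
optimizer.step()
```

A network trained this way keeps a flat first-order loss landscape around $w$, so the same stored checkpoint can later be rounded to, say, 8-bit or 6-bit weights by plain post-training quantization, which is what enables the "on-the-fly" claim above.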
Related papers
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve a network of any precision for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization [5.1779694507922835]
We present three insights for robustifying a network against quantization.
We propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL).
Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization.
arXiv Detail & Related papers (2022-07-31T02:12:28Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization transforms the high-precision weights and activations of a given neural network into low-precision weights/activations to reduce memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly designed backbone architectures causes severe performance degradation.
This paper proposes a novel Quantization-Aware Training (QAT) method that effectively alleviates this degradation.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Standard Deviation-Based Quantization for Deep Neural Networks [17.495852096822894]
Quantization of deep neural networks is a promising approach for reducing inference cost.
We propose a new framework to learn the quantization intervals (discrete values) using the network's weight and activation distributions (a toy sketch of one possible reading follows after this list).
Our scheme simultaneously prunes the network's parameters and allows us to flexibly adjust the pruning ratio during quantization.
arXiv Detail & Related papers (2022-02-24T23:33:47Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- In-Hindsight Quantization Range Estimation for Quantized Training [5.65658124285176]
We propose a simple alternative to dynamic quantization, in-hindsight range estimation, which uses the quantization ranges estimated on previous iterations to quantize the present one (a toy sketch follows after this list).
Our approach enables fast static quantization of gradients and activations while requiring only minimal hardware support from the neural network accelerator.
It is intended as a drop-in replacement for estimating quantization ranges and can be used in conjunction with other advances in quantized training.
arXiv Detail & Related papers (2021-05-10T10:25:28Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a single model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method achieves accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques for training deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods quantize the full-precision network weights.
Second, to obtain low bit-width activations, existing works treat all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Recurrence of Optimum for Training Weight and Activation Quantized Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the non-differentiable nature of network quantization.
We also show numerical evidence of a recurrence phenomenon of weight evolution when training quantized deep networks.
arXiv Detail & Related papers (2020-12-10T09:14:43Z)
- Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations across different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z)
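
For the standard deviation-based quantization entry above, here is a toy illustration of the general idea as I read the one-line summary: clip weights at a multiple of their standard deviation before uniform quantization. This is a simplifying assumption, not the authors' learned-interval method, and `num_bits` and `k` are hypothetical parameters:

```python
import torch

def std_clipped_quantize(w: torch.Tensor, num_bits: int = 4, k: float = 3.0) -> torch.Tensor:
    """Uniformly quantize w inside [-k*sigma, +k*sigma] (symmetric, signed grid)."""
    levels = 2 ** (num_bits - 1) - 1              # e.g. +-7 levels for 4 bits
    scale = (k * w.std() / levels).clamp_min(1e-8)
    q = torch.clamp(torch.round(w / scale), -levels, levels)
    return q * scale                              # dequantized ("fake-quantized") weights

w_q = std_clipped_quantize(torch.randn(64, 32))
```

Tying the clipping range to the weight distribution rather than to the raw min/max makes the grid insensitive to outlier weights, which is presumably why distribution-aware intervals help at low bit-widths.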
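
For the in-hindsight range estimation entry above, a minimal sketch of the stated mechanism, quantizing the current tensor with the range recorded on the previous iteration so the ranges are known statically before the data arrives. The class name, bit-width, and first-iteration fallback are my assumptions, not the authors' code:

```python
import torch

class InHindsightQuantizer:
    def __init__(self, num_bits: int = 8):
        self.qmax = 2 ** num_bits - 1
        self.lo = None  # range observed on the previous iteration
        self.hi = None

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        if self.lo is None:
            # First iteration: fall back to the current (dynamic) range once.
            self.lo, self.hi = x.min().item(), x.max().item()
        scale = max(self.hi - self.lo, 1e-8) / self.qmax
        q = torch.clamp(torch.round((x - self.lo) / scale), 0, self.qmax)
        x_hat = q * scale + self.lo  # dequantized ("fake-quantized") tensor
        # Record this iteration's range for use on the next call.
        self.lo, self.hi = x.min().item(), x.max().item()
        return x_hat
```

Because the range is fixed before the tensor is seen, the accelerator can quantize on the fly without a separate statistics pass, at the cost of a one-iteration lag in the range estimate.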