Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware
Training
- URL: http://arxiv.org/abs/2211.08544v4
- Date: Tue, 25 Jul 2023 22:42:36 GMT
- Title: Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware
Training
- Authors: Yunshan Zhong, Gongrui Nan, Yuxin Zhang, Fei Chao, Rongrong Ji
- Abstract summary: A large portion of quantized weights reaches the optimal quantization level after a few training epochs, which we refer to as the partly scratch-off lottery ticket.
We develop a method, dubbed lottery ticket scratcher (LTS), which freezes a weight once the distance between the full-precision one and its quantization level is smaller than a controllable threshold.
- Score: 69.8539756804198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization-aware training (QAT) is widely adopted because it retains
the performance of quantized networks well. In QAT, the common practice is to update
all quantized weights throughout the entire training process. In this paper, we
challenge this practice based on an interesting phenomenon we observed. Specifically,
a large portion of quantized weights
reaches the optimal quantization level after a few training epochs, which we
refer to as the partly scratch-off lottery ticket. This
straightforward-yet-valuable observation naturally inspires us to zero out
gradient calculations of these weights in the remaining training period to
avoid meaningless updating. To effectively find the ticket, we develop a
heuristic method, dubbed lottery ticket scratcher (LTS), which freezes a weight
once the distance between the full-precision one and its quantization level is
smaller than a controllable threshold. Surprisingly, the proposed LTS typically
eliminates 50%-70% weight updating and 25%-35% FLOPs of the backward pass,
while still achieving performance on par with or even better than the compared
baseline. For example, compared with the baseline, LTS improves 2-bit
MobileNetV2 by 5.05%, eliminating 46% weight updating and 23% FLOPs of the
backward pass. Code is available at https://github.com/zysxmu/LTS.
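The freezing rule above is simple to state in code. Below is a minimal PyTorch sketch of the idea, assuming a uniform per-tensor quantizer; the function names, the scale handling, and the way the threshold is expressed are illustrative assumptions rather than the authors' implementation (see the repository above for that).

```python
import torch

def uniform_quantize(w, scale, num_bits=2):
    # Snap full-precision weights to their nearest quantization level.
    qmax = 2 ** (num_bits - 1) - 1
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

def update_frozen_mask(w, scale, frozen_mask, threshold, num_bits=2):
    # Freeze a weight once its distance to the nearest quantization level
    # falls below the threshold; a frozen weight stays frozen afterwards.
    dist = (w - uniform_quantize(w, scale, num_bits)).abs()
    return frozen_mask | (dist < threshold * scale)

# In the training loop, zero the gradients of frozen weights before the
# optimizer step so they receive no further updates, e.g.:
#   frozen = update_frozen_mask(weight.detach(), scale, frozen, threshold)
#   weight.grad *= (~frozen).float()
```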
Related papers
- EfQAT: An Efficient Framework for Quantization-Aware Training [20.47826378511535]
Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy.
Post-training quantization (PTQ) schemes do not involve training and are therefore computationally cheap.
We propose EfQAT, which generalizes both schemes by optimizing only a subset of the parameters of a quantized model.
arXiv Detail & Related papers (2024-11-17T11:06:36Z) - FlatQuant: Flatness Matters for LLM Quantization [58.28221892035609]
We propose FlatQuant, a new post-training quantization approach to enhance flatness of weights and activations.
Our approach identifies optimal affine transformations tailored to each linear layer, calibrated in hours via a lightweight objective.
For inference latency, FlatQuant reduces the slowdown induced by pre-quantization transformation from 0.26x of QuaRot to merely 0.07x, bringing up to 2.3x speedup for prefill and 1.7x speedup for decoding.
arXiv Detail & Related papers (2024-10-12T08:10:28Z) - Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction [48.740630807085566]
Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities.
Current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance.
This paper presents ERQ, an innovative two-step PTQ method specifically crafted to reduce quantization errors arising from activation and weight quantization sequentially.
arXiv Detail & Related papers (2024-07-09T12:06:03Z) - Transition Rate Scheduling for Quantization-Aware Training [26.792400685888175]
Quantization-aware training (QAT) simulates a quantization process during training to lower bit-precision of weights/activations.
It learns quantized weights indirectly by updating latent weights using gradient-based optimizers.
We introduce a transition rate (TR) scheduling technique that controls the number of transitions of quantized weights explicitly.
arXiv Detail & Related papers (2024-04-30T04:12:36Z) - Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activations.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z) - Improving Convergence for Quantum Variational Classifiers using Weight
Re-Mapping [60.086820254217336]
In recent years, quantum machine learning has seen a substantial increase in the use of variational quantum circuits (VQCs).
We introduce weight re-mapping for VQCs, to unambiguously map the weights to an interval of length $2\pi$.
We demonstrate that weight re-mapping increased test accuracy for the Wine dataset by 10% over using unmodified weights.
arXiv Detail & Related papers (2022-12-22T13:23:19Z) - Overcoming Oscillations in Quantization-Aware Training [18.28657022169428]
When training neural networks with simulated quantization, quantized weights can, rather unexpectedly, oscillate between two grid-points.
We show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics.
We propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing.
arXiv Detail & Related papers (2022-03-21T16:07:42Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - Weight Update Skipping: Reducing Training Time for Artificial Neural
Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that improvements in accuracy show temporal variations, which allows weight updates to be skipped during some time windows.
During such time windows, we keep updating the biases, which ensures the network still trains and avoids overfitting.
Such a training approach achieves virtually the same accuracy with considerably less computational cost and thus lower training time.
arXiv Detail & Related papers (2020-12-05T15:12:10Z)