Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection
- URL: http://arxiv.org/abs/2306.07215v3
- Date: Tue, 20 Aug 2024 16:37:12 GMT
- Title: Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection
- Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng
- Abstract summary: Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations.
Most existing QAT methods require end-to-end training on the entire dataset.
We propose two metrics based on analysis of loss and gradient of quantized weights to quantify the importance of each sample during training.
- Score: 38.23587031169402
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training time and high energy costs. In addition, potential label noise in the training data undermines the robustness of QAT. We propose two metrics based on analysis of the loss and gradient of quantized weights: the error vector score and the disagreement score, to quantify the importance of each sample during training. Guided by these two metrics, we propose a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet), datasets (CIFAR-10, CIFAR-100, ImageNet-1K, COCO), and under different quantization settings. Specifically, our method achieves an accuracy of 68.39% with 4-bit quantized ResNet-18 on the ImageNet-1K dataset using only a 10% subset, an absolute gain of 4.24% over the baseline. Our method can also improve the robustness of QAT by removing noisy samples from the training set.
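To make the idea concrete, below is a minimal sketch of how sample-importance scores of this kind could drive per-epoch coreset selection. It is illustrative only and assumes an image-classification setup; the two scoring functions approximate the spirit of the error vector score (distance between the quantized model's prediction and the label) and the disagreement score (distance between quantized and full-precision predictions), not the authors' exact formulation.

```python
# Hedged sketch of importance-guided coreset selection for QAT.
# The score definitions are simplified stand-ins, not the paper's exact equations.
import torch
import torch.nn.functional as F

@torch.no_grad()
def error_vector_score(quant_model, x, y, num_classes):
    """L2 distance between the quantized model's softmax output and the one-hot label."""
    probs = F.softmax(quant_model(x), dim=1)
    one_hot = F.one_hot(y, num_classes).float()
    return (probs - one_hot).norm(dim=1)

@torch.no_grad()
def disagreement_score(quant_model, fp_model, x):
    """L2 distance between quantized and full-precision softmax outputs."""
    p_q = F.softmax(quant_model(x), dim=1)
    p_fp = F.softmax(fp_model(x), dim=1)
    return (p_q - p_fp).norm(dim=1)

def select_coreset(scores, fraction=0.1):
    """Keep the indices of the top-`fraction` most important samples for this epoch."""
    k = max(1, int(fraction * scores.numel()))
    return torch.topk(scores, k).indices
```

In use, one would recompute the scores at the start of each epoch and train only on the selected indices, so the kept subset adapts as the quantized model changes.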
Related papers
- AdaQAT: Adaptive Bit-Width Quantization-Aware Training [0.873811641236639]
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios.
Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging.
We present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes bit-widths during training for more efficient inference.
arXiv Detail & Related papers (2024-04-22T09:23:56Z) - Optimal Clipping and Magnitude-aware Differentiation for Improved
Quantization-aware Training [8.106641866299377]
Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal.
We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars.
OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the quantization-aware training (QAT) routine; see the sketch of MSE-optimal clipping after this list.
arXiv Detail & Related papers (2022-06-13T22:15:21Z) - BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of
DNNs from Scratch [11.32458063021286]
This paper presents BMPQ, a training method that uses bit gradients to analyze layer sensitivities and yield mixed-precision quantized models.
It requires a single training iteration but does not need a pre-trained baseline.
Compared to the baseline FP-32 models, BMPQ can yield models that have 15.4x fewer parameter bits with a negligible drop in accuracy.
arXiv Detail & Related papers (2021-12-24T03:16:58Z) - Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of information from both intra- and inter-images.
It is even comparable to contrastive learning methods when only half of the training batches are used.
arXiv Detail & Related papers (2021-04-01T08:09:26Z) - Activation Density based Mixed-Precision Quantization for Energy
Efficient Neural Networks [2.666640112616559]
We propose an in-training quantization method for neural network models.
Our method calculates a bit-width for each layer during training, yielding a mixed-precision model with competitive accuracy.
We run experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet with VGG19/ResNet18 architectures.
arXiv Detail & Related papers (2021-01-12T09:01:44Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - Weight Update Skipping: Reducing Training Time for Artificial Neural
Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that the improvement of accuracy shows temporal variations.
During such time windows, we keep updating the biases, which ensures the network still trains and avoids overfitting.
Such a training approach achieves virtually the same accuracy with considerably less computational cost, and thus lower training time.
arXiv Detail & Related papers (2020-12-05T15:12:10Z) - Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z) - Dynamic R-CNN: Towards High Quality Object Detection via Dynamic
Training [70.2914594796002]
We propose Dynamic R-CNN to adjust the label assignment criteria and the shape of regression loss function.
Our method improves upon the ResNet-50-FPN baseline with 1.9% AP and 5.5% AP$_{90}$ on the MS COCO dataset with no extra overhead.
arXiv Detail & Related papers (2020-04-13T15:20:25Z) - Filter Sketch for Network Pruning [184.41079868885265]
We propose a novel network pruning approach that preserves the information of pre-trained network weights (filters).
Our approach, referred to as FilterSketch, encodes the second-order information of pre-trained weights.
Experiments on CIFAR-10 show that FilterSketch reduces 63.3% of FLOPs and prunes 59.9% of network parameters with negligible accuracy cost.
arXiv Detail & Related papers (2020-01-23T13:57:08Z)
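As referenced in the OCTAV entry above, the following is a minimal illustration of MSE-optimal clipping: a grid search that picks, for a single tensor, the clipping scalar minimizing quantization mean-squared error. This is a hedged sketch under simple symmetric-quantization assumptions; OCTAV itself determines the optimum with a recursive update rather than a search, and the function names here are illustrative.

```python
# Hedged sketch: choose a clipping scalar that minimizes quantization MSE for one tensor.
import torch

def quantize(x, clip, num_bits=4):
    """Uniform symmetric quantization of x with clipping threshold `clip`."""
    n_levels = 2 ** (num_bits - 1) - 1  # e.g. 7 levels per side for 4-bit
    scale = clip / n_levels
    return torch.clamp(torch.round(x / scale), -n_levels, n_levels) * scale

def mse_optimal_clip(x, num_bits=4, num_candidates=100):
    """Grid-search candidate clipping scalars and return the MSE-minimizing one."""
    max_abs = x.abs().max().item()
    candidates = torch.linspace(max_abs / num_candidates, max_abs, num_candidates)
    mses = torch.stack([((x - quantize(x, c.item(), num_bits)) ** 2).mean()
                        for c in candidates])
    return candidates[torch.argmin(mses)].item()
```

In a QAT loop, such a routine would be applied to every weight or activation tensor at every iteration to keep the clipping thresholds matched to the current tensor statistics.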