A High-Performance Adaptive Quantization Approach for Edge CNN
Applications
- URL: http://arxiv.org/abs/2107.08382v1
- Date: Sun, 18 Jul 2021 07:49:18 GMT
- Title: A High-Performance Adaptive Quantization Approach for Edge CNN
Applications
- Authors: Hsu-Hsun Chin, Ren-Song Tsay, Hsin-I Wu
- Abstract summary: Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications.
The enhanced accuracy comes at the cost of substantial memory bandwidth and storage requirements.
In this paper, we introduce an adaptive high-performance quantization method to resolve the issue of biased activation.
- Score: 0.225596179391365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent convolutional neural network (CNN) development continues to advance
the state-of-the-art model accuracy for various applications. However, the
enhanced accuracy comes at the cost of substantial memory bandwidth and storage
requirements as well as demanding computational resources. Although quantization
methods have effectively reduced the deployment cost for edge devices in the
past, they suffer from significant information loss when processing the biased
activations of contemporary CNNs. In this paper, we therefore introduce an
adaptive high-performance quantization method that resolves the issue of biased
activations by dynamically adjusting the scaling and shifting factors based on
the task loss. Our proposed method has been extensively evaluated on image
classification models (ResNet-18/34/50, MobileNet-V2, EfficientNet-B0) with the
ImageNet dataset, an object detection model (YOLO-V4) with the COCO dataset, and
language models with the PTB dataset. The results show that our 4-bit integer
(INT4) quantized models achieve better accuracy than the state-of-the-art 4-bit
models and, in some cases, even surpass the golden full-precision models. The
final designs have been successfully deployed onto extremely resource-constrained
edge devices for many practical applications.
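The abstract describes the mechanism only at a high level: per-tensor scaling and
shifting factors are adjusted dynamically through gradients of the task loss. Below
is a minimal PyTorch sketch of that general idea, assuming a uniform affine
fake-quantizer trained with a straight-through estimator; the class name,
initialization values, and training details are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class AdaptiveQuantizer(nn.Module):
    """Uniform affine fake-quantizer with learnable scale and shift.

    A sketch of the idea in the abstract above: the scaling and shifting
    factors are trainable parameters, so gradients of the task loss can
    adjust them dynamically. Names and initialization are assumptions.
    """

    def __init__(self, bits=4, init_scale=0.1, init_shift=0.0):
        super().__init__()
        self.levels = 2 ** bits - 1                       # 15 levels for INT4
        self.scale = nn.Parameter(torch.tensor(init_scale))
        self.shift = nn.Parameter(torch.tensor(init_shift))

    def forward(self, x):
        # The shift re-centers biased (non-zero-mean) activations; the
        # scale maps the remaining range onto the integer grid.
        q = (x - self.shift) / self.scale
        q = torch.clamp(q, 0.0, float(self.levels))
        # Straight-through estimator: round in the forward pass, but let
        # gradients pass through the rounding unchanged in the backward pass.
        q = q + (torch.round(q) - q).detach()
        return q * self.scale + self.shift                # dequantize

# Usage: quantize activations and train end-to-end on the task loss.
quant = AdaptiveQuantizer(bits=4)
x = torch.randn(8, 64) + 2.0          # biased activations, as in the paper
loss = quant(x).pow(2).mean()         # stand-in for a real task loss
loss.backward()                       # gradients reach scale and shift
```

Because scale and shift are ordinary parameters, any biased activation
distribution simply pulls the shift toward its offset during training, which is
one plausible reading of how the method avoids the information loss described above.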
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the limitations of cloud-based processing by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- AdaQAT: Adaptive Bit-Width Quantization-Aware Training [0.873811641236639]
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios.
Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging.
We present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes bit-widths during training for more efficient inference.
arXiv Detail & Related papers (2024-04-22T09:23:56Z)
- Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real calibration data in post-training quantization.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN against quantization using real data and an alternative data-generation method based on fractal images.
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
- Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed-precision convolutional neural networks (CNNs) targeting edge computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models of increasing complexity and associate each of them with a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved using a reinforcement learning policy network; a toy sketch of this bandit-based selection appears after this list.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
- Mitigating severe over-parameterization in deep convolutional neural networks through forced feature abstraction and compression with an entropy-based heuristic [7.503338065129185]
We propose Entropy-Based Convolutional Layer Estimation (EBCLE), a heuristic that is robust and simple.
We present empirical evidence emphasizing the relative effectiveness of broader yet shallower models trained using EBCLE.
arXiv Detail & Related papers (2021-06-27T10:34:39Z)
- Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition [65.7040645560855]
We propose Q-ASR, an integer-only, zero-shot quantization scheme for ASR models.
We show negligible word error rate (WER) change compared to the full-precision baseline models.
Q-ASR exhibits a large compression rate of more than 4x with small WER degradation.
arXiv Detail & Related papers (2021-03-31T06:05:40Z)
- Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks [2.666640112616559]
We propose an in-training quantization method for neural network models.
Our method calculates the bit-width for each layer during training, yielding a mixed-precision model with competitive accuracy.
We run experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet with VGG19/ResNet18 architectures.
arXiv Detail & Related papers (2021-01-12T09:01:44Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain the diversity in output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of full-precision networks.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks into high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
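As referenced in the hierarchical edge computing entry above, selecting among
detectors of increasing complexity can be framed as a contextual bandit. The
following is a toy, self-contained illustration of that general idea using a
linear reward estimate per arm and epsilon-greedy exploration; it is not the
paper's actual policy-network scheme, and all names, features, and the synthetic
reward below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arms = anomaly detectors of increasing complexity, one per HEC layer;
# context = cheap features of the incoming sample. Everything here is a
# hypothetical stand-in for the paper's reinforcement learning policy.
n_arms, ctx_dim, eps = 3, 4, 0.1
weights = np.zeros((n_arms, ctx_dim))          # linear reward model per arm

def select_arm(context):
    if rng.random() < eps:                     # explore occasionally
        return int(rng.integers(n_arms))
    return int(np.argmax(weights @ context))   # otherwise exploit estimates

def update(arm, context, reward, lr=0.05):
    # One SGD step on the squared error of the chosen arm's reward estimate.
    pred = weights[arm] @ context
    weights[arm] += lr * (reward - pred) * context

for _ in range(1000):
    context = rng.normal(size=ctx_dim)
    arm = select_arm(context)
    # Synthetic reward: detection quality grows with model complexity on
    # "hard" contexts, minus a cost for running the bigger model.
    reward = context[0] * (arm + 1) - 0.3 * arm + rng.normal(0.0, 0.1)
    update(arm, context, reward)
```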