ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization
framework for FPGA
- URL: http://arxiv.org/abs/2111.00155v1
- Date: Sat, 30 Oct 2021 03:02:52 GMT
- Title: ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization
framework for FPGA
- Authors: Sung-En Chang, Yanyu Li, Mengshu Sun, Yanzhi Wang, Xue Lin
- Abstract summary: This work targets the commonly used FPGA (field-programmable gate array) devices as the hardware platform for DNN edge computing.
We use a quantization method that supports multiple precisions along the intra-layer dimension.
We achieve a 3.65x speedup in end-to-end inference time on ImageNet, compared with the fixed-point quantization method.
- Score: 37.780528948703406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work targets the commonly used FPGA (field-programmable gate array)
devices as the hardware platform for DNN edge computing. We focus on DNN
quantization as the main model compression technique. The novelty of this work
is that we use a quantization method that supports multiple precisions along the
intra-layer dimension, whereas existing quantization methods apply
multi-precision quantization along the inter-layer dimension. The intra-layer
multi-precision method makes the hardware configuration uniform across layers,
reducing computation overhead while preserving model accuracy on par with the
inter-layer approach. Our proposed ILMPQ DNN quantization framework achieves
70.73% Top-1 accuracy with ResNet-18 on the ImageNet dataset. We also validate
the proposed ILMPQ framework on two FPGA devices, i.e., Xilinx XC7Z020 and
XC7Z045, achieving a 3.65x speedup in end-to-end inference time on ImageNet
compared with the fixed-point quantization method.
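To make the intra-layer idea concrete, the sketch below is a minimal NumPy illustration under our own assumptions, not the authors' implementation: each layer's weight matrix is split row-wise into groups, each group is fake-quantized to a different bit-width, and every layer uses the same bit-width mix so that the hardware configuration can stay uniform across layers. The helper names (`uniform_quantize`, `intra_layer_multi_precision`) and the fixed 30%/70% row split are hypothetical.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform (fake) quantization of an array to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(x)), 1e-8) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, as used during quantization-aware training

def intra_layer_multi_precision(weight, bit_widths=(8, 4), ratios=(0.3, 0.7)):
    """Quantize different row groups (output channels) of one layer's weight
    matrix to different bit-widths; every layer uses the same mix, so the
    FPGA compute units can share a single configuration."""
    rows = weight.shape[0]
    out = np.empty_like(weight)
    start = 0
    for i, (bits, ratio) in enumerate(zip(bit_widths, ratios)):
        end = rows if i == len(bit_widths) - 1 else start + int(round(rows * ratio))
        out[start:end] = uniform_quantize(weight[start:end], bits)
        start = end
    return out

# Example: a 64x128 layer with 30% of rows kept at 8 bits and 70% at 4 bits.
w = np.random.randn(64, 128).astype(np.float32)
w_q = intra_layer_multi_precision(w)
```

The contiguous split here is only for readability; it stands in for whatever per-row precision assignment the framework itself selects.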
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which full-precision values are stored in a low bit-width representation.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers to quantization (a minimal illustration follows this entry).
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
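As a concrete instance of the basic operation the survey above describes, the sketch below (our own example, not taken from the survey) stores full-precision values as signed 8-bit integers with a single per-tensor scale, so matrix products can be accumulated in integer arithmetic and rescaled at the end.

```python
import numpy as np

def quantize_int8(x):
    """Map a float32 array to int8 values plus a single per-tensor scale."""
    scale = max(np.max(np.abs(x)), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_matmul(a_q, a_scale, b_q, b_scale):
    """Integer matrix multiply; the float result is recovered by rescaling."""
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)   # integer accumulation
    return acc.astype(np.float32) * (a_scale * b_scale)

a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 3).astype(np.float32)
a_q, sa = quantize_int8(a)
b_q, sb = quantize_int8(b)
print(np.max(np.abs(a @ b - int8_matmul(a_q, sa, b_q, sb))))   # small quantization error
```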
- RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions [43.27226390407956]
This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach.
The proposed RMSMP is tested for the image classification and natural language processing (BERT) applications.
It achieves the best accuracy among state-of-the-art methods under the same equivalent precision.
arXiv Detail & Related papers (2021-10-30T02:53:35Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train one model for all quantization settings, supporting diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework [39.981546951333556]
This paper focuses on weight quantization, a hardware-friendly model compression approach.
It is motivated by (1) the observation that the weight distributions of different rows are not the same; and (2) the potential for better utilization of FPGA hardware resources.
arXiv Detail & Related papers (2020-12-08T06:25:07Z)
- MSP: An FPGA-Specific Mixed-Scheme, Multi-Precision Deep Neural Network Quantization Framework [39.43144643349916]
This paper targets the commonly used FPGA devices as the hardware platforms for deep learning edge computing.
We propose a mixed-scheme DNN quantization method that incorporates both the linear and non-linear number systems for quantization.
We use a quantization method that supports multiple precisions along the intra-layer dimension, while existing quantization methods apply multi-precision quantization along the inter-layer dimension.
arXiv Detail & Related papers (2020-09-16T04:24:18Z)
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors.
Given an MCU-class memory bound of 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions (a storage-budget sketch follows this entry).
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
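To illustrate the kind of constraint such a search must respect, the snippet below is a hypothetical helper, not part of the paper's HAQ-based flow: it computes the storage footprint of a per-tensor bit-width assignment (bits drawn from {2, 4, 8}) and checks it against a 2 MB weight budget.

```python
from math import prod

def weight_storage_bytes(tensor_shapes, bit_widths):
    """Total bytes needed to store the weight tensors at the chosen bit-widths."""
    total_bits = sum(bits * prod(shape)
                     for shape, bits in zip(tensor_shapes, bit_widths))
    return total_bits / 8

# Hypothetical three-tensor network; per-tensor precisions picked from {2, 4, 8}.
shapes = [(256, 256, 3, 3), (512, 256, 3, 3), (1000, 512)]
bits = [4, 2, 8]
size = weight_storage_bytes(shapes, bits)
print(f"{size / 2**20:.2f} MiB; fits a 2 MB budget: {size <= 2 * 1024 * 1024}")
```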
This list is automatically generated from the titles and abstracts of the papers on this site.