ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization
framework for FPGA
- URL: http://arxiv.org/abs/2111.00155v1
- Date: Sat, 30 Oct 2021 03:02:52 GMT
- Title: ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization
framework for FPGA
- Authors: Sung-En Chang, Yanyu Li, Mengshu Sun, Yanzhi Wang, Xue Lin
- Abstract summary: This work targets the commonly used FPGA (field-programmable gate array) devices as the hardware platform for DNN edge computing.
We use a quantization method that supports multiple precisions along the intra-layer dimension.
We achieve a 3.65x speedup in end-to-end inference time on ImageNet, compared with the fixed-point quantization method.
- Score: 37.780528948703406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work targets the commonly used FPGA (field-programmable gate array)
devices as the hardware platform for DNN edge computing. We focus on DNN
quantization as the main model compression technique. The novelty of this work
is that we use a quantization method that supports multiple precisions along the
intra-layer dimension, whereas existing quantization methods apply
multi-precision quantization along the inter-layer dimension. The intra-layer
multi-precision method makes the hardware configuration uniform across layers,
reducing computation overhead while preserving model accuracy on par with the
inter-layer approach. Our proposed ILMPQ DNN quantization framework achieves
70.73% Top-1 accuracy with ResNet-18 on the ImageNet dataset. We also validate
the proposed ILMPQ framework on two FPGA devices, i.e., Xilinx XC7Z020 and
XC7Z045, achieving a 3.65x speedup in end-to-end inference time on ImageNet
compared with the fixed-point quantization method.
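To make the intra-layer idea concrete, the sketch below is a minimal NumPy illustration under our own assumptions, not the authors' implementation: each layer's weight matrix is split row-wise into groups, each group is fake-quantized to a different bit-width, and every layer uses the same bit-width mix so that the hardware configuration can stay uniform across layers. The helper names (`uniform_quantize`, `intra_layer_multi_precision`) and the fixed 30%/70% row split are hypothetical.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform (fake) quantization of an array to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(x)), 1e-8) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, as used during quantization-aware training

def intra_layer_multi_precision(weight, bit_widths=(8, 4), ratios=(0.3, 0.7)):
    """Quantize different row groups (output channels) of one layer's weight
    matrix to different bit-widths; every layer uses the same mix, so the
    FPGA compute units can share a single configuration."""
    rows = weight.shape[0]
    out = np.empty_like(weight)
    start = 0
    for i, (bits, ratio) in enumerate(zip(bit_widths, ratios)):
        end = rows if i == len(bit_widths) - 1 else start + int(round(rows * ratio))
        out[start:end] = uniform_quantize(weight[start:end], bits)
        start = end
    return out

# Example: a 64x128 layer with 30% of rows kept at 8 bits and 70% at 4 bits.
w = np.random.randn(64, 128).astype(np.float32)
w_q = intra_layer_multi_precision(w)
```

The contiguous split here is only for readability; it stands in for whatever per-row precision assignment the framework itself selects.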
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which full-precision values are stored in a low bit-width representation.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers to quantization (a minimal illustration follows this entry).
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
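As a concrete instance of the basic operation the survey above describes, the sketch below (our own example, not taken from the survey) stores full-precision values as signed 8-bit integers with a single per-tensor scale, so matrix products can be accumulated in integer arithmetic and rescaled at the end.

```python
import numpy as np

def quantize_int8(x):
    """Map a float32 array to int8 values plus a single per-tensor scale."""
    scale = max(np.max(np.abs(x)), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_matmul(a_q, a_scale, b_q, b_scale):
    """Integer matrix multiply; the float result is recovered by rescaling."""
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)   # integer accumulation
    return acc.astype(np.float32) * (a_scale * b_scale)

a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 3).astype(np.float32)
a_q, sa = quantize_int8(a)
b_q, sb = quantize_int8(b)
print(np.max(np.abs(a @ b - int8_matmul(a_q, sa, b_q, sb))))   # small quantization error
```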
- RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions [43.27226390407956]
This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach.
The proposed RMSMP is tested for the image classification and natural language processing (BERT) applications.
It achieves the best accuracy among state-of-the-art methods under the same equivalent precision.
arXiv Detail & Related papers (2021-10-30T02:53:35Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train one model for all quantization settings, supporting diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework [39.981546951333556]
This paper focuses on weight quantization, a hardware-friendly model compression approach.
It is motivated by (1) the observation that the weight distributions of different rows are not the same; and (2) the potential for better utilization of FPGA hardware resources.
arXiv Detail & Related papers (2020-12-08T06:25:07Z)
- MSP: An FPGA-Specific Mixed-Scheme, Multi-Precision Deep Neural Network Quantization Framework [39.43144643349916]
This paper targets the commonly used FPGA devices as the hardware platforms for deep learning edge computing.
We propose a mixed-scheme DNN quantization method that incorporates both the linear and non-linear number systems for quantization.
We use a quantization method that supports multiple precisions along the intra-layer dimension, while existing quantization methods apply multi-precision quantization along the inter-layer dimension.
arXiv Detail & Related papers (2020-09-16T04:24:18Z)
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors.
Given an MCU-class memory bound of 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions (a storage-budget sketch follows this entry).
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
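To illustrate the kind of constraint such a search must respect, the snippet below is a hypothetical helper, not part of the paper's HAQ-based flow: it computes the storage footprint of a per-tensor bit-width assignment (bits drawn from {2, 4, 8}) and checks it against a 2 MB weight budget.

```python
from math import prod

def weight_storage_bytes(tensor_shapes, bit_widths):
    """Total bytes needed to store the weight tensors at the chosen bit-widths."""
    total_bits = sum(bits * prod(shape)
                     for shape, bits in zip(tensor_shapes, bit_widths))
    return total_bits / 8

# Hypothetical three-tensor network; per-tensor precisions picked from {2, 4, 8}.
shapes = [(256, 256, 3, 3), (512, 256, 3, 3), (1000, 512)]
bits = [4, 2, 8]
size = weight_storage_bytes(shapes, bits)
print(f"{size / 2**20:.2f} MiB; fits a 2 MB budget: {size <= 2 * 1024 * 1024}")
```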
This list is automatically generated from the titles and abstracts of the papers on this site.