A Learning Framework for n-bit Quantized Neural Networks toward FPGAs
- URL: http://arxiv.org/abs/2004.02396v1
- Date: Mon, 6 Apr 2020 04:21:24 GMT
- Title: A Learning Framework for n-bit Quantized Neural Networks toward FPGAs
- Authors: Jun Chen, Liang Liu, Yong Liu, Xianfang Zeng
- Abstract summary: This paper proposes a novel learning framework for n-bit QNNs, whose weights are constrained to powers of two.
We also propose a novel QNN structure named n-BQ-NN, which uses shift operations to replace multiply operations.
Experiments show that our n-BQ-NN with our SVPE can execute 2.9 times faster than with the vector processing element (VPE) in inference.
- Score: 20.83904734716565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quantized neural network (QNN) is an efficient approach for network
compression and can be widely used in the implementation of FPGAs. This paper
proposes a novel learning framework for n-bit QNNs, whose weights are
constrained to powers of two. To solve the gradient vanishing problem, we
propose a reconstructed gradient function for QNNs in the back-propagation
algorithm that directly computes the real gradient rather than estimating an
approximate gradient of the expected loss. We also propose a novel QNN
structure named n-BQ-NN, which uses shift operation to replace the multiply
operation and is better suited to inference on FPGAs. Furthermore, we
design a shift vector processing element (SVPE) array that replaces all 16-bit
multiplications in the convolution operation with SHIFT operations on FPGAs. We
also carry out comparative experiments to evaluate our framework. The
experimental results show that the quantized ResNet, DenseNet and AlexNet
models produced by our learning framework achieve almost the same accuracies
as the original full-precision models. Moreover, when using our learning
framework to train our n-BQ-NN from scratch, it can achieve state-of-the-art
results compared with typical low-precision QNNs. Experiments on Xilinx ZCU102
platform show that our n-BQ-NN with our SVPE can execute 2.9 times faster than
with the vector processing element (VPE) in inference. As the SHIFT operation
in our SVPE array does not consume digital signal processing (DSP) resources
on FPGAs, the experiments also show that the SVPE array reduces
average energy consumption to 68.7% of that of the 16-bit VPE array.
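The power-of-two weight constraint is what makes the multiply-free SVPE possible: a product with a weight ±2^e reduces to a sign flip plus a bit shift. A minimal Python sketch of the idea follows; the function names and exponent range are illustrative, not the paper's exact scheme:

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Quantize weight w to sign * 2**exp, with exp clipped to [min_exp, max_exp].
    Returns (sign, exp); sign == 0 encodes a zero weight.
    Rounding is done in the log domain for simplicity."""
    if w == 0.0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    exp = int(round(math.log2(abs(w))))
    return (sign, max(min(exp, max_exp), min_exp))

def shift_mac(activations, quantized_weights):
    """Multiply-free dot product: every multiply becomes a bit shift.
    Activations are assumed to already be integers (fixed-point values)."""
    acc = 0
    for a, (sign, exp) in zip(activations, quantized_weights):
        if sign == 0:
            continue
        shifted = (a << exp) if exp >= 0 else (a >> -exp)  # shift replaces multiply
        acc += sign * shifted
    return acc

weights = [0.26, -0.5, 0.12, 1.1]
q = [quantize_pow2(w) for w in weights]  # [(1, -2), (-1, -1), (1, -3), (1, 0)]
print(shift_mac([64, 32, 128, 16], q))   # 16 - 16 + 16 + 16 = 32
```

On an FPGA the same trick lets the convolution datapath avoid DSP multipliers entirely, which is where the reported resource and energy savings come from.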
Related papers
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
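The sub-network-to-LUT idea can be illustrated with a toy stand-in (not NeuraLUT's actual architecture): once the inputs are binary and the fan-in is small, the whole sub-network can be exhaustively tabulated, so inference collapses into a single lookup.

```python
from itertools import product

def subnetwork(x0, x1, x2, x3):
    """A tiny binary 'sub-network' (hypothetical example):
    two hidden neurons feeding an OR-combined output."""
    h0 = x0 ^ x1
    h1 = x2 & x3
    return int(h0 | h1)

# Precompute the truth table: the whole sub-network becomes one 4-input LUT.
lut = {bits: subnetwork(*bits) for bits in product((0, 1), repeat=4)}

# Inference is now a single table lookup instead of neuron evaluations.
assert lut[(1, 0, 0, 0)] == subnetwork(1, 0, 0, 0)
```

The table has 2^4 = 16 entries, which maps directly onto FPGA LUT primitives; the cost grows exponentially in fan-in, which is why the sub-networks must stay small.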
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs)
We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness.
The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs)
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC)
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- Implementing Neural Network-Based Equalizers in a Coherent Optical Transmission System Using Field-Programmable Gate Arrays [3.1543509940301946]
We show the offline FPGA realization of both recurrent and feedforward neural network (NN)-based equalizers for nonlinearity compensation in coherent optical transmission systems.
The main results are divided into three parts: a performance comparison, an analysis of how activation functions are implemented, and a report on the complexity of the hardware.
arXiv Detail & Related papers (2022-12-09T07:28:45Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
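The decomposition can be sketched with plain integers (a simplification of the paper's tensor-level scheme): a k-bit unsigned value maps to k planes with entries in {-1, +1}, each of which could then drive its own binary-network branch.

```python
def decompose_pm1(q, k):
    """Decompose a k-bit unsigned integer q into k values b_i in {-1, +1}
    such that q == (sum_i 2**i * b_i + (2**k - 1)) // 2."""
    return [2 * ((q >> i) & 1) - 1 for i in range(k)]

def recompose(planes):
    """Invert the decomposition: weighted sum of the planes, shifted back."""
    k = len(planes)
    return (sum((2 ** i) * b for i, b in enumerate(planes)) + (2 ** k - 1)) // 2

# Every 3-bit value round-trips through the {-1, +1} decomposition.
assert all(recompose(decompose_pm1(q, 3)) == q for q in range(8))
```

The point of mapping bits from {0, 1} to {-1, +1} is that each plane then has the same form as a binary network's weights, so existing binary acceleration kernels apply branch by branch.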
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
- FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations [20.218382369944152]
Binary neural networks (BNNs) have 1-bit weights and activations.
BNNs tend to produce a much lower accuracy on realistic datasets such as ImageNet.
This work proposes FracBNN, which exploits fractional activations to substantially improve the accuracy of BNNs.
arXiv Detail & Related papers (2020-12-22T17:49:30Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
Convolutional neural networks (CNNs) are the standard approach to image recognition despite the fact that they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.