A Learning Framework for n-bit Quantized Neural Networks toward FPGAs
- URL: http://arxiv.org/abs/2004.02396v1
- Date: Mon, 6 Apr 2020 04:21:24 GMT
- Title: A Learning Framework for n-bit Quantized Neural Networks toward FPGAs
- Authors: Jun Chen, Liang Liu, Yong Liu, Xianfang Zeng
- Abstract summary: This paper proposes a novel learning framework for n-bit QNNs, whose weights are constrained to powers of two.
We also propose a novel QNN structure named n-BQ-NN, which uses shift operations to replace multiply operations.
Experiments show that our n-BQ-NN with our SVPE can execute 2.9 times faster than with the vector processing element (VPE) in inference.
- Score: 20.83904734716565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quantized neural network (QNN) is an efficient approach for network
compression and can be widely used in the implementation of FPGAs. This paper
proposes a novel learning framework for n-bit QNNs, whose weights are
constrained to powers of two. To solve the gradient vanishing problem, we
propose a reconstructed gradient function for QNNs in the back-propagation
algorithm that directly computes the real gradient rather than estimating an
approximate gradient of the expected loss. We also propose a novel QNN
structure named n-BQ-NN, which uses shift operation to replace the multiply
operation and is better suited to inference on FPGAs. Furthermore, we
design a shift vector processing element (SVPE) array that replaces all 16-bit
multiplications in the convolution operation with SHIFT operations on FPGAs. We
also carry out comparative experiments to evaluate our framework. The
experimental results show that the quantized ResNet, DenseNet and AlexNet
models produced by our learning framework achieve almost the same accuracies
as the original full-precision models. Moreover, when using our learning
framework to train our n-BQ-NN from scratch, it can achieve state-of-the-art
results compared with typical low-precision QNNs. Experiments on Xilinx ZCU102
platform show that our n-BQ-NN with our SVPE can execute 2.9 times faster than
with the vector processing element (VPE) in inference. As the SHIFT operation
in our SVPE array does not consume digital signal processing (DSP) resources
on FPGAs, the experiments also show that the SVPE array reduces
average energy consumption to 68.7% of that of the 16-bit VPE array.
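The power-of-two weight constraint is what makes the multiply-free SVPE possible: a product with a weight ±2^e reduces to a sign flip plus a bit shift. A minimal Python sketch of the idea follows; the function names and exponent range are illustrative, not the paper's exact scheme:

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Quantize weight w to sign * 2**exp, with exp clipped to [min_exp, max_exp].
    Returns (sign, exp); sign == 0 encodes a zero weight.
    Rounding is done in the log domain for simplicity."""
    if w == 0.0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    exp = int(round(math.log2(abs(w))))
    return (sign, max(min(exp, max_exp), min_exp))

def shift_mac(activations, quantized_weights):
    """Multiply-free dot product: every multiply becomes a bit shift.
    Activations are assumed to already be integers (fixed-point values)."""
    acc = 0
    for a, (sign, exp) in zip(activations, quantized_weights):
        if sign == 0:
            continue
        shifted = (a << exp) if exp >= 0 else (a >> -exp)  # shift replaces multiply
        acc += sign * shifted
    return acc

weights = [0.26, -0.5, 0.12, 1.1]
q = [quantize_pow2(w) for w in weights]  # [(1, -2), (-1, -1), (1, -3), (1, 0)]
print(shift_mac([64, 32, 128, 16], q))   # 16 - 16 + 16 + 16 = 32
```

On an FPGA the same trick lets the convolution datapath avoid DSP multipliers entirely, which is where the reported resource and energy savings come from.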
Related papers
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
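The sub-network-to-LUT idea can be illustrated with a toy stand-in (not NeuraLUT's actual architecture): once the inputs are binary and the fan-in is small, the whole sub-network can be exhaustively tabulated, so inference collapses into a single lookup.

```python
from itertools import product

def subnetwork(x0, x1, x2, x3):
    """A tiny binary 'sub-network' (hypothetical example):
    two hidden neurons feeding an OR-combined output."""
    h0 = x0 ^ x1
    h1 = x2 & x3
    return int(h0 | h1)

# Precompute the truth table: the whole sub-network becomes one 4-input LUT.
lut = {bits: subnetwork(*bits) for bits in product((0, 1), repeat=4)}

# Inference is now a single table lookup instead of neuron evaluations.
assert lut[(1, 0, 0, 0)] == subnetwork(1, 0, 0, 0)
```

The table has 2^4 = 16 entries, which maps directly onto FPGA LUT primitives; the cost grows exponentially in fan-in, which is why the sub-networks must stay small.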
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs)
We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness.
The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs)
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC)
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- Implementing Neural Network-Based Equalizers in a Coherent Optical Transmission System Using Field-Programmable Gate Arrays [3.1543509940301946]
We show the offline FPGA realization of both recurrent and feedforward neural network (NN)-based equalizers for nonlinearity compensation in coherent optical transmission systems.
The main results are divided into three parts: a performance comparison, an analysis of how activation functions are implemented, and a report on the complexity of the hardware.
arXiv Detail & Related papers (2022-12-09T07:28:45Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
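The decomposition can be sketched with plain integers (a simplification of the paper's tensor-level scheme): a k-bit unsigned value maps to k planes with entries in {-1, +1}, each of which could then drive its own binary-network branch.

```python
def decompose_pm1(q, k):
    """Decompose a k-bit unsigned integer q into k values b_i in {-1, +1}
    such that q == (sum_i 2**i * b_i + (2**k - 1)) // 2."""
    return [2 * ((q >> i) & 1) - 1 for i in range(k)]

def recompose(planes):
    """Invert the decomposition: weighted sum of the planes, shifted back."""
    k = len(planes)
    return (sum((2 ** i) * b for i, b in enumerate(planes)) + (2 ** k - 1)) // 2

# Every 3-bit value round-trips through the {-1, +1} decomposition.
assert all(recompose(decompose_pm1(q, 3)) == q for q in range(8))
```

The point of mapping bits from {0, 1} to {-1, +1} is that each plane then has the same form as a binary network's weights, so existing binary acceleration kernels apply branch by branch.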
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
- FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations [20.218382369944152]
Binary neural networks (BNNs) have 1-bit weights and activations.
BNNs tend to produce a much lower accuracy on realistic datasets such as ImageNet.
This work proposes FracBNN, which exploits fractional activations to substantially improve the accuracy of BNNs.
arXiv Detail & Related papers (2020-12-22T17:49:30Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
Convolutional neural networks (CNNs) are the standard approach to image recognition despite the fact that they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.