Compressing deep neural networks on FPGAs to binary and ternary
precision with HLS4ML
- URL: http://arxiv.org/abs/2003.06308v2
- Date: Mon, 29 Jun 2020 09:15:11 GMT
- Title: Compressing deep neural networks on FPGAs to binary and ternary
precision with HLS4ML
- Authors: Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Duc Hoang, Sergo
Jindariani, Edward Kreinar, Mia Liu, Vladimir Loncar, Jennifer Ngadiuba,
Kevin Pedro, Maurizio Pierini, Dylan Rankin, Sheila Sagear, Sioni Summers,
Nhan Tran, Zhenbin Wu
- Abstract summary: We present the implementation of binary and ternary neural networks in the hls4ml library.
We discuss the trade-off between model accuracy and resource consumption.
The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
- Score: 13.325670094073383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the implementation of binary and ternary neural networks in the
hls4ml library, designed to automatically convert deep neural network models to
digital circuits with FPGA firmware. Starting from benchmark models trained
with floating point precision, we investigate different strategies to reduce
the network's resource consumption by reducing the numerical precision of the
network parameters to binary or ternary. We discuss the trade-off between model
accuracy and resource consumption. In addition, we show how to balance between
latency and accuracy by retaining full precision on a selected subset of
network components. As an example, we consider two multiclass classification
tasks: handwritten digit recognition with the MNIST data set and jet
identification with simulated proton-proton collisions at the CERN Large Hadron
Collider. The binary and ternary implementation has similar performance to the
higher precision implementation while using drastically fewer FPGA resources.
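As a minimal illustration of the quantization the abstract describes, the NumPy sketch below applies the common sign-based binarization and a threshold-based ternarization to a trained weight matrix. The 0.7 * mean(|w|) threshold follows the ternary-weight-networks heuristic and is an assumption here, not necessarily the paper's exact recipe; the training details (straight-through estimators, which layers stay at full precision) are not reproduced.

```python
import numpy as np

def binarize(w):
    """Quantize weights to {-1, +1} by sign (BinaryNet-style rule)."""
    return np.where(w >= 0.0, 1.0, -1.0)

def ternarize(w, delta_scale=0.7):
    """Quantize weights to {-1, 0, +1}.

    The threshold delta = delta_scale * mean(|w|) is the ternary
    weight networks heuristic; the paper may choose it differently.
    """
    delta = delta_scale * np.abs(w).mean()
    return np.select([w > delta, w < -delta], [1.0, -1.0], default=0.0)

# Example: quantize a weight matrix trained at floating point precision.
w = np.random.randn(4, 4).astype(np.float32)
w_bin, w_ter = binarize(w), ternarize(w)
```

Binary weights turn each multiply into a sign flip, and ternary weights additionally zero out (prune) small connections, which is the source of the drastic FPGA resource savings the abstract reports.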
Related papers
- Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference [4.093167352780157]
We introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits.
We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters.
arXiv Detail & Related papers (2024-03-08T17:28:49Z)
- PulseDL-II: A System-on-Chip Neural Network Accelerator for Timing and Energy Extraction of Nuclear Detector Signals [3.307097167756987]
We introduce PulseDL-II, a system-on-chip (SoC) specially designed for extracting event features (time, energy, etc.) from detector pulses with deep learning.
The proposed system achieved 60 ps time resolution and 0.40% energy resolution with online neural network inference at a signal-to-noise ratio (SNR) of 47.4 dB.
arXiv Detail & Related papers (2022-09-02T08:52:21Z)
- LPYOLO: Low Precision YOLO for Face Detection on FPGA [1.7188280334580197]
Face detection in surveillance systems is one of the most in-demand applications in the security market.
The TinyYolov3 architecture is redesigned and deployed for face detection.
The model is converted to an HLS-based application using the FINN framework and the FINN-HLS library.
arXiv Detail & Related papers (2022-07-21T13:54:52Z)
- Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying sensitivity of different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper (an illustrative bit-allocation sketch follows this entry).
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
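As an illustration of the mixed-precision idea, the hypothetical sketch below assigns each layer the smallest bit-width whose quantization error on a probe input stays below a tolerance. The relative-error criterion is a stand-in for the sensitivity measures in the paper, not its actual method.

```python
import numpy as np

def quantize_uniform(w, num_bits):
    """Uniform symmetric quantization to a signed integer grid."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

def choose_layer_bits(layer_weights, probe, tol=0.05, choices=(2, 4, 8, 16)):
    """Pick per-layer bit-widths: more sensitive layers get more bits.

    Sensitivity here is the relative output error on a probe batch,
    a hypothetical proxy for the paper's sensitivity measures.
    """
    bits = []
    for w in layer_weights:
        ref = w @ probe
        for b in choices:
            rel_err = np.linalg.norm(quantize_uniform(w, b) @ probe - ref)
            rel_err /= np.linalg.norm(ref)
            if rel_err < tol:
                break
        bits.append(b)
    return bits

layers = [np.random.randn(32, 64) for _ in range(4)]
probe = np.random.randn(64, 8)
print(choose_layer_bits(layers, probe))  # smallest adequate width per layer
```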
- Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding [94.40747235081466]
We propose an end-to-end deep learning-based joint transceiver design algorithm for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems.
We develop a DNN architecture that maps the received pilots into feedback bits at the receiver, and then further maps the feedback bits into the hybrid precoder at the transmitter.
arXiv Detail & Related papers (2021-10-22T20:49:02Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks (a sketch of the decomposition idea follows this entry).
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
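A small NumPy sketch of the general idea, assuming weights quantized to the odd-integer levels {±1, ±3, ..., ±(2^M − 1)} that a sum of M signed powers of two can represent exactly; the paper's precise encoding may differ.

```python
import numpy as np

def to_binary_branches(w_q, num_bits):
    """Decompose odd-integer quantized weights into {-1, +1} branches
    so that w_q = sum_i 2**i * b_i, with each b_i in {-1, +1}."""
    branches, residual = [], w_q.astype(np.int64)
    for i in reversed(range(num_bits)):
        b = np.where(residual >= 0, 1, -1)  # greedy sign of the residual
        branches.append((i, b))
        residual = residual - (2 ** i) * b
    assert not residual.any()  # exact for odd levels in [-(2^M-1), 2^M-1]
    return branches

# A dot product with quantized weights becomes a weighted sum of
# binary dot products, each implementable with XNOR/popcount hardware.
w = np.random.choice([-3, -1, 1, 3], size=8)   # 2-bit odd levels
x = np.random.randn(8)
branches = to_binary_branches(w, num_bits=2)
y = sum((2.0 ** i) * (b @ x) for i, b in branches)
assert np.allclose(y, w @ x)
```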
- Benchmarking Quantized Neural Networks on FPGAs with FINN [0.42439262432068253]
Using lower precision incurs only a negligible loss in accuracy.
This article aims to assess the impact of mixed-precision when applied to neural networks deployed on FPGAs.
arXiv Detail & Related papers (2021-02-02T06:42:07Z)
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and from 6% to 40%, respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
- Switchable Precision Neural Networks [35.2752928147013]
Switchable Precision Neural Networks (SP-Nets) are proposed to train a shared network capable of operating at multiple quantization levels.
At runtime, the network can adjust its precision on the fly according to instantaneous memory, latency, power, and accuracy demands (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-02-07T14:43:44Z)
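A hypothetical sketch of runtime-switchable precision over a single shared weight tensor; the training procedure that makes every width accurate simultaneously is the paper's contribution and is not shown here.

```python
import numpy as np

def quantize_uniform(w, num_bits):
    """Quantize shared full-precision weights to the requested width."""
    if num_bits >= 32:
        return w  # full-precision pass-through
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

shared_w = np.random.randn(16, 16).astype(np.float32)
x = np.random.randn(16)
for bits in (8, 4, 2):  # switch precision on the fly per resource demand
    y = quantize_uniform(shared_w, bits) @ x
```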
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are attractive to industry because of their extremely low computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.