Related papers: PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference

URL: http://arxiv.org/abs/2309.02334v2
Date: Mon, 6 Nov 2023 17:28:51 GMT
Title: PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
Authors: Marta Andronic and George A. Constantinides
Abstract summary: We show that by using polynomial building blocks, we can achieve the same accuracy using fewer layers of soft logic than by using linear functions. We demonstrate the effectiveness of this approach in three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
Score: 3.1999570171901786
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Field-programmable gate arrays (FPGAs) are widely used to implement deep learning inference. Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded the combination of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our work is motivated by the idea that the LUTs in an FPGA can be used to implement a much greater variety of functions than this. In this paper, we propose a novel approach to training neural networks for FPGA deployment using multivariate polynomials as the basic building block. Our method takes advantage of the flexibility offered by the soft logic, hiding the polynomial evaluation inside the LUTs with minimal overhead. We show that by using polynomial building blocks, we can achieve the same accuracy using considerably fewer layers of soft logic than by using linear functions, leading to significant latency and area improvements. We demonstrate the effectiveness of this approach in three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
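To make "hiding the polynomial evaluation inside the LUTs" concrete, here is a minimal Python sketch, not the authors' code. It assumes a neuron with fan-in `F`, `β`-bit unsigned input codes dequantized to [0, 1), a multivariate polynomial of total degree at most `D` with already-trained coefficients, and a hypothetical clamp-and-floor output quantizer standing in for the activation. All function names are illustrative. Because every input combination is enumerated offline, the polynomial is never evaluated in hardware; only the resulting truth table would be synthesized.

```python
from itertools import combinations_with_replacement, product
from math import comb, floor, prod


def monomials(fan_in, degree):
    """All monomials of total degree <= degree in fan_in variables.

    Each monomial is a tuple of variable indices; a repeated index is a power,
    e.g. (0, 0, 1) stands for x0**2 * x1.
    """
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(range(fan_in), d))
    return terms


def poly_eval(x, coeffs, terms):
    """Evaluate sum_k coeffs[k] * monomial_k(x); prod(()) == 1 gives the constant term."""
    return sum(c * prod(x[i] for i in t) for c, t in zip(coeffs, terms))


def quantize(v, bits):
    """Hypothetical output quantizer: clamp v to [0, 1) and map it to 2**bits levels."""
    levels = 2 ** bits
    return min(levels - 1, max(0, floor(v * levels)))


def build_lut(coeffs, fan_in, degree, in_bits, out_bits):
    """Enumerate every input code combination and store the quantized polynomial output."""
    terms = monomials(fan_in, degree)
    if len(coeffs) != len(terms):
        raise ValueError(f"expected {len(terms)} coefficients, got {len(coeffs)}")
    table = {}
    for codes in product(range(2 ** in_bits), repeat=fan_in):
        x = [c / 2 ** in_bits for c in codes]  # dequantize each input code
        table[codes] = quantize(poly_eval(x, coeffs, terms), out_bits)
    return table


# Fan-in 2, degree 2: C(2 + 2, 2) = 6 trainable coefficients.
F, D, BETA = 2, 2, 2
assert len(monomials(F, D)) == comb(F + D, D) == 6
# Coefficients in monomial order: 1, x0, x1, x0^2, x0*x1, x1^2.
lut = build_lut([0.1, 0.5, 0.2, 0.3, -0.4, 0.1], F, D, BETA, BETA)
print(len(lut))  # 2**(BETA * F) = 16 entries
```

In this sketch the table has 2^(β·F) entries regardless of `D`: raising the degree adds trainable coefficients (C(F+D, D) of them) but does not enlarge the table, which is plausibly why the polynomial comes at little hardware cost. The exponential dependence on β·F is why fan-in and bit-width have to stay small.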

Related papers

PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning [8.791770352147989]
We propose a novel approach to training DNNs for FPGA deployment using multivariate polynomials as the basic building block. Our method takes advantage of the flexibility offered by soft logic, hiding the polynomial evaluation inside the LUTs with minimal overhead. We demonstrate the effectiveness of PolyLUT on three tasks: network intrusion detection, jet identification at the Large Hadron Collider, and MNIST.
arXiv Detail & Related papers (2025-01-14T11:51:57Z)
TreeLUT: An Efficient Alternative to Deep Neural Networks for Inference Acceleration Using Gradient Boosted Decision Trees [0.6906005491572401]
We present TreeLUT, an open-source tool for implementing gradient boosted decision trees (GBDTs) on FPGAs. We show the effectiveness of TreeLUT on multiple classification datasets commonly used to evaluate ultra-low area and latency designs. Our results show that TreeLUT significantly improves hardware utilization, latency, and throughput at competitive accuracy compared to previous works.
arXiv Detail & Related papers (2025-01-02T19:38:07Z)
PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs [1.730979251211628]
This work introduces PolyLUT-Add, a technique that enhances neuron connectivity by combining $A$ PolyLUT sub-neurons via addition to improve accuracy. We evaluate our implementation on the MNIST, Jet Substructure classification, and Network Intrusion Detection benchmarks and find that, for similar accuracy, PolyLUT-Add achieves a LUT reduction of $2.0\times$ to $13.9\times$ with a $1.2\times$ to $1.6\times$ decrease in latency.
arXiv Detail & Related papers (2024-06-07T13:00:57Z)
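A back-of-the-envelope Python sketch of why combining sub-neurons by addition can save resources. It counts truth-table entries only, assuming `β`-bit inputs, `A` sub-neurons of fan-in `F`, and a hypothetical final table that takes the `A` sub-neuron outputs (each `s` bits wide) and performs the addition and activation. This structure and all names are assumptions for illustration, and entry counts are not the post-synthesis LUT counts reported in the paper.

```python
def table_entries(fan_in, in_bits):
    """Entries in a truth table over fan_in inputs of in_bits bits each."""
    return 2 ** (fan_in * in_bits)


def wide_neuron_entries(a, fan_in, beta):
    """One monolithic neuron that sees all a * fan_in inputs directly."""
    return table_entries(a * fan_in, beta)


def add_neuron_entries(a, fan_in, beta, sub_out_bits):
    """a sub-neuron tables, plus one (assumed) table that adds their outputs and applies the activation."""
    return a * table_entries(fan_in, beta) + table_entries(a, sub_out_bits)


A, F, BETA, S = 2, 6, 2, 3
print(wide_neuron_entries(A, F, BETA))    # 2**24 = 16777216
print(add_neuron_entries(A, F, BETA, S))  # 2 * 2**12 + 2**6 = 8256
```

The monolithic table grows as 2^(β·A·F), while the additive decomposition grows roughly linearly in `A`, which is how the effective fan-in can be widened without an exponential blow-up in table size.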
NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks. We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT. We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
Regularization of polynomial networks for image recognition [78.4786845859205]
Polynomial Networks (PNs) have emerged as an alternative method with promising performance and improved interpretability. We introduce a class of PNs that reach the performance of ResNet across a range of six benchmarks.
arXiv Detail & Related papers (2023-03-24T10:05:22Z)
Polynomial Neural Fields for Subband Decomposition and Manipulation [78.2401411189246]
We propose a new class of neural fields called polynomial neural fields (PNFs). The key advantage of a PNF is that it can represent a signal as a composition of manipulable and interpretable components without losing the merits of neural fields. We empirically demonstrate that Fourier PNFs enable signal manipulation applications such as texture transfer and scale-space interpolation.
arXiv Detail & Related papers (2023-02-09T18:59:04Z)
Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for 5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity. One of the main challenges comes from the real-time implementation of these algorithms. This paper explores the acceleration of algorithms based on the adaptive projected subgradient method (APSM) through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology. Results show that our method can achieve better performance with lower power consumption.
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms. This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU networks. We show that, to achieve comparable error bounds both in theory and in practice, the resulting features need a much smaller dimension than other baseline feature map constructions.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
Physics-Based Deep Learning for Fiber-Optic Communication Systems [10.630021520220653]
We propose a new machine-learning approach for fiber-optic communication systems governed by the nonlinear Schrödinger equation (NLSE). Our main observation is that the popular split-step method (SSM) for numerically solving the NLSE has essentially the same functional form as a deep multi-layer neural network. We exploit this connection by parameterizing the SSM and viewing the linear steps as general linear functions, similar to the weight matrices in a neural network.
arXiv Detail & Related papers (2020-10-27T12:55:23Z)
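The correspondence between the split-step method and a layered network can be illustrated with a minimal pure-Python sketch. It alternates a linear step (a fixed dispersion filter applied in the frequency domain) with a pointwise nonlinear step (the Kerr phase rotation), just as a network alternates weight matrices with activations. It assumes one common sign convention for the NLSE, dA/dz = -j(β2/2)∂²A/∂t² + jγ|A|²A, uses a naive O(N²) DFT to stay self-contained, and all names (`split_step`, `beta2`, `gamma`) are illustrative. The learned approach described above would replace the fixed filter with trainable linear functions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def split_step(a, steps, h, dt, beta2, gamma):
    """Propagate samples a over a distance steps * h, alternating linear and nonlinear steps."""
    n = len(a)
    # Angular frequency of each DFT bin (upper half mapped to negative frequencies).
    omega = [2 * math.pi * (m if m < n // 2 else m - n) / (n * dt) for m in range(n)]
    # Linear "layer": a fixed diagonal filter in the frequency domain (dispersion).
    lin = [cmath.exp(0.5j * beta2 * w * w * h) for w in omega]
    for _ in range(steps):
        spectrum = dft(a)
        a = dft([s * f for s, f in zip(spectrum, lin)], inverse=True)
        # Nonlinear "activation": pointwise Kerr phase rotation.
        a = [v * cmath.exp(1j * gamma * abs(v) ** 2 * h) for v in a]
    return a


def energy(sig):
    """Total signal energy, sum of |a_k|^2."""
    return sum(abs(v) ** 2 for v in sig)


# Gaussian pulse on 16 samples; both steps are unitary, so energy is conserved.
N, DT = 16, 0.5
pulse = [complex(math.exp(-((k - N / 2) * DT) ** 2)) for k in range(N)]
out = split_step(pulse, steps=10, h=0.05, dt=DT, beta2=-1.0, gamma=1.0)
print(abs(energy(out) - energy(pulse)) < 1e-9)  # True
```

Setting `beta2=0.0` and `gamma=0.0` turns both steps into identities, which is a quick sanity check on the transform pair.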
Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions. $\Pi$-Nets produce state-of-the-art results in three challenging tasks, namely image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
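As an illustration of a polynomial expansion built from simple layers, here is a minimal Python sketch of a coupled recursion of the kind used in $\Pi$-Nets, as we understand it: $x_1 = U_1^\top z$, $x_n = (U_n^\top z) * x_{n-1} + x_{n-1}$ with $*$ the elementwise (Hadamard) product, and output $C x_N + \beta$. Each step raises the polynomial degree in $z$ by one, so $N$ steps give a degree-$N$ polynomial without forming any monomials explicitly. Names and shapes are assumptions, and this is not the authors' implementation.

```python
def mat_t_vec(u, z):
    """Compute U^T z for U given as a list of d rows of length k."""
    return [sum(u[i][j] * z[i] for i in range(len(z))) for j in range(len(u[0]))]


def pi_net(z, us, c, beta):
    """Degree-len(us) polynomial of z via x_n = (U_n^T z) * x_{n-1} + x_{n-1}."""
    x = mat_t_vec(us[0], z)
    for u in us[1:]:
        uz = mat_t_vec(u, z)
        # Hadamard product with the previous state, plus a skip connection.
        x = [p * q + q for p, q in zip(uz, x)]
    # Final linear layer: C x_N + beta, with C given as a list of rows.
    return [sum(ci * xi for ci, xi in zip(row, x)) + b for row, b in zip(c, beta)]


# Scalar case with all weights 1 and N = 3: the output is z * (1 + z)**2.
us = [[[1.0]]] * 3
print(pi_net([2.0], us, c=[[1.0]], beta=[0.0]))  # [18.0]
```

The skip term $+ x_{n-1}$ is what keeps all lower-degree terms in the expansion; without it, the scalar case would collapse to the single monomial $z^N$.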

This list is automatically generated from the titles and abstracts of the papers on this site.