PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA
LUT-based Inference
- URL: http://arxiv.org/abs/2309.02334v2
- Date: Mon, 6 Nov 2023 17:28:51 GMT
- Title: PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA
LUT-based Inference
- Authors: Marta Andronic and George A. Constantinides
- Abstract summary: We show that by using building blocks, we can achieve the same accuracy using fewer layers of soft logic than by using linear functions.
We demonstrate the effectiveness of this approach in three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
- Score: 3.1999570171901786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Field-programmable gate arrays (FPGAs) are widely used to implement deep
learning inference. Standard deep neural network inference involves the
computation of interleaved linear maps and nonlinear activation functions.
Prior work for ultra-low latency implementations has hardcoded the combination
of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our
work is motivated by the idea that the LUTs in an FPGA can be used to implement
a much greater variety of functions than this. In this paper, we propose a
novel approach to training neural networks for FPGA deployment using
multivariate polynomials as the basic building block. Our method takes
advantage of the flexibility offered by the soft logic, hiding the polynomial
evaluation inside the LUTs with minimal overhead. We show that by using
polynomial building blocks, we can achieve the same accuracy using considerably
fewer layers of soft logic than by using linear functions, leading to
significant latency and area improvements. We demonstrate the effectiveness of
this approach in three tasks: network intrusion detection, jet identification
at the CERN Large Hadron Collider, and handwritten digit recognition using the
MNIST dataset.
Related papers
- PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs [1.730979251211628]
This work introduces PolyLUT-Add, a technique that enhances neuron connectivity by combining $A$ PolyLUT sub-neurons via addition to improve accuracy.
We evaluate our implementation over the MNIST, Jet Substructure classification, and Network Intrusion Detection benchmark and found that for similar accuracy, PolyLUT-Add achieves a LUT reduction of $2.0-13.9times$ with a $1.2-1.6times$ decrease in latency.
arXiv Detail & Related papers (2024-06-07T13:00:57Z) - NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z) - Reconfigurable Distributed FPGA Cluster Design for Deep Learning
Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z) - Regularization of polynomial networks for image recognition [78.4786845859205]
Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability.
We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks.
arXiv Detail & Related papers (2023-03-24T10:05:22Z) - Polynomial Neural Fields for Subband Decomposition and Manipulation [78.2401411189246]
We propose a new class of neural fields called neural fields (PNFs)
The key advantage of a PNF is that it can represent a signal as a composition of manipulable and interpretable components without losing the merits of neural fields.
We empirically demonstrate that Fourier PNFs enable signal manipulation applications such as texture transfer and scale-space.
arXiv Detail & Related papers (2023-02-09T18:59:04Z) - Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for
5G and Beyond [70.81551587109833]
nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity.
One of the main challenges comes from the real-time implementation of these algorithms.
This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z) - A Deep Learning Inference Scheme Based on Pipelined Matrix
Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with fewer power consumption.
arXiv Detail & Related papers (2021-10-10T17:31:27Z) - NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function
Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that dimension of the resulting features is much smaller than other baseline feature map constructions to achieve comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Physics-Based Deep Learning for Fiber-Optic Communication Systems [10.630021520220653]
We propose a new machine-learning approach for fiber-optic communication systems governed by the nonlinear Schr"odinger equation (NLSE)
Our main observation is that the popular split-step method (SSM) for numerically solving the NLSE has essentially the same functional form as a deep multi-layer neural network.
We exploit this connection by parameterizing the SSM and viewing the linear steps as general linear functions, similar to the weight matrices in a neural network.
arXiv Detail & Related papers (2020-10-27T12:55:23Z) - Deep Polynomial Neural Networks [77.70761658507507]
$Pi$Nets are a new class of function approximators based on expansions.
$Pi$Nets produce state-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.