Related papers: PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs

URL: http://arxiv.org/abs/2406.04910v2
Date: Sun, 15 Sep 2024 12:32:18 GMT
Title: PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs
Authors: Binglei Lou, Richard Rademacher, David Boland, Philip H. W. Leong
Abstract summary: This work introduces PolyLUT-Add, a technique that enhances neuron connectivity by combining $A$ PolyLUT sub-neurons via addition to improve accuracy. We evaluate our implementation over the MNIST, Jet Substructure classification, and Network Intrusion Detection benchmarks and find that for similar accuracy, PolyLUT-Add achieves a LUT reduction of $2.0-13.9\times$ with a $1.2-1.6\times$ decrease in latency.
Score: 1.730979251211628
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: FPGAs have distinct advantages as a technology for deploying deep neural networks (DNNs) at the edge. Lookup Table (LUT) based networks, where neurons are directly modeled using LUTs, help maximize this promise of offering ultra-low latency and high area efficiency on FPGAs. Unfortunately, LUT resource usage scales exponentially with the number of inputs to the LUT, restricting PolyLUT to small LUT sizes. This work introduces PolyLUT-Add, a technique that enhances neuron connectivity by combining $A$ PolyLUT sub-neurons via addition to improve accuracy. Moreover, we describe a novel architecture to improve its scalability. We evaluated our implementation over the MNIST, Jet Substructure classification, and Network Intrusion Detection benchmark and found that for similar accuracy, PolyLUT-Add achieves a LUT reduction of $2.0-13.9\times$ with a $1.2-1.6\times$ decrease in latency.
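The scaling argument in the abstract can be illustrated with a rough count of truth-table entries. The sketch below is not the authors' cost model: it assumes a neuron with fan-in F and β-bit inputs is stored as one table of 2^(βF) entries, and that the A sub-neuron outputs are summed by one further table indexed by all A outputs. The function names, the parameter `sub_out_bits`, and the example sizes are hypothetical.

```python
def lut_entries(fan_in: int, bits: int) -> int:
    """Truth-table entries for one neuron whose inputs total fan_in * bits bits."""
    return 2 ** (fan_in * bits)


def split_neuron_entries(fan_in: int, bits: int, a: int, sub_out_bits: int) -> int:
    """Entries for `a` sub-neuron tables plus one adder table over their outputs.

    Assumption: the adder is itself a lookup table indexed by the
    concatenated sub-neuron outputs (a * sub_out_bits bits in total).
    """
    sub_tables = a * lut_entries(fan_in, bits)
    adder_table = 2 ** (a * sub_out_bits)
    return sub_tables + adder_table


# Hypothetical sizes: one neuron reading 12 two-bit inputs directly,
# versus two sub-neurons reading 6 two-bit inputs each, with 3-bit outputs.
wide = lut_entries(fan_in=12, bits=2)
split = split_neuron_entries(fan_in=6, bits=2, a=2, sub_out_bits=3)
print(wide, split)
```

Under these assumptions the single wide table needs 2^24 entries, while the split version needs 2 × 2^12 + 2^6, which is why adding sub-neuron outputs can widen the effective fan-in without the exponential blow-up.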

Related papers

NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference [2.7086888205833968]
Efficient neural networks (NNs) leveraging lookup tables (LUTs) have demonstrated significant potential for emerging AI applications. Existing LUT-based designs suffer from accuracy degradation due to the large fan-in required by neurons being limited by the exponential scaling of LUT resources with input width. We present NeuraLUT-Assemble, a novel framework that addresses these limitations by combining mixed-precision techniques with the assembly of larger neurons from smaller units.
arXiv Detail & Related papers (2025-04-01T09:52:38Z)
SparseLUT: Sparse Connectivity Optimization for Lookup Table-based Deep Neural Networks [0.0]
This paper introduces SparseLUT, a connectivity-centric training technique tailored for LUT-based deep neural networks (DNNs). Experimental results show consistent accuracy improvements across benchmarks, including up to a 2.13% increase on MNIST. This is done without any hardware overhead and achieves state-of-the-art results for LUT-based DNNs.
arXiv Detail & Related papers (2025-03-17T05:21:54Z)
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator [11.167930856636161]
We introduce LUT-DLA, a Look-Up Table (LUT) Deep Learning Accelerator Framework that utilizes vector quantization to convert neural network models into LUTs. We show that LUT-DLA achieves improvements in power efficiency and area efficiency with gains of $1.4\sim7.0\times$ and $1.5\sim146.1\times$, respectively.
arXiv Detail & Related papers (2025-01-18T05:27:25Z)
PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning [8.791770352147989]
We propose a novel approach to training DNNs for FPGA deployment using multivariate polynomials as the basic building block. Our method takes advantage of the flexibility offered by soft logic, hiding the polynomial evaluation inside the LUTs with minimal overhead. We demonstrate the effectiveness of PolyLUT on three tasks: network intrusion detection, jet identification at the Large Hadron Collider, and MNIST.
arXiv Detail & Related papers (2025-01-14T11:51:57Z)
NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks. We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT. We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference [3.1999570171901786]
We show that by using polynomial building blocks, we can achieve the same accuracy using fewer layers of soft logic than by using linear functions. We demonstrate the effectiveness of this approach in three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
arXiv Detail & Related papers (2023-09-05T15:54:09Z)
Toward DNN of LUTs: Learning Efficient Image Restoration with Multiple Look-Up Tables [47.15181829317732]
High-definition screens on edge devices stimulate a strong demand for efficient image restoration algorithms. The size of a single look-up table grows exponentially with the increase of its indexing capacity. We propose a universal method to construct multiple LUTs like a neural network, termed MuLUT.
arXiv Detail & Related papers (2023-03-25T16:00:33Z)
The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$. We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference [3.2296078260106174]
We propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency designs. Existing implementations of this class of architecture require the manual specification of the number of inputs per LUT, K. We propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference.
arXiv Detail & Related papers (2021-12-04T14:23:24Z)
Adder Neural Networks [75.54239599016535]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions. In AdderNets, we take the $\ell_p$-norm distance between filters and input feature as the output response. We show that the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2021-05-29T04:02:51Z)
NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms. This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions. $\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs. We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2019-12-31T06:56:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.