Related papers: NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference

NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference

URL: http://arxiv.org/abs/2504.00592v1
Date: Tue, 01 Apr 2025 09:52:38 GMT
Title: NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
Authors: Marta Andronic, George A. Constantinides,
Abstract summary: Efficient neural networks (NNs) leveraging lookup tables (LUTs) have demonstrated significant potential for emerging AI applications.<n>Existing LUT-based designs suffer from accuracy degradation due to the large fan-in required by neurons being limited by the exponential scaling of LUT resources with input width.<n>We present NeuraLUT-Assemble, a novel framework that addresses these limitations by combining mixed-precision techniques with the assembly of larger neurons from smaller units.
Score: 2.7086888205833968
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Efficient neural networks (NNs) leveraging lookup tables (LUTs) have demonstrated significant potential for emerging AI applications, particularly when deployed on field-programmable gate arrays (FPGAs) for edge computing. These architectures promise ultra-low latency and reduced resource utilization, broadening neural network adoption in fields such as particle physics. However, existing LUT-based designs suffer from accuracy degradation due to the large fan-in required by neurons being limited by the exponential scaling of LUT resources with input width. In practice, in prior work this tension has resulted in the reliance on extremely sparse models. We present NeuraLUT-Assemble, a novel framework that addresses these limitations by combining mixed-precision techniques with the assembly of larger neurons from smaller units, thereby increasing connectivity while keeping the number of inputs of any given LUT manageable. Additionally, we introduce skip-connections across entire LUT structures to improve gradient flow. NeuraLUT-Assemble closes the accuracy gap between LUT-based methods and (fully-connected) MLP-based models, achieving competitive accuracy on tasks such as network intrusion detection, digit classification, and jet classification, demonstrating up to $8.42\times$ reduction in the area-delay product compared to the state-of-the-art at the time of the publication.

Related papers

SparseLUT: Sparse Connectivity Optimization for Lookup Table-based Deep Neural Networks [0.0]
This paper introduces SparseLUT, a connectivity-centric training technique tailored for LUT-based deep neural networks (DNNs)<n> Experimental results show consistent accuracy improvements across benchmarks, including up to a 2.13% increase on MNIST.<n>This is done without any hardware overhead and achieves state-of-the-art results for LUT-based DNNs.
arXiv Detail & Related papers (2025-03-17T05:21:54Z)
PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs [1.730979251211628]
This work introduces PolyLUT-Add, a technique that enhances neuron connectivity by combining $A$ PolyLUT sub-neurons via addition to improve accuracy. We evaluate our implementation over the MNIST, Jet Substructure classification, and Network Intrusion Detection benchmark and found that for similar accuracy, PolyLUT-Add achieves a LUT reduction of $2.0-13.9times$ with a $1.2-1.6times$ decrease in latency.
arXiv Detail & Related papers (2024-06-07T13:00:57Z)
NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks. We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT. We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources. The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference [3.2296078260106174]
We propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency designs. Existing implementations of this class of architecture require the manual specification of the number of inputs per LUT, K. We propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference.
arXiv Detail & Related papers (2021-12-04T14:23:24Z)
Neural network relief: a pruning algorithm based on neural activity [47.57448823030151]
We propose a simple importance-score metric that deactivates unimportant connections. We achieve comparable performance for LeNet architectures on MNIST. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations.
arXiv Detail & Related papers (2021-09-22T15:33:49Z)
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts. One promising approach to reduce the energy costs is representing DNNs with low-precision numbers. We jointly design a lowprecision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique technique (ALF) ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency. We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.