Related papers: A Novel Approximate Hamming Weight Computing for Spiking Neural Networks: an FPGA Friendly Architecture

A Novel Approximate Hamming Weight Computing for Spiking Neural Networks: an FPGA Friendly Architecture

URL: http://arxiv.org/abs/2104.14594v1
Date: Thu, 29 Apr 2021 18:27:51 GMT
Title: A Novel Approximate Hamming Weight Computing for Spiking Neural Networks: an FPGA Friendly Architecture
Authors: Kaveh Akbarzadeh-Sherbaf, Mikaeel Bahmani, Danial Ghiaseddin, Saeed Safari, Abdol-Hossein Vahabie
Abstract summary: We propose a method inspired by synaptic transmission failure for exploiting FPGA lookup tables to compress long input vectors. We classify the compressors into shallow ones with up to two levels of lookup tables and deep ones with more than two levels. Our simulation results show that calculating the Hamming weight of a 1024-bit vector of a spiking neural network by the use of only deep compressors preserves the chaotic behavior of the network.
Score: 0.47248250311484113
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hamming weights of sparse and long binary vectors are important modules in many scientific applications, particularly in spiking neural networks that are of our interest. To improve both area and latency of their FPGA implementations, we propose a method inspired from synaptic transmission failure for exploiting FPGA lookup tables to compress long input vectors. To evaluate the effectiveness of this approach, we count the number of `1's of the compressed vector using a simple linear adder. We classify the compressors into shallow ones with up to two levels of lookup tables and deep ones with more than two levels. The architecture generated by this approach shows up to 82% and 35% reductions for different configurations of shallow compressors in area and latency respectively. Moreover, our simulation results show that calculating the Hamming weight of a 1024-bit vector of a spiking neural network by the use of only deep compressors preserves the chaotic behavior of the network while slightly impacts on the learning performance.

Related papers

Low Power Approximate Multiplier Architecture for Deep Neural Networks [0.0]
A 4:2 compressor, introducing only a single combination error, is designed and integrated into an 8x8 unsigned multiplier.<n>The proposed multiplier is employed within a custom convolution layer and evaluated on neural network tasks, including image recognition and denoising.
arXiv Detail & Related papers (2025-08-31T09:25:42Z)
Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices.<n>We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding.<n>Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z)
LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference [25.342107763021147]
This paper introduces LUTMUL, which harnesses the potential of look-up tables (LUTs) for performing multiplications. By exploiting this advantage of LUTs, our method demonstrates a potential boost in the performance of FPGA-based neural network accelerators.
arXiv Detail & Related papers (2024-11-01T02:54:11Z)
MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree [21.15961593182111]
Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices. However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version. This paper proposes a novel method called Minimum Spanning Tree (MST) compression that learns to compress and accelerate BNNs.
arXiv Detail & Related papers (2023-08-26T02:42:12Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x. We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization. We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
arXiv Detail & Related papers (2021-05-15T00:10:12Z)
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC) Our evaluation shows SIDCo speeds up training by up to 41:7%, 7:6%, and 1:9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics [11.125632758828266]
We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1$mumathrms$ on an FPGA. We consider a representative task associated to particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We convert the compressed models into firmware to be implemented on an FPGA.
arXiv Detail & Related papers (2020-08-08T21:26:31Z)
Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.