Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware
- URL: http://arxiv.org/abs/2504.17403v1
- Date: Thu, 24 Apr 2025 09:49:18 GMT
- Title: Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware
- Authors: Hans Rosenberger, Rodrigo Fischer, Johanna S. Fröhlich, Ali Bereyhi, Ralf R. Müller,
- Abstract summary: We introduce a compression scheme that reduces the number of computations required for NN inference on reconfigurable hardware such as FPGAs.<n>This is achieved by combining pruning via regularized training, weight sharing and linear computation coding.<n>The proposed scheme achieves competitive performance for simple multilayer perceptrons, as well as for large scale deep NNs such as ResNet-34.
- Score: 12.524633524391827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As state of the art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for NN inference on reconfigurable hardware such as FPGAs. This is achieved by combining pruning via regularized training, weight sharing and linear computation coding (LCC). Contrary to common NN compression techniques, where the objective is to reduce the memory used for storing the weights of the NNs, our approach is optimized to reduce the number of additions required for inference in a hardware-friendly manner. The proposed scheme achieves competitive performance for simple multilayer perceptrons, as well as for large scale deep NNs such as ResNet-34.
Related papers
- FPGA Resource-aware Structured Pruning for Real-Time Neural Networks [3.294652922898631]
Pruning sparsifies a neural network, reducing the number of multiplications and memory.
We propose a hardware-centric formulation of pruning, by formulating it as a knapsack problem with resource-aware tensor structures.
Proposed method achieves reductions ranging between 55% and 92% in the DSP utilization and up to 81% in BRAM utilization.
arXiv Detail & Related papers (2023-08-09T18:14:54Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z) - CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for
Energy-Efficient Low-precision Deep Convolutional Neural Networks [13.520972975766313]
We propose a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting non-uniform quantization of weights.
We also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights.
arXiv Detail & Related papers (2022-07-31T01:34:56Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Haar Wavelet Feature Compression for Quantized Graph Convolutional
Networks [7.734726150561088]
Graph Convolutional Networks (GCNs) are widely used in a variety of applications, and can be seen as an unstructured version of standard Convolutional Neural Networks (CNNs)
As in CNNs, the computational cost of GCNs for large input graphs can be high and inhibit the use of these networks, especially in environments with low computational resources.
We propose to utilize Haar wavelet compression and light quantization to reduce the computations and the bandwidth involved with the network.
arXiv Detail & Related papers (2021-10-10T15:25:37Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Efficient Computation Reduction in Bayesian Neural Networks Through
Feature Decomposition and Memorization [10.182119276564643]
In this paper, an efficient BNN inference flow is proposed to reduce the computation cost.
About half of the computations could be eliminated compared to the traditional approach.
We implement our approach in Verilog and synthesise it with 45 $nm$ FreePDK technology.
arXiv Detail & Related papers (2020-05-08T05:03:04Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.