FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
- URL: http://arxiv.org/abs/2308.05170v2
- Date: Wed, 13 Dec 2023 01:44:05 GMT
- Title: FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
- Authors: Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides
- Abstract summary: Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint.
We propose a hardware-centric formulation of pruning, casting it as a knapsack problem with resource-aware tensor structures.
The proposed method achieves reductions of between 55% and 92% in DSP utilization and up to 81% in BRAM utilization.
- Score: 3.294652922898631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks achieve state-of-the-art performance in image classification,
speech recognition, scientific analysis and many more application areas. Due to
the high computational complexity and memory footprint of neural networks,
various compression techniques, such as pruning and quantization, have been
proposed in the literature. Pruning sparsifies a neural network, reducing the
number of multiplications and the memory footprint. However, pruning often fails
to capture properties of the underlying hardware, causing unstructured sparsity
and load-balance inefficiency, thus bottlenecking resource improvements. We
propose a hardware-centric formulation of pruning, casting it as a knapsack
problem with resource-aware tensor structures.
including sub-microsecond particle classification at CERN's Large Hadron
Collider and fast image classification, the proposed method achieves reductions
ranging between 55% and 92% in DSP utilization and up to 81% in BRAM
utilization.
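The knapsack view lends itself to a compact illustration. Below is a minimal sketch, not the paper's actual implementation: candidate structures (e.g., filters, scored here by a hypothetical importance metric) are kept so as to maximize total importance under an integer DSP budget, using a textbook 0/1 knapsack; the cost model and all names are assumptions.
```python
import numpy as np

def knapsack_prune(importance, dsp_cost, dsp_budget):
    """Select which structures (e.g. filters) to KEEP so that total
    importance is maximised subject to a DSP budget -- a 0/1 knapsack.
    Illustrative only; not the paper's exact formulation."""
    n = len(importance)
    # Classic dynamic-programming knapsack over integer DSP costs.
    table = np.zeros((n + 1, dsp_budget + 1))
    for i in range(1, n + 1):
        w, v = dsp_cost[i - 1], importance[i - 1]
        for c in range(dsp_budget + 1):
            table[i, c] = table[i - 1, c]          # prune structure i
            if w <= c:                             # or keep it if it fits
                table[i, c] = max(table[i, c], table[i - 1, c - w] + v)
    # Backtrack to recover the keep/prune mask.
    keep = np.zeros(n, dtype=bool)
    c = dsp_budget
    for i in range(n, 0, -1):
        if table[i, c] != table[i - 1, c]:
            keep[i - 1] = True
            c -= dsp_cost[i - 1]
    return keep

# Toy example: 6 candidate filters, importance = hypothetical |W|_1 score.
rng = np.random.default_rng(0)
importance = rng.random(6)
dsp_cost = rng.integers(1, 5, size=6)
mask = knapsack_prune(importance, list(dsp_cost), dsp_budget=8)
print("kept structures:", np.flatnonzero(mask))
```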
Related papers
- Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values.
This reduces computational complexity by replacing expensive multiplication operations with bit shifts and by using low-bit weights.
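A minimal sketch of the power-of-two idea, assuming a hypothetical exponent range: each weight is snapped to the nearest power of two in the log domain, so a multiply by 2**e becomes a bit shift in hardware.
```python
import numpy as np

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round each weight to a signed power of two (nearest in the log
    domain). The exponent range is an assumption, not from the paper."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign * 2.0 ** exp

# w * x with w = sign * 2**e reduces to sign(w) * (x shifted by e).
w = np.array([0.3, -0.8, 0.05])
print(quantize_pow2(w))
```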
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
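A simplified sketch of the greedy path-following idea (not the authors' exact GPFQ algorithm): each weight is quantized in turn to the alphabet element that best cancels the error accumulated on sample data. The alphabet and the sample data here are assumptions.
```python
import numpy as np

def greedy_quantize(w, X, alphabet):
    """Quantize one neuron's weights sequentially so the running
    pre-activation error on sample data X stays small."""
    u = np.zeros(X.shape[0])           # accumulated error on the samples
    q = np.zeros_like(w)
    for t in range(len(w)):
        residual = u + w[t] * X[:, t]  # error if w[t] contributed in full
        # Pick the alphabet element that best cancels the residual.
        q[t] = alphabet[np.argmin(
            [np.linalg.norm(residual - p * X[:, t]) for p in alphabet])]
        u = residual - q[t] * X[:, t]
    return q

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 8))           # 64 samples, 8 inputs
w = rng.normal(size=8)
alphabet = np.arange(-3, 4) * 0.5      # hypothetical 3-bit uniform alphabet
print(greedy_quantize(w, X, alphabet))
```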
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
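A hedged sketch of the pruning-plus-quantization storage idea: magnitude-prune, quantize the survivors against a small codebook, and store indices plus codes, which a source coder could compress further. The keep ratio, codebook size, and helper names are assumptions, not the paper's format.
```python
import numpy as np

def compress(weights, keep_ratio=0.1, levels=16):
    """Prune small weights, then quantize survivors to a small codebook.
    The stored (indices, codes) streams could be entropy-coded further."""
    flat = weights.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argsort(np.abs(flat))[-k:]            # surviving positions
    survivors = flat[idx]
    codebook = np.linspace(survivors.min(), survivors.max(), levels)
    codes = np.argmin(np.abs(survivors[:, None] - codebook[None, :]), axis=1)
    return idx.astype(np.uint32), codes.astype(np.uint8), codebook, weights.shape

def decompress(idx, codes, codebook, shape):
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = codebook[codes]
    return flat.reshape(shape)

W = np.random.default_rng(2).normal(size=(32, 32))
idx, codes, codebook, shape = compress(W)
W_hat = decompress(idx, codes, codebook, shape)    # lossy reconstruction
print("stored entries:", idx.size, "of", W.size)
```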
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations, while the low-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
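A minimal sketch of the DCT-domain split using SciPy, assuming a simple square low-frequency cutoff: the high-frequency residual would be routed to the expensive branch and the low-frequency reconstruction to the cheap one.
```python
import numpy as np
from scipy.fft import dctn, idctn

def split_by_frequency(x, cutoff=8):
    """Split an image into low- and high-frequency parts in the DCT
    domain. The cutoff and the square mask are assumptions."""
    c = dctn(x, norm="ortho")
    low_mask = np.zeros_like(c, dtype=bool)
    low_mask[:cutoff, :cutoff] = True      # top-left = lowest frequencies
    low = idctn(np.where(low_mask, c, 0.0), norm="ortho")
    high = x - low                         # goes to the expensive branch
    return low, high

img = np.random.default_rng(3).random((32, 32))
low, high = split_by_frequency(img)
print(np.allclose(low + high, img))        # exact decomposition
```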
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
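A rough sketch of combining both effects in one step (forward pass only; a real training loop would use straight-through gradients for the quantizer). The sparsity level, bit-width, and function names are assumptions, not the paper's setup.
```python
import numpy as np

def fake_quant(w, bits=4):
    """Uniform symmetric quantize-dequantize; bit-width is an assumption."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(w / scale).clip(-2 ** (bits - 1), 2 ** (bits - 1) - 1) * scale

def prune_and_quantize(w, sparsity=0.5, bits=4):
    """Quantization-aware pruning, sketched: apply a magnitude mask,
    then fake-quantize the survivors, so training sees both effects."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    return fake_quant(w * mask, bits), mask

w = np.random.default_rng(4).normal(size=(8, 8))
wq, mask = prune_and_quantize(w)
print(f"sparsity: {1 - mask.mean():.2f}, unique levels: {len(np.unique(wq))}")
```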
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
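A minimal sketch of enforcing an N:M pattern: in every group of m consecutive weights, only the n largest magnitudes survive (2:4 is the pattern supported by, e.g., NVIDIA sparse tensor cores).
```python
import numpy as np

def prune_n_m(w, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m
    consecutive weights (assumes w.size is divisible by m)."""
    flat = w.reshape(-1, m)
    keep = np.argsort(np.abs(flat), axis=1)[:, -n:]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(w.shape)

w = np.random.default_rng(5).normal(size=(4, 8))
print(prune_n_m(w))        # exactly 2 nonzeros in every group of 4
```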
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- ItNet: iterative neural networks with small graphs for accurate and efficient anytime prediction [1.52292571922932]
In this study, we introduce a class of network models that have a small memory footprint in terms of their computational graphs.
We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets.
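A toy sketch of the iterative, small-graph idea: one weight-shared block is applied repeatedly, and a prediction head can be read out after any iteration, giving anytime predictions. The architecture and names are assumptions.
```python
import numpy as np

def anytime_predict(x, W_shared, W_head, iterations=4):
    """Apply one weight-shared block repeatedly; an early-exit head
    yields a usable prediction after every iteration."""
    h = x
    outputs = []
    for _ in range(iterations):
        h = np.tanh(h @ W_shared)   # same weights reused each iteration
        outputs.append(h @ W_head)  # anytime prediction at this depth
    return outputs

rng = np.random.default_rng(6)
x = rng.normal(size=(1, 16))
outs = anytime_predict(x, 0.3 * rng.normal(size=(16, 16)),
                       rng.normal(size=(16, 3)))
print(len(outs), outs[-1].shape)
```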
arXiv Detail & Related papers (2021-01-21T15:56:29Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
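A toy illustration of why one-shot search is cheap (not MS-RANAS's multi-scale method): candidate architectures inherit weights from a single shared supernet, so they can be ranked without per-candidate retraining. The candidate ops and scoring data are assumptions.
```python
import numpy as np

rng = np.random.default_rng(7)

# Supernet: every layer offers two candidate ops over shared weights.
ops = [lambda h, W: np.maximum(h @ W, 0),   # "expensive" op: ReLU matmul
       lambda h, W: h]                      # "cheap" op: identity / skip
shared_W = [0.3 * rng.normal(size=(8, 8)) for _ in range(3)]
x, target = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))

def evaluate(arch):
    """Score a sampled sub-architecture with inherited shared weights."""
    h = x
    for layer, choice in enumerate(arch):
        h = ops[choice](h, shared_W[layer])
    return np.mean((h - target) ** 2)

candidates = [tuple(rng.integers(0, 2, size=3)) for _ in range(10)]
best = min(candidates, key=evaluate)        # ranking, no retraining
print("best architecture (0=compute, 1=skip):", best)
```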
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
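ALF learns the low-rank filter sharing with an autoencoder; as a stand-in, the sketch below uses the SVD (the optimum of a linear autoencoder) to express a filter bank as mixtures of a few shared basis filters. The rank and tensor shapes are assumptions.
```python
import numpy as np

def low_rank_filter_sharing(filters, rank=4):
    """Factor a conv filter bank so every filter is a linear combination
    of `rank` shared basis filters (SVD stand-in for the autoencoder)."""
    n, c, kh, kw = filters.shape
    F = filters.reshape(n, -1)                   # one row per filter
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    coeffs = U[:, :rank] * s[:rank]              # per-filter mixing weights
    basis = Vt[:rank].reshape(rank, c, kh, kw)   # shared basis filters
    return coeffs, basis

filters = np.random.default_rng(8).normal(size=(32, 16, 3, 3))
coeffs, basis = low_rank_filter_sharing(filters)
approx = (coeffs @ basis.reshape(4, -1)).reshape(filters.shape)
print("params:", filters.size, "->", coeffs.size + basis.size)
```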
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
- Efficient Computation Reduction in Bayesian Neural Networks Through Feature Decomposition and Memorization [10.182119276564643]
In this paper, an efficient BNN inference flow is proposed to reduce the computation cost.
About half of the computations could be eliminated compared to the traditional approach.
We implement our approach in Verilog and synthesise it with 45 nm FreePDK technology.
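A speculative sketch of the decomposition-and-memorization idea (not the paper's exact flow): the deterministic term of a Bayesian layer is computed once and memoized across Monte Carlo samples, and the per-sample work touches only the weights with non-negligible variance.
```python
import numpy as np

def bnn_layer_memoized(x, W_mean, W_std, n_samples=8, rng=None):
    """Monte Carlo inference for one Bayesian layer: x @ W_mean is
    computed once and reused; only the sparse stochastic residual is
    recomputed per sample."""
    rng = rng or np.random.default_rng(9)
    mean_term = x @ W_mean                      # memoized across samples
    rows, cols = np.nonzero(W_std)              # stochastic weights only
    outs = []
    for _ in range(n_samples):
        eps = rng.normal(size=rows.shape)       # weight noise, this sample
        delta = np.zeros(W_mean.shape[1])
        np.add.at(delta, cols, x[0, rows] * eps * W_std[rows, cols])
        outs.append(mean_term + delta)
    return np.mean(outs, axis=0)

rng = np.random.default_rng(9)
W_std = np.zeros((32, 8))                       # most variances pruned away
W_std[rng.integers(0, 32, 20), rng.integers(0, 8, 20)] = 0.1
x = rng.normal(size=(1, 32))
print(bnn_layer_memoized(x, rng.normal(size=(32, 8)), W_std).shape)
```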
arXiv Detail & Related papers (2020-05-08T05:03:04Z)