Design Challenges of Neural Network Acceleration Using Stochastic
Computing
- URL: http://arxiv.org/abs/2006.05352v1
- Date: Mon, 8 Jun 2020 16:06:56 GMT
- Title: Design Challenges of Neural Network Acceleration Using Stochastic
Computing
- Authors: Alireza Khadem
- Abstract summary: This report evaluates and compares two recently proposed stochastic-based NN designs for the Internet of Things (IoTs).
We find that BISC outperforms the other architectures when executing the LeNet-5 NN model on the MNIST digit recognition dataset.
Our analysis and simulation experiments indicate that this architecture is around 50X faster, occupies 5.7X and 2.9X less area, and consumes 7.8X and 1.8X less power than the two ESL designs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The enormous and ever-increasing complexity of state-of-the-art neural
networks (NNs) has impeded the deployment of deep learning on resource-limited
devices such as the Internet of Things (IoTs). Stochastic computing exploits
the inherent amenability to approximation characteristic of NNs to reduce their
energy and area footprint, two critical requirements of small embedded devices
suitable for the IoTs. This report evaluates and compares two recently proposed
stochastic-based NN designs, referred to as BISC (Binary Interfaced Stochastic
Computing) by Sim and Lee, 2017, and ESL (Extended Stochastic Logic) by Canals
et al., 2016. Using analysis and simulation, we compare three distinct
implementations of these designs in terms of performance, power consumption,
area, and accuracy. We also discuss the overall challenges faced in adopting
stochastic computing for building NNs. We find that BISC outperforms the other
architectures when executing the LeNet-5 NN model applied to the MNIST digit
recognition dataset. Our analysis and simulation experiments indicate that this
architecture is around 50X faster, occupies 5.7X and 2.9X less area, and
consumes 7.8X and 1.8X less power than the two ESL architectures.
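As background the abstract assumes but does not spell out (standard stochastic computing, not a detail specific to the BISC or ESL designs): a value in [0, 1] is encoded as the probability of seeing a 1 in a random bit stream, so a multiplier collapses into a single AND gate per bit. The C sketch below is a minimal software illustration; the stream length and the rand()-based stream generator are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 1024  /* bit-stream length; longer streams give lower variance */

/* Generate one bit of a unipolar stochastic stream encoding p in [0, 1]. */
static int sc_bit(double p) {
    return ((double)rand() / RAND_MAX) < p;
}

int main(void) {
    double a = 0.75, b = 0.50;      /* values to multiply */
    int ones = 0;

    for (int i = 0; i < N; i++) {
        int bit_a = sc_bit(a);
        int bit_b = sc_bit(b);
        ones += bit_a & bit_b;      /* multiplication is a single AND gate */
    }

    /* Decode: the fraction of 1s estimates a*b (0.375), up to stochastic noise. */
    printf("estimated a*b = %f (exact %f)\n", (double)ones / N, a * b);
    return 0;
}
```

This per-bit simplicity is the source of the area and energy savings discussed above; the flip side is that precision grows only with stream length, which drives the accuracy and latency trade-offs the report analyzes.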
Related papers
- Bayesian Inference Accelerator for Spiking Neural Networks [3.145754107337963]
Spiking neural networks (SNNs) have the potential to reduce computational area and power.
In this work, we demonstrate an optimization framework for developing and implementing efficient Bayesian SNNs in hardware.
We demonstrate accuracies comparable to Bayesian binary networks with full-precision Bernoulli parameters, while requiring up to $25\times$ fewer spikes.
arXiv Detail & Related papers (2024-01-27T16:27:19Z)
- Quantization-aware Neural Architectural Search for Intrusion Detection [5.010685611319813]
We present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs.
The number of LUTs utilized by this network when deployed to an FPGA is between 2.3x and 8.5x smaller with performance comparable to prior work.
arXiv Detail & Related papers (2023-11-07T18:35:29Z)
- YFlows: Systematic Dataflow Exploration and Code Generation for
Efficient Neural Network Inference using SIMD Architectures on CPUs [3.1445034800095413]
We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
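As a rough illustration of this finding (a generic C sketch, not code from the YFlows paper): an output-stationary loop nest accumulates a small tile of outputs in local variables, which the compiler can keep in SIMD registers, so each input value is reused across the tile and each output is written to memory only once. The tile size OUT_TILE and the matrix-vector shape are illustrative assumptions.

```c
/* Output-stationary dot-product tile: OUT_TILE partial sums stay in local
 * variables (typically vector registers) and are stored to memory once.
 * Assumes rows is a multiple of OUT_TILE for brevity. */
#define OUT_TILE 8

void matvec_tile(const float *W, const float *x, float *y, int rows, int cols) {
    for (int r = 0; r < rows; r += OUT_TILE) {
        float acc[OUT_TILE] = {0};            /* output tile lives in registers */
        for (int c = 0; c < cols; c++) {
            float xc = x[c];                  /* input reused across the whole tile */
            for (int t = 0; t < OUT_TILE; t++)
                acc[t] += W[(size_t)(r + t) * cols + c] * xc;
        }
        for (int t = 0; t < OUT_TILE; t++)    /* each output written exactly once */
            y[r + t] = acc[t];
    }
}
```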
arXiv Detail & Related papers (2023-10-01T05:11:54Z)
- Energy-Efficient On-Board Radio Resource Management for Satellite
Communications via Neuromorphic Computing [59.40731173370976]
We investigate the application of energy-efficient brain-inspired machine learning models for on-board radio resource management.
For relevant workloads, spiking neural networks (SNNs) implemented on Loihi 2 yield higher accuracy, while reducing power consumption by more than $100\times$ as compared to the CNN-based reference platform.
arXiv Detail & Related papers (2023-08-22T03:13:57Z)
- Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values.
This reduces computational complexity by removing expensive multiplication operations and by using low-bit weights.
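A minimal sketch of the general idea (not the paper's exact quantizer): each weight is snapped to a signed power of two, so multiplying a fixed-point activation by a weight becomes a bit shift plus a sign flip. The exponent range and fixed-point format below are assumptions for illustration.

```c
#include <math.h>
#include <stdint.h>

/* Quantize w to the nearest signed power of two, returning sign and exponent.
 * Exponents are clipped to [min_exp, 0] for a low-bit representation. */
static void po2_quantize(float w, int min_exp, int *sign, int *exp) {
    *sign = (w < 0) ? -1 : 1;
    float mag = fabsf(w);
    int e = (mag > 0) ? (int)lroundf(log2f(mag)) : min_exp;
    if (e < min_exp) e = min_exp;
    if (e > 0) e = 0;
    *exp = e;
}

/* Multiply a fixed-point activation by a power-of-two weight: a shift, no multiplier.
 * Assumes a non-negative activation (or an arithmetic right shift). */
static int32_t po2_mul(int32_t act, int sign, int exp) {
    int32_t shifted = act >> (-exp);   /* exp <= 0, so -exp is a right shift */
    return (sign < 0) ? -shifted : shifted;
}
```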
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
- Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution
Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, including the Two-Moons dataset and the FashionMNIST vs MNIST dataset.
arXiv Detail & Related papers (2022-06-26T16:00:22Z)
- Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy to as little as 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We elaborately design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
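For intuition only, a generic bit-level formulation rather than FATNN's specific algorithm: a ternary vector over {-1, 0, +1} can be packed into two bit masks, after which an inner product needs only a few ANDs, ORs, and popcounts instead of per-element multiplies. The __builtin_popcountll intrinsic is a GCC/Clang builtin.

```c
#include <stdint.h>

/* A 64-element ternary vector over {-1, 0, +1}: bit i of pos is set for +1,
 * bit i of neg is set for -1; both clear means 0. */
typedef struct { uint64_t pos, neg; } ternary64;

/* Inner product of two ternary vectors using only bitwise ops and popcounts. */
static int ternary_dot(ternary64 a, ternary64 b) {
    int agree    = __builtin_popcountll((a.pos & b.pos) | (a.neg & b.neg)); /* +1 terms */
    int disagree = __builtin_popcountll((a.pos & b.neg) | (a.neg & b.pos)); /* -1 terms */
    return agree - disagree;
}
```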
arXiv Detail & Related papers (2020-08-12T04:26:18Z)
- Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power-and-area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested in a single FPGA.
arXiv Detail & Related papers (2020-06-22T17:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.