On the Resilience of Deep Learning for Reduced-voltage FPGAs
- URL: http://arxiv.org/abs/2001.00053v1
- Date: Thu, 26 Dec 2019 15:08:22 GMT
- Title: On the Resilience of Deep Learning for Reduced-voltage FPGAs
- Authors: Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian,
Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal
- Abstract summary: This paper experimentally evaluates the resilience of the training phase of Deep Neural Networks (DNNs) in the presence of voltage-underscaling-related faults in FPGAs.
We have found that modern FPGAs are robust enough at extremely low voltage levels.
Approximately 10% more training iterations are needed to close the accuracy gap.
- Score: 1.7998044061364233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are inherently computation-intensive and
power-hungry. Hardware accelerators such as Field Programmable Gate Arrays
(FPGAs) are a promising solution that can satisfy these demands for both
embedded and High-Performance Computing (HPC) systems. In FPGAs, as in CPUs and
GPUs, aggressive voltage scaling below the nominal level is an effective
technique for minimizing power dissipation. Unfortunately, as the voltage is
scaled down toward the transistor threshold, bit-flip faults start to appear
due to timing violations, creating a resilience challenge.
This paper experimentally evaluates the resilience of the training phase of
DNNs in the presence of voltage-underscaling-related faults in FPGAs,
especially in on-chip memories. Toward this goal, we have evaluated the
resilience of LeNet-5 and of a network specially designed for the CIFAR-10
dataset, using two different activation functions: Rectified Linear Unit (ReLU)
and Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust
enough at extremely low voltage levels and that low-voltage-related faults can
be automatically masked within the training iterations, so there is no need for
costly software- or hardware-oriented fault mitigation techniques such as ECC.
Approximately 10% more training iterations are needed to close the accuracy
gap. This observation is the result of the relatively low rate of undervolting
faults, i.e., <0.1%, measured on real FPGA fabrics. We have also
increased the fault rate significantly for the LeNet-5 network by randomly
generated fault injection campaigns and observed that the training accuracy
starts to degrade. When the fault rate increases, the network with Tanh
activation function outperforms the one with ReLU in terms of accuracy; e.g.,
when the fault rate is 30%, the accuracy difference is 4.92%.
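The randomly generated fault injection campaigns described above can be emulated in software by corrupting the words held in on-chip memory at a chosen fault rate and then continuing training. The sketch below is a minimal, hypothetical illustration in Python/NumPy, not the authors' implementation; the function name `inject_bit_flips`, the float32 word size, and the one-flipped-bit-per-faulty-word model are assumptions.

```python
# Minimal sketch of a random bit-flip fault injector (an assumption for
# illustration, not the paper's code). Each 32-bit word is corrupted with
# probability `fault_rate`; one uniformly chosen bit of that word is flipped.
import numpy as np

def inject_bit_flips(values: np.ndarray, fault_rate: float, rng=None) -> np.ndarray:
    """Return a copy of a float32 array with undervolting-like bit flips."""
    rng = rng or np.random.default_rng()
    words = np.ascontiguousarray(values, dtype=np.float32).view(np.uint32).copy()
    faulty = rng.random(words.shape) < fault_rate            # which words fail
    bit_pos = rng.integers(0, 32, size=words.shape, dtype=np.uint32)
    words[faulty] ^= np.uint32(1) << bit_pos[faulty]         # XOR flips one bit
    return words.view(np.float32)

# Example: corrupt a layer's weights at roughly the measured <0.1% rate, or
# sweep much higher rates (e.g., 0.3) to compare how ReLU and Tanh networks
# degrade.
weights = np.random.randn(256, 128).astype(np.float32)
faulty_weights = inject_bit_flips(weights, fault_rate=1e-3)
```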
Related papers
- On-Chip Learning with Memristor-Based Neural Networks: Assessing Accuracy and Efficiency Under Device Variations, Conductance Errors, and Input Noise [0.0]
This paper presents a memristor-based compute-in-memory hardware accelerator for on-chip training and inference.
The hardware, consisting of 30 memristors and 4 neurons, utilizes three different M-SDC structures with tungsten, chromium, and carbon media to perform binary image classification tasks.
arXiv Detail & Related papers (2024-08-26T23:10:01Z)
- SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection [49.43407207482008]
SpacTor is a new training procedure consisting of a hybrid objective that combines span corruption (SC) and replaced token detection (RTD).
In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training.
arXiv Detail & Related papers (2024-01-24T00:36:13Z)
- NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes [50.00272243518593]
Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains problematically high.
We have developed NeuralFuse, a novel add-on module that handles the energy-accuracy tradeoff in low-voltage regimes.
At a 1% bit-error rate, NeuralFuse can reduce access energy by up to 24% while recovering accuracy by up to 57%.
arXiv Detail & Related papers (2023-06-29T11:38:22Z)
- Improving Reliability of Spiking Neural Networks through Fault Aware Threshold Voltage Optimization [0.0]
Spiking neural networks (SNNs) have made breakthroughs in computer vision by lending themselves to neuromorphic hardware.
Systolic-array SNN accelerators (systolicSNNs) have been proposed recently, but their reliability is still a major concern.
We present a novel fault mitigation method, i.e., fault-aware threshold voltage optimization in retraining (FalVolt).
arXiv Detail & Related papers (2023-01-12T19:30:21Z)
- CorrectNet: Robustness Enhancement of Analog In-Memory Computing for Neural Networks by Error Suppression and Compensation [4.570841222958966]
We propose a framework to enhance the robustness of neural networks under variations and noise.
We show that the inference accuracy of neural networks can be recovered from as low as 1.69% under variations and noise.
arXiv Detail & Related papers (2022-11-27T19:13:33Z)
- GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation [7.650966670809372]
We employ graph neural networks (GNNs) to accurately estimate the impact of process variations and device aging on the delay of any path within a circuit.
GNN4REL is trained on a FinFET technology model that is calibrated against industrial 14nm measurement data.
We successfully estimate delay degradations of all paths -- notably within seconds -- with a mean absolute error down to 0.01 percentage points.
arXiv Detail & Related papers (2022-08-04T20:09:12Z)
- On the Tradeoff between Energy, Precision, and Accuracy in Federated Quantized Neural Networks [68.52621234990728]
Federated learning (FL) over wireless networks requires balancing between accuracy, energy efficiency, and precision.
We propose a quantized FL framework that represents data with a finite level of precision in both local training and uplink transmission.
Our framework can reduce energy consumption by up to 53% compared to a standard FL model.
arXiv Detail & Related papers (2021-11-15T17:00:03Z)
- Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators [105.60654479548356]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) significantly improves robustness against random or adversarial bit errors in quantized DNN weights (a rough sketch of the idea follows this entry).
This leads to high energy savings for low-voltage operation as well as low-precision quantization, but also improves the security of DNN accelerators.
arXiv Detail & Related papers (2021-04-16T19:11:14Z)
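Random bit error training of the kind summarized in the entry above can be sketched as: quantize the weights, flip random bits of the quantized codes at a target error rate, and run the forward pass on the corrupted values so the network learns to tolerate them. The snippet below is a rough, hypothetical PyTorch illustration, not the RandBET authors' code; the asymmetric 8-bit scheme and all names are assumptions.

```python
# Rough sketch of bit-error-aware training on quantized weights (hypothetical,
# not RandBET's implementation).
import torch

def quantize_uint8(w: torch.Tensor):
    """Assumed asymmetric 8-bit quantization; weight clipping would shrink this range."""
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / 255.0
    codes = torch.clamp(torch.round((w - w_min) / scale), 0, 255).to(torch.uint8)
    return codes, scale, w_min

def flip_random_bits(codes: torch.Tensor, p: float) -> torch.Tensor:
    """Flip one random bit of each 8-bit code with probability p."""
    bit = torch.randint(0, 8, codes.shape, device=codes.device)
    mask = (torch.ones_like(bit) << bit).to(torch.uint8)   # 1 << 7 = 128 fits in uint8
    hit = torch.rand(codes.shape, device=codes.device) < p
    return torch.where(hit, codes ^ mask, codes)

def dequantize(codes: torch.Tensor, scale, w_min) -> torch.Tensor:
    return codes.to(torch.float32) * scale + w_min

# Per training step: corrupt the quantized weights and use the dequantized,
# corrupted values in the forward pass (a straight-through estimator is
# commonly assumed for the backward pass).
w = torch.randn(128, 64)
codes, scale, w_min = quantize_uint8(w)
w_faulty = dequantize(flip_random_bits(codes, p=0.01), scale, w_min)
```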
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy (a rough sketch of such a precision schedule follows this entry).
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
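The progressive fractional quantization summarized in the FracTrain entry above can be approximated by a schedule that maps training progress to a bit-width, with a fake-quantizer applied at that width. The sketch below is an illustrative guess, not FracTrain's actual schedule or quantizer; the breakpoints `(4, 6, 8, 12)` and the uniform symmetric scheme are assumptions.

```python
# Illustrative sketch of a progressive-precision schedule (an assumption, not
# FracTrain's implementation): train at low precision early and increase the
# bit-width as training progresses.
import torch

def precision_schedule(step: int, total_steps: int, widths=(4, 6, 8, 12)) -> int:
    """Return the bit-width to use at a given training step (assumed breakpoints)."""
    frac = step / max(total_steps, 1)
    return widths[min(int(frac * len(widths)), len(widths) - 1)]

def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# Activations, weights, and gradients would each be passed through
# fake_quantize(x, precision_schedule(step, total_steps)), so early iterations
# run cheaply at low precision and later iterations at higher precision.
x = torch.randn(32, 100)
x_q = fake_quantize(x, precision_schedule(step=100, total_steps=10_000))
```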
- Bit Error Robustness for Energy-Efficient DNN Accelerators [93.58572811484022]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) improves robustness against random bit errors.
This leads to high energy savings from both low-voltage operation and low-precision quantization.
arXiv Detail & Related papers (2020-06-24T18:23:10Z)
- Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z)
- An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration [9.06484009562659]
Undervolting below a safe voltage level can lead to timing faults due to an excessive increase in circuit latency.
We experimentally study the reduced-voltage operation of multiple components of real FPGAs.
We propose techniques to minimize the drawbacks of reduced-voltage operation.
arXiv Detail & Related papers (2020-05-04T22:59:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.