On the Resilience of Deep Learning for Reduced-voltage FPGAs
- URL: http://arxiv.org/abs/2001.00053v1
- Date: Thu, 26 Dec 2019 15:08:22 GMT
- Title: On the Resilience of Deep Learning for Reduced-voltage FPGAs
- Authors: Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian,
Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal
- Abstract summary: This paper experimentally evaluates the resilience of the training phase of Deep Neural Networks (DNNs) in the presence of voltage-underscaling-related faults in FPGAs.
We have found that modern FPGAs are robust enough at extremely low voltage levels.
Approximately 10% more training iterations are needed to close the accuracy gap.
- Score: 1.7998044061364233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are inherently computation-intensive and
power-hungry. Hardware accelerators such as Field Programmable Gate Arrays
(FPGAs) are a promising solution that can satisfy these demands for both
embedded and High-Performance Computing (HPC) systems. In FPGAs, as in CPUs and
GPUs, aggressive voltage scaling below the nominal level is an effective
technique for minimizing power dissipation. Unfortunately, as the voltage is
scaled down toward the transistor threshold, bit-flip faults start to appear
due to timing violations, creating a resilience challenge.
This paper experimentally evaluates the resilience of the training phase of
DNNs in the presence of voltage-underscaling-related faults in FPGAs,
especially in on-chip memories. Toward this goal, we have evaluated the
resilience of LeNet-5 and of a network specially designed for the CIFAR-10
dataset, using two different activation functions: Rectified Linear Unit (ReLU)
and Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust
enough at extremely low voltage levels and that low-voltage-related faults can
be automatically masked within the training iterations, so there is no need for
costly software- or hardware-oriented fault mitigation techniques such as ECC.
Approximately 10% more training iterations are needed to close the accuracy
gap. This observation is the result of the relatively low rate of undervolting
faults, i.e., <0.1%, measured on real FPGA fabrics. We have also
increased the fault rate significantly for the LeNet-5 network by randomly
generated fault injection campaigns and observed that the training accuracy
starts to degrade. When the fault rate increases, the network with Tanh
activation function outperforms the one with ReLU in terms of accuracy; e.g.,
when the fault rate is 30%, the accuracy difference is 4.92%.
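The randomly generated fault injection campaigns described above can be emulated in software by corrupting the words held in on-chip memory at a chosen fault rate and then continuing training. The sketch below is a minimal, hypothetical illustration in Python/NumPy, not the authors' implementation; the function name `inject_bit_flips`, the float32 word size, and the one-flipped-bit-per-faulty-word model are assumptions.

```python
# Minimal sketch of a random bit-flip fault injector (an assumption for
# illustration, not the paper's code). Each 32-bit word is corrupted with
# probability `fault_rate`; one uniformly chosen bit of that word is flipped.
import numpy as np

def inject_bit_flips(values: np.ndarray, fault_rate: float, rng=None) -> np.ndarray:
    """Return a copy of a float32 array with undervolting-like bit flips."""
    rng = rng or np.random.default_rng()
    words = np.ascontiguousarray(values, dtype=np.float32).view(np.uint32).copy()
    faulty = rng.random(words.shape) < fault_rate            # which words fail
    bit_pos = rng.integers(0, 32, size=words.shape, dtype=np.uint32)
    words[faulty] ^= np.uint32(1) << bit_pos[faulty]         # XOR flips one bit
    return words.view(np.float32)

# Example: corrupt a layer's weights at roughly the measured <0.1% rate, or
# sweep much higher rates (e.g., 0.3) to compare how ReLU and Tanh networks
# degrade.
weights = np.random.randn(256, 128).astype(np.float32)
faulty_weights = inject_bit_flips(weights, fault_rate=1e-3)
```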
Related papers
- On-Chip Learning with Memristor-Based Neural Networks: Assessing Accuracy and Efficiency Under Device Variations, Conductance Errors, and Input Noise [0.0]
This paper presents a memristor-based compute-in-memory hardware accelerator for on-chip training and inference.
The hardware, consisting of 30 memristors and 4 neurons, utilizes three different M-SDC structures with tungsten, chromium, and carbon media to perform binary image classification tasks.
arXiv Detail & Related papers (2024-08-26T23:10:01Z)
- SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection [49.43407207482008]
SpacTor is a new training procedure consisting of a hybrid objective that combines span corruption (SC) and replaced token detection (RTD).
In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training.
arXiv Detail & Related papers (2024-01-24T00:36:13Z)
- NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes [50.00272243518593]
Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains problematically high.
We have developed NeuralFuse, a novel add-on module that handles the energy-accuracy tradeoff in low-voltage regimes.
At a 1% bit-error rate, NeuralFuse can reduce access energy by up to 24% while recovering accuracy by up to 57%.
arXiv Detail & Related papers (2023-06-29T11:38:22Z)
- Improving Reliability of Spiking Neural Networks through Fault Aware Threshold Voltage Optimization [0.0]
Spiking neural networks (SNNs) have made breakthroughs in computer vision by lending themselves to neuromorphic hardware.
Systolic-array SNN accelerators (systolicSNNs) have been proposed recently, but their reliability is still a major concern.
We present a novel fault mitigation method, i.e., fault-aware threshold voltage optimization in retraining (FalVolt).
arXiv Detail & Related papers (2023-01-12T19:30:21Z)
- CorrectNet: Robustness Enhancement of Analog In-Memory Computing for Neural Networks by Error Suppression and Compensation [4.570841222958966]
We propose a framework to enhance the robustness of neural networks under variations and noise.
We show that the inference accuracy of neural networks can be recovered from as low as 1.69% under variations and noise.
arXiv Detail & Related papers (2022-11-27T19:13:33Z)
- GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation [7.650966670809372]
We employ graph neural networks (GNNs) to accurately estimate the impact of process variations and device aging on the delay of any path within a circuit.
GNN4REL is trained on a FinFET technology model that is calibrated against industrial 14nm measurement data.
We successfully estimate delay degradations of all paths -- notably within seconds -- with a mean absolute error down to 0.01 percentage points.
arXiv Detail & Related papers (2022-08-04T20:09:12Z)
- On the Tradeoff between Energy, Precision, and Accuracy in Federated Quantized Neural Networks [68.52621234990728]
Federated learning (FL) over wireless networks requires balancing between accuracy, energy efficiency, and precision.
We propose a quantized FL framework that represents data with a finite level of precision in both local training and uplink transmission.
Our framework can reduce energy consumption by up to 53% compared to a standard FL model.
arXiv Detail & Related papers (2021-11-15T17:00:03Z)
- Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators [105.60654479548356]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) significantly improves robustness against random or adversarial bit errors in quantized DNN weights (a rough sketch of the idea follows this entry).
This leads to high energy savings for low-voltage operation as well as low-precision quantization, but also improves the security of DNN accelerators.
arXiv Detail & Related papers (2021-04-16T19:11:14Z)
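Random bit error training of the kind summarized in the entry above can be sketched as: quantize the weights, flip random bits of the quantized codes at a target error rate, and run the forward pass on the corrupted values so the network learns to tolerate them. The snippet below is a rough, hypothetical PyTorch illustration, not the RandBET authors' code; the asymmetric 8-bit scheme and all names are assumptions.

```python
# Rough sketch of bit-error-aware training on quantized weights (hypothetical,
# not RandBET's implementation).
import torch

def quantize_uint8(w: torch.Tensor):
    """Assumed asymmetric 8-bit quantization; weight clipping would shrink this range."""
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / 255.0
    codes = torch.clamp(torch.round((w - w_min) / scale), 0, 255).to(torch.uint8)
    return codes, scale, w_min

def flip_random_bits(codes: torch.Tensor, p: float) -> torch.Tensor:
    """Flip one random bit of each 8-bit code with probability p."""
    bit = torch.randint(0, 8, codes.shape, device=codes.device)
    mask = (torch.ones_like(bit) << bit).to(torch.uint8)   # 1 << 7 = 128 fits in uint8
    hit = torch.rand(codes.shape, device=codes.device) < p
    return torch.where(hit, codes ^ mask, codes)

def dequantize(codes: torch.Tensor, scale, w_min) -> torch.Tensor:
    return codes.to(torch.float32) * scale + w_min

# Per training step: corrupt the quantized weights and use the dequantized,
# corrupted values in the forward pass (a straight-through estimator is
# commonly assumed for the backward pass).
w = torch.randn(128, 64)
codes, scale, w_min = quantize_uint8(w)
w_faulty = dequantize(flip_random_bits(codes, p=0.01), scale, w_min)
```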
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy (a rough sketch of such a precision schedule follows this entry).
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
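The progressive fractional quantization summarized in the FracTrain entry above can be approximated by a schedule that maps training progress to a bit-width, with a fake-quantizer applied at that width. The sketch below is an illustrative guess, not FracTrain's actual schedule or quantizer; the breakpoints `(4, 6, 8, 12)` and the uniform symmetric scheme are assumptions.

```python
# Illustrative sketch of a progressive-precision schedule (an assumption, not
# FracTrain's implementation): train at low precision early and increase the
# bit-width as training progresses.
import torch

def precision_schedule(step: int, total_steps: int, widths=(4, 6, 8, 12)) -> int:
    """Return the bit-width to use at a given training step (assumed breakpoints)."""
    frac = step / max(total_steps, 1)
    return widths[min(int(frac * len(widths)), len(widths) - 1)]

def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# Activations, weights, and gradients would each be passed through
# fake_quantize(x, precision_schedule(step, total_steps)), so early iterations
# run cheaply at low precision and later iterations at higher precision.
x = torch.randn(32, 100)
x_q = fake_quantize(x, precision_schedule(step=100, total_steps=10_000))
```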
- Bit Error Robustness for Energy-Efficient DNN Accelerators [93.58572811484022]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) improves robustness against random bit errors.
This leads to high energy savings from both low-voltage operation and low-precision quantization.
arXiv Detail & Related papers (2020-06-24T18:23:10Z)
- Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z)
- An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration [9.06484009562659]
Undervolting below a safe voltage level can lead to timing faults due to an excessive increase in circuit latency.
We experimentally study the reduced-voltage operation of multiple components of real FPGAs.
We propose techniques to minimize the drawbacks of reduced-voltage operation.
arXiv Detail & Related papers (2020-05-04T22:59:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.