Reliability Assessment of Neural Networks in GPUs: A Framework For
Permanent Faults Injections
- URL: http://arxiv.org/abs/2205.12177v1
- Date: Tue, 24 May 2022 16:21:53 GMT
- Authors: Juan-David Guerrero-Balaguera, Luigi Galasso, Robert Limas Sierra,
Matteo Sonza Reorda
- Abstract summary: This paper proposes a framework that resorts to a binary instrumentation tool to perform fault injection campaigns on a GPU.
This environment allows, for the first time, assessing the reliability of CNNs deployed on a GPU in the presence of permanent faults.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Currently, Deep learning and especially Convolutional Neural Networks (CNNs)
have become a fundamental computational approach applied in a wide range of
domains, including some safety-critical applications (e.g., automotive,
robotics, and healthcare equipment). Therefore, the reliability evaluation of
those computational systems is mandatory. The reliability evaluation of CNNs is
performed by fault injection campaigns at different levels of abstraction, from
the application level down to the hardware level. Many works have focused on
evaluating the reliability of neural networks in the presence of transient
faults. However, the effects of permanent faults have been investigated only at
the application level, e.g., targeting the parameters of the network. This
paper proposes a framework that resorts to a binary instrumentation tool to
perform fault injection campaigns targeting different components inside the
GPU, such as the register files and the functional units. This environment
allows, for the first time, assessing the reliability of CNNs deployed on a GPU
in the presence of permanent faults.
Related papers
- Evaluating Single Event Upsets in Deep Neural Networks for Semantic Segmentation: an embedded system perspective [1.474723404975345]
This paper delves into the robustness assessment of embedded Deep Neural Networks (DNNs).
By scrutinizing the layer-by-layer and bit-by-bit sensitivity of various encoder-decoder models to soft errors, this study thoroughly investigates the vulnerability of segmentation DNNs to SEUs.
We propose a set of practical lightweight error mitigation techniques with no memory or computational cost suitable for resource-constrained deployments.
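The bit-by-bit sensitivity analysis this entry describes is commonly emulated in software by toggling a single bit of a stored weight; a minimal sketch (illustrative only, not the paper's tooling):

```python
import struct

def seu_flip(value: float, bit: int) -> float:
    """Toggle one bit of a 32-bit float, emulating a single event upset
    (SEU) in a stored weight or activation."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    bits ^= 1 << bit                          # transient single-bit flip
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Flipping the sign bit negates the value; sweeping `bit` from 0 to 31
# across every weight of a layer yields its bit-by-bit sensitivity.
print(seu_flip(0.5, 31))  # sign bit -> -0.5
```

Because the flip is an XOR, applying it twice restores the original value, which distinguishes this transient model from the permanent stuck-at model above.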
arXiv Detail & Related papers (2024-12-04T18:28:38Z)
- Convolutional Neural Network Design and Evaluation for Real-Time Multivariate Time Series Fault Detection in Spacecraft Attitude Sensors [41.94295877935867]
This paper presents a novel approach to detecting stuck values within the Accelerometer and Inertial Measurement Unit of a drone-like spacecraft.
A multi-channel Convolutional Neural Network (CNN) is used to perform multi-target classification and independently detect faults in the sensors.
An integration methodology is proposed to enable the network to effectively detect anomalies and trigger recovery actions at the system level.
arXiv Detail & Related papers (2024-10-11T09:36:38Z)
- Gradient Routing: Masking Gradients to Localize Computation in Neural Networks [43.0686937643683]
We introduce gradient routing, a training method that isolates capabilities to specific subregions of a neural network.
We show that gradient routing can be used to learn representations which are partitioned in an interpretable way.
We conclude that the approach holds promise for challenging, real-world applications where quality data are scarce.
arXiv Detail & Related papers (2024-10-06T02:43:49Z)
- Special Session: Approximation and Fault Resiliency of DNN Accelerators [0.9126382223122612]
This paper explores the approximation and fault resiliency of Deep Neural Network accelerators.
We propose to use approximate (AxC) arithmetic circuits to emulate errors in hardware without performing fault injection on the DNN.
We also propose a fine-grain analysis of fault resiliency by examining fault propagation and masking in networks.
arXiv Detail & Related papers (2023-05-31T19:27:45Z)
- FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design [46.302926845889694]
We develop a learning-based framework, FERN, for scalable Failure Evaluation and Robust Network design.
FERN represents rich problem inputs as a graph and captures both local and global views by attentively performing feature extraction from the graph.
It can speed up multiple robust network design problems by more than 80x, 200x, and 10x, respectively, with a negligible performance gap.
arXiv Detail & Related papers (2023-05-30T15:56:25Z)
- DeepVigor: Vulnerability Value Ranges and Factors for DNNs' Reliability Assessment [1.189955933770711]
Deep Neural Networks (DNNs) and their accelerators are being deployed more frequently in safety-critical applications.
We propose a novel accurate, fine-grain, metric-oriented, and accelerator-agnostic method called DeepVigor.
arXiv Detail & Related papers (2023-03-13T08:55:10Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
- Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks can meet the constraints.
A key ingredient of building a fair neural network classifier is establishing a no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z)
- Fault Injectors for TensorFlow: Evaluation of the Impact of Random Hardware Faults on Deep CNNs [4.854070123523902]
We introduce two new Fault Injection (FI) frameworks for evaluating how Deep Learning (DL) components operate in the presence of random faults.
In this paper, we present the results of FI experiments conducted on four VGG-based Convolutional NNs using two image sets.
Results help to identify the most critical operations and layers, compare the reliability characteristics of functionally similar NNs, and introduce selective fault tolerance mechanisms.
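A software-level fault injector of this kind typically perturbs a layer's output tensor and compares the network's prediction against the fault-free run. The sketch below uses NumPy rather than TensorFlow, and its names are hypothetical; it only illustrates the injection step, not the paper's frameworks.

```python
import numpy as np

def inject_random_faults(activations, fault_rate, rng):
    """Zero out a random fraction of a layer's activations, a simple
    stand-in for random hardware faults during inference."""
    mask = rng.random(activations.shape) >= fault_rate
    return activations * mask

rng = np.random.default_rng(0)
acts = np.ones((4, 4), dtype=np.float32)
faulty = inject_random_faults(acts, fault_rate=0.25, rng=rng)
# Comparing outputs with and without injected faults, layer by layer,
# identifies the most critical operations of the network.
```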
arXiv Detail & Related papers (2020-12-13T11:16:25Z)
- FAT: Training Neural Networks for Reliable Inference Under Hardware Faults [3.191587417198382]
We present a novel methodology called fault-aware training (FAT), which includes error modeling during neural network (NN) training, to make QNNs resilient to specific fault models on the device.
FAT has been validated for numerous classification tasks including CIFAR10, GTSRB, SVHN and ImageNet.
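The core idea of fault-aware training, applying a device fault model inside the forward pass so the optimizer learns around it, can be sketched for a single linear layer as follows. This is a simplified assumption-laden stand-in (random weight sign flips as the fault model), not FAT's actual error models.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_with_faults(w, x, p_fault):
    """Linear layer whose weights suffer random sign flips during the
    forward pass, emulating a device fault model during training."""
    flips = np.where(rng.random(w.shape) < p_fault, -1.0, 1.0)
    return x @ (w * flips)

w = rng.normal(size=(3, 2))
x = np.ones((1, 3))
y = forward_with_faults(w, x, p_fault=0.1)
# Backpropagating through such faulty forwards during training pushes the
# network toward weights whose predictions tolerate the modeled faults.
```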
arXiv Detail & Related papers (2020-11-11T16:09:39Z)
- Binary Neural Networks: A Survey [126.67799882857656]
The binary neural network serves as a promising technique for deploying deep models on resource-limited devices.
The binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network.
We present a survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error.
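As a concrete instance of the minimizing-quantization-error category the survey describes, the widely used XNOR-Net-style scheme binarizes a tensor to sign(W) with a per-tensor scale alpha = mean(|W|), the value that minimizes ||W - alpha * sign(W)||^2. A short sketch:

```python
import numpy as np

def binarize(weights):
    """Binarize weights to {-1, +1} with the per-tensor scale
    alpha = mean(|W|) that minimizes ||W - alpha * sign(W)||^2."""
    alpha = np.abs(weights).mean()
    binary = np.where(weights >= 0, 1.0, -1.0)
    return binary, alpha

w = np.array([0.4, -0.2, 0.1, -0.3])
b, alpha = binarize(w)
# b = [ 1. -1.  1. -1.],  alpha ~= 0.25
```

Storing only `binary` and `alpha` replaces 32-bit multiplications with sign operations and a single scale, which is why binarization suits resource-limited devices despite the information loss noted above.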
arXiv Detail & Related papers (2020-03-31T16:47:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.