Large-Scale Application of Fault Injection into PyTorch Models -- an
Extension to PyTorchFI for Validation Efficiency
- URL: http://arxiv.org/abs/2310.19449v1
- Date: Mon, 30 Oct 2023 11:18:35 GMT
- Title: Large-Scale Application of Fault Injection into PyTorch Models -- an
Extension to PyTorchFI for Validation Efficiency
- Authors: Ralf Graafe, Qutub Syed Sha, Florian Geissler, Michael Paulitsch
- Abstract summary: We introduce a novel fault injection framework called PyTorchALFI (Application Level Fault Injection for PyTorch) based on PyTorchFI.
PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models.
- Score: 1.7205106391379026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transient or permanent faults in hardware can render the output of Neural
Networks (NN) incorrect without user-specific traces of the error, i.e. silent
data errors (SDE). On the other hand, modern NNs also possess an inherent
redundancy that can tolerate specific faults. To establish a safety case, it is
necessary to distinguish and quantify both types of corruptions. To study the
effects of hardware (HW) faults on software (SW) in general and NN models in
particular, several fault injection (FI) methods have been established in
recent years. Current FI methods focus on the methodology of injecting faults
but often fall short of accounting for large-scale FI tests, where many fault
locations based on a particular fault model need to be analyzed in a short
time. Results need to be concise, repeatable, and comparable. To address these
requirements and enable fault injection as the default component in a machine
learning development cycle, we introduce a novel fault injection framework
called PyTorchALFI (Application Level Fault Injection for PyTorch) based on
PyTorchFI. PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models, to define complex test scenarios, to enhance data sets, and to generate test KPIs, while tightly coupling the fault-free, faulty, and modified NNs. In this paper, we provide details about
the definition of test scenarios, software architecture, and several examples
of how to use the new framework to apply iterative changes in fault location
and number, compare different model modifications, and analyze test results.
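
Since the abstract outlines the intended workflow rather than code, here is a minimal sketch in plain PyTorch of what such a setup can look like. It is not the PyTorchALFI API; all function names are illustrative, and the single-bit weight-flip fault model is just one common choice.

```python
# Minimal sketch (plain PyTorch, not the PyTorchALFI API): draw a reusable, randomly
# generated set of fault locations once, apply it to a copy of the model as single-bit
# weight flips, and compare fault-free vs. faulty predictions to count silent data errors.
import copy
import random
import torch
import torch.nn as nn

def sample_weight_faults(model, num_faults, seed=0):
    """Draw a reusable list of (parameter name, flat index, bit position) faults."""
    rng = random.Random(seed)
    params = list(model.named_parameters())
    faults = []
    for _ in range(num_faults):
        name, p = rng.choice(params)
        # Bits 0-30 only, so the XOR mask below stays within int32 range.
        faults.append((name, rng.randrange(p.numel()), rng.randrange(31)))
    return faults

def apply_weight_faults(model, faults):
    """Return a deep copy of the model with the chosen bits flipped in its weights."""
    faulty = copy.deepcopy(model)
    params = dict(faulty.named_parameters())
    with torch.no_grad():
        for name, idx, bit in faults:
            flat = params[name].detach().view(-1)
            flat[idx : idx + 1].view(torch.int32).bitwise_xor_(1 << bit)
    return faulty

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10)).eval()
fault_set = sample_weight_faults(model, num_faults=10)   # reusable across experiments
faulty_model = apply_weight_faults(model, fault_set)

x = torch.randn(4, 3, 32, 32)
with torch.no_grad():
    mismatches = (model(x).argmax(1) != faulty_model(x).argmax(1)).sum().item()
print(f"silent data errors in this batch: {mismatches}")
```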
Related papers
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects, built on an encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
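
The entry above names a Depth-wise Self-Attention (DSA) module without spelling it out, so the following is only a generic, hypothetical sketch of channel-wise self-attention built from depth-wise convolutions in PyTorch; it is not GCANet's actual DSA design.

```python
# Hypothetical sketch of a lightweight "depth-wise" (channel-wise) self-attention block.
# Attention is computed across channels rather than across all spatial positions, which
# keeps the cost low for a lightweight backbone. This is a generic illustration only.
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depth-wise 3x3 convs produce query/key/value without mixing channels.
        self.q = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.k = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.v = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2)                # (b, c, h*w)
        k = self.k(x).flatten(2)
        v = self.v(x).flatten(2)
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (b, c, c)
        out = (attn @ v).view(b, c, h, w)       # aggregate global context per channel
        return x + self.proj(out)               # residual connection

feat = torch.randn(2, 64, 16, 16)
print(ChannelSelfAttention(64)(feat).shape)     # torch.Size([2, 64, 16, 16])
```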
- MRFI: An Open Source Multi-Resolution Fault Injection Framework for Neural Network Processing [8.871260896931211]
MRFI is a highly configurable, multi-resolution fault injection tool for deep neural networks.
It integrates extensive fault analysis functionalities from different perspectives.
It does not modify PyTorch, the underlying neural network computing framework.
arXiv Detail & Related papers (2023-06-20T06:46:54Z)
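
To make the "no framework modification" point in the MRFI entry concrete, here is an illustrative hook-based sketch of multi-resolution (layer- vs. neuron-level) activation injection in plain PyTorch; the configuration format and function names are assumptions, not MRFI's actual interface.

```python
# Illustrative sketch (not MRFI's API): fault injection at a configurable resolution,
# implemented entirely with forward hooks so that PyTorch itself is left unmodified.
import torch
import torch.nn as nn

def make_activation_fault_hook(resolution, error_rate, std=1.0):
    """Return a forward hook that perturbs a layer's output at the chosen granularity."""
    def hook(module, inputs, output):
        if resolution == "layer":          # coarse: perturb the whole output tensor
            return output + std * torch.randn_like(output)
        if resolution == "neuron":         # fine: perturb a random subset of activations
            mask = torch.rand_like(output) < error_rate
            return torch.where(mask, output + std * torch.randn_like(output), output)
        return output
    return hook

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
config = {"0": ("neuron", 1e-2), "2": ("layer", 1.0)}   # e.g. loaded from a config file
handles = []
for name, module in model.named_modules():
    if name in config:
        resolution, error_rate = config[name]
        handles.append(module.register_forward_hook(
            make_activation_fault_hook(resolution, error_rate)))

with torch.no_grad():
    print(model(torch.randn(8, 16)).shape)   # faulty forward pass
for h in handles:                            # removing the hooks restores the model
    h.remove()
```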
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
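
As a rough, simplified stand-in for such a train-time calibration objective (not the paper's exact auxiliary loss), the sketch below penalizes the gap between a sample's predicted confidence and whether its prediction is correct, added on top of the usual task loss.

```python
# Simplified stand-in for a train-time calibration auxiliary loss: confidence should
# match correctness, so penalize the squared gap between the two.
import torch
import torch.nn.functional as F

def calibration_aux_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)            # predicted confidence and class
    correct = (pred == targets).float()       # 1 if the prediction is right, else 0
    return ((conf - correct) ** 2).mean()

logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = F.cross_entropy(logits, targets) + 0.5 * calibration_aux_loss(logits, targets)
loss.backward()
print(float(loss))
```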
- ISimDL: Importance Sampling-Driven Acceleration of Fault Injection Simulations for Evaluating the Robustness of Deep Learning [10.757663798809144]
We propose ISimDL, a novel methodology that employs neuron sensitivity to generate importance-sampling-based fault scenarios.
Our experiments show that importance sampling provides up to 15x higher precision in selecting critical faults than random uniform sampling, reaching this precision with fewer than 100 faults.
arXiv Detail & Related papers (2023-03-14T16:15:28Z)
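
A minimal sketch of the general idea behind the ISimDL entry, with a plain gradient-magnitude score standing in for the paper's neuron-sensitivity metric: sample fault locations in proportion to sensitivity instead of uniformly.

```python
# Hedged sketch of importance-sampling-driven fault selection: estimate a per-weight
# sensitivity score (here simply |dLoss/dw|, one possible proxy) and draw fault
# locations proportionally to it rather than uniformly at random.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
x, y = torch.randn(128, 20), torch.randint(0, 5, (128,))

# 1) Sensitivity pass: accumulate gradient magnitudes for every parameter element.
model.zero_grad()
F.cross_entropy(model(x), y).backward()
sensitivity = torch.cat([p.grad.abs().flatten() for p in model.parameters()])

# 2) Importance sampling: sensitive locations are drawn far more often than under
#    uniform sampling, so fewer injections are needed to find critical faults.
num_faults = 100
important = torch.multinomial(sensitivity / sensitivity.sum(), num_faults, replacement=True)
uniform = torch.randint(0, sensitivity.numel(), (num_faults,))
print("mean sensitivity (importance):", sensitivity[important].mean().item())
print("mean sensitivity (uniform):   ", sensitivity[uniform].mean().item())
```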
- enpheeph: A Fault Injection Framework for Spiking and Compressed Deep Neural Networks [10.757663798809144]
We present enpheeph, a Fault Injection Framework for Spiking and Compressed Deep Neural Networks (DNNs).
By injecting a random and increasing number of faults, we show that DNNs can suffer an accuracy drop of more than 40% at a fault rate as low as 7 x 10^-7 faults per parameter.
arXiv Detail & Related papers (2022-07-31T00:30:59Z)
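
For intuition on such fault rates, a small back-of-the-envelope sketch: the number of faults to inject scales with the parameter count, so even 7 x 10^-7 faults per parameter is non-negligible for large DNNs (the 25M-parameter figure below is just an example).

```python
# Back-of-the-envelope sketch: faults to inject for a given fault rate and model size.
import math

def faults_to_inject(num_parameters: int, fault_rate: float) -> int:
    return max(1, math.ceil(num_parameters * fault_rate))

num_parameters = 25_000_000                     # roughly a ResNet-50-sized model
for rate in (1e-8, 7e-7, 1e-5, 1e-3):
    print(f"rate {rate:8.0e} -> {faults_to_inject(num_parameters, rate):>6d} faults")
```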
- Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy in reproducing the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
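
Illustrative sketch of the operator-level error-model idea in the entry above (not the paper's simulator): a corruption pattern is expressed as a function of an operator's output and applied with a forward hook; the row-saturation pattern below is hypothetical.

```python
# An "error model" here is a function reproducing a corruption pattern on an operator's
# output, applied via a forward hook instead of simulating the underlying HW fault.
import torch
import torch.nn as nn

def row_corruption(output: torch.Tensor) -> torch.Tensor:
    # Hypothetical pattern: one feature-map row of one channel is overwritten.
    out = output.clone()
    b = torch.randint(0, out.shape[0], (1,)).item()
    c = torch.randint(0, out.shape[1], (1,)).item()
    r = torch.randint(0, out.shape[2], (1,)).item()
    out[b, c, r, :] = out.abs().max()            # saturate the corrupted row
    return out

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 8, 3, padding=1)).eval()
handle = model[0].register_forward_hook(lambda m, i, o: row_corruption(o))

with torch.no_grad():
    y = model(torch.randn(2, 3, 32, 32))
print(y.shape)
handle.remove()
```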
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
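
A simplified stand-in for the adaptive model-selection loop described above (an epsilon-greedy contextual bandit with a linear scorer instead of the paper's reinforcement-learning policy network); the context features and reward below are synthetic.

```python
# Epsilon-greedy contextual bandit: given a context vector describing the input, pick
# one of several detectors of increasing complexity and learn from the observed reward.
import torch
import torch.nn as nn

num_models, context_dim, eps = 3, 8, 0.1
policy = nn.Linear(context_dim, num_models)        # one score ("arm value") per detector
optimizer = torch.optim.SGD(policy.parameters(), lr=0.05)

def select_model(context: torch.Tensor) -> int:
    if torch.rand(()) < eps:                       # explore occasionally
        return int(torch.randint(0, num_models, ()))
    return int(policy(context).argmax())           # otherwise exploit the best arm

def update(context: torch.Tensor, arm: int, reward: float):
    # Regress the chosen arm's score toward the observed reward (e.g. detection
    # accuracy minus a cost for running a heavier model).
    optimizer.zero_grad()
    ((policy(context)[arm] - reward) ** 2).backward()
    optimizer.step()

for step in range(100):                            # toy interaction loop
    ctx = torch.randn(context_dim)
    arm = select_model(ctx)
    reward = float(ctx.mean() > 0) if arm == 0 else 0.8 - 0.1 * arm  # synthetic reward
    update(ctx, arm, reward)
print("learned arm scores:", policy(torch.zeros(context_dim)).detach())
```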
- Fault Injectors for TensorFlow: Evaluation of the Impact of Random Hardware Faults on Deep CNNs [4.854070123523902]
We introduce two new Fault Injection (FI) frameworks for evaluating how Deep Learning (DL) components operate under the presence of random faults.
In this paper, we present the results of FI experiments conducted on four VGG-based Convolutional NNs using two image sets.
Results help to identify the most critical operations and layers, compare the reliability characteristics of functionally similar NNs, and introduce selective fault tolerance mechanisms.
arXiv Detail & Related papers (2020-12-13T11:16:25Z)
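
The layer-criticality analysis mentioned in the entry above can be sketched in a framework-agnostic way; the toy example below uses PyTorch (to match the rest of this page rather than the paper's TensorFlow frameworks) and Gaussian weight perturbation as a crude stand-in for random hardware faults, with an untrained model purely for illustration.

```python
# Layer-wise vulnerability sweep: perturb one layer's parameters at a time and record
# the accuracy drop to rank the most critical layers.
import copy
import torch
import torch.nn as nn

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
x, y = torch.randn(256, 16), torch.randint(0, 4, (256,))  # random data, illustration only
baseline = accuracy(model, x, y)

for name, _ in model.named_parameters():
    faulty = copy.deepcopy(model)
    with torch.no_grad():
        p = dict(faulty.named_parameters())[name]
        p.add_(torch.randn_like(p))            # crude stand-in for a hardware fault
    drop = baseline - accuracy(faulty, x, y)
    print(f"{name:12s} accuracy drop: {drop:+.3f}")
```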
- NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with test data drawn from a distribution different from the training data.
Existing OoD detection approaches are prone to errors and even sometimes assign higher likelihoods to OoD samples.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.