ISimDL: Importance Sampling-Driven Acceleration of Fault Injection
Simulations for Evaluating the Robustness of Deep Learning
- URL: http://arxiv.org/abs/2303.08035v2
- Date: Thu, 25 May 2023 07:54:27 GMT
- Title: ISimDL: Importance Sampling-Driven Acceleration of Fault Injection
Simulations for Evaluating the Robustness of Deep Learning
- Authors: Alessio Colucci, Andreas Steininger, Muhammad Shafique
- Abstract summary: We propose ISimDL, a novel methodology that employs neuron sensitivity to generate importance sampling-based fault-scenarios.
Our experiments show that importance sampling provides up to 15x higher precision in selecting critical faults than random uniform sampling, reaching such precision in fewer than 100 faults.
- Score: 10.757663798809144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) systems have proliferated in many applications, requiring
specialized hardware accelerators and chips. In the nano-era, devices have
become increasingly more susceptible to permanent and transient faults.
Therefore, we need an efficient methodology for analyzing the resilience of
advanced DL systems against such faults, and understand how the faults in
neural accelerator chips manifest as errors at the DL application level, where
faults can lead to undetectable and unrecoverable errors. Using fault
injection, we can perform resilience investigations of the DL system by
modifying neuron weights and outputs at the software-level, as if the hardware
had been affected by a transient fault. Existing fault models reduce the search
space and allow faster analysis, but they require a-priori knowledge of the
model and do not allow further analysis of the filtered-out search space. Therefore,
we propose ISimDL, a novel methodology that employs neuron sensitivity to
generate importance sampling-based fault-scenarios. Without any a-priori
knowledge of the model-under-test, ISimDL provides an equivalent reduction of
the search space as existing works, while allowing long simulations to cover
all the possible faults, improving on existing model requirements. Our
experiments show that importance sampling provides up to 15x higher
precision in selecting critical faults than random uniform sampling,
reaching such precision in fewer than 100 faults. Additionally, we showcase
another practical use-case for importance sampling for reliable DNN design,
namely Fault Aware Training (FAT). By using ISimDL to select the faults leading
to errors, we can insert the faults during the DNN training process to harden
the DNN against such faults. Using importance sampling in FAT reduces the
overhead required for finding faults that lead to a predetermined drop in
accuracy by more than 12x.
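The abstract does not spell out ISimDL's exact sensitivity metric or sampling procedure, but the core idea of importance-sampling fault sites can be sketched generically as follows; the site names and proportional-to-sensitivity weighting are illustrative assumptions:

```python
import random

def importance_sample_faults(sensitivity, n_faults, seed=0):
    """Draw fault sites with probability proportional to their sensitivity.

    sensitivity: dict mapping a fault-site id (e.g. a neuron) to a
    nonnegative sensitivity score; higher scores are sampled more often.
    Returns n_faults site ids, sampled with replacement.
    """
    rng = random.Random(seed)
    sites = list(sensitivity)
    weights = [sensitivity[s] for s in sites]
    return rng.choices(sites, weights=weights, k=n_faults)

# Toy scores: site "n2" is far more sensitive, so it dominates the sample.
scores = {"n0": 0.1, "n1": 0.2, "n2": 5.0}
sample = importance_sample_faults(scores, 1000)
print(sample.count("n2") > sample.count("n0"))  # True
```

Compared with uniform sampling over all sites, this concentrates the fault-injection budget on the sites most likely to produce critical errors, while a long enough simulation can still reach every site.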
Related papers
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Causal Disentanglement Hidden Markov Model for Fault Diagnosis [55.90917958154425]
We propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism.
Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors.
To expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments.
arXiv Detail & Related papers (2023-08-06T05:58:45Z) - Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy in reproducing the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which implements only a limited set of error models.
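Software-level error simulators of this kind typically model a transient fault as a bit-flip in an operator's weights or outputs, as the ISimDL abstract above describes. A minimal, self-contained float32 bit-flip sketch (not the framework's actual error models):

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0..31) of a float32 value, emulating a transient fault."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return faulty

print(flip_bit(1.0, 30))  # flipping an exponent bit turns 1.0 into inf
print(flip_bit(1.0, 0))   # flipping the lowest mantissa bit barely perturbs it
```

Exponent-bit flips tend to cause far larger value corruptions than mantissa-bit flips, which is one reason weighted fault selection pays off over uniform sampling.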
arXiv Detail & Related papers (2022-06-04T19:45:02Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including one missing case and three fiber-like missing cases defined according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge
Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC)
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - FAT: Training Neural Networks for Reliable Inference Under Hardware
Faults [3.191587417198382]
We present a novel methodology called fault-aware training (FAT), which includes error modeling during neural network (NN) training, to make QNNs resilient to specific fault models on the device.
FAT has been validated for numerous classification tasks including CIFAR10, GTSRB, SVHN and ImageNet.
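The FAT abstract describes injecting modeled faults during training so the network learns to tolerate them. A minimal sketch of one such training step, assuming a plain list of weights, a stuck-at-zero fault model, and fault sites pre-selected elsewhere (e.g. by importance sampling):

```python
import random

def fat_step(weights, grads, lr, fault_sites, rng):
    """One fault-aware training step: apply a gradient update, then
    overwrite one critical weight to emulate an injected fault."""
    new_w = [w - lr * g for w, g in zip(weights, grads)]
    site = rng.choice(fault_sites)  # a pre-selected critical site
    new_w[site] = 0.0               # stuck-at-zero fault model (one choice among many)
    return new_w

rng = random.Random(0)
print(fat_step([1.0, 2.0, 3.0], [0.1, 0.1, 0.1], lr=0.1, fault_sites=[1], rng=rng))
```

Training through such perturbations pushes the network toward weights whose predictions do not hinge on any single faulty value.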
arXiv Detail & Related papers (2020-11-11T16:09:39Z) - Few-Shot Bearing Fault Diagnosis Based on Model-Agnostic Meta-Learning [3.8015092217142223]
We propose a few-shot learning framework for bearing fault diagnosis based on model-agnostic meta-learning (MAML)
Case studies show that the proposed framework achieves an overall accuracy up to 25% higher than a Siamese network-based benchmark study.
arXiv Detail & Related papers (2020-07-25T04:03:18Z) - High-level Modeling of Manufacturing Faults in Deep Neural Network
Accelerators [2.6258269516366557]
Google's Tensor Processing Unit (TPU) is a neural network accelerator that uses systolic array-based matrix multiplication hardware at its computational core.
Manufacturing faults at any state element of the matrix multiplication unit can cause unexpected errors in these inference networks.
We propose a formal model of permanent faults and their propagation in a TPU using the Discrete-Time Markov Chain (DTMC) formalism.
arXiv Detail & Related papers (2020-06-05T18:11:14Z) - A Survey on Impact of Transient Faults on BNN Inference Accelerators [0.9667631210393929]
The big-data boom enables easy access to and analysis of very large data sets.
Deep learning models require significant computation power and extremely high memory accesses.
In this study, we demonstrate that the impact of soft errors on a customized deep learning algorithm might cause drastic image misclassification.
arXiv Detail & Related papers (2020-04-10T16:15:55Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to speed up large-scale OD.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.