FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2003.12203v4
- Date: Mon, 7 Sep 2020 20:34:16 GMT
- Title: FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural
Networks
- Authors: Kai Zhao, Sheng Di, Sihuan Li, Xin Liang, Yujia Zhai, Jieyang Chen,
Kaiming Ouyang, Franck Cappello, Zizhong Chen
- Abstract summary: Convolutional neural networks (CNNs) are becoming more and more important for solving challenging and critical problems in many fields.
CNN inference applications have been deployed in safety-critical systems, which may suffer from soft errors caused by high-energy particles, high temperature, or abnormal voltage.
Traditional fault tolerance methods are not suitable for CNN inference because error-correcting code is unable to protect computational components.
- Score: 13.100954947774163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) are becoming more and more important for
solving challenging and critical problems in many fields. CNN inference
applications have been deployed in safety-critical systems, which may suffer
from soft errors caused by high-energy particles, high temperature, or abnormal
voltage. Of critical importance is ensuring the stability of the CNN inference
process against soft errors. Traditional fault tolerance methods are not
suitable for CNN inference because error-correcting code is unable to protect
computational components, instruction duplication techniques incur high
overhead, and existing algorithm-based fault tolerance (ABFT) techniques cannot
protect all convolution implementations. In this paper, we focus on how to
protect the CNN inference process against soft errors as efficiently as
possible, with the following three contributions. (1) We propose several
systematic ABFT schemes based on checksum techniques and analyze their fault
protection ability and runtime thoroughly. Unlike traditional ABFT based on
matrix-matrix multiplication, our schemes support any convolution
implementations. (2) We design a novel workflow integrating all the proposed
schemes to obtain a high detection/correction ability with limited total
runtime overhead. (3) We perform our evaluation using ImageNet with well-known
CNN models including AlexNet, VGG-19, ResNet-18, and YOLOv2. Experimental
results demonstrate that our implementation can handle soft errors with very
limited runtime overhead (4%~8% in both error-free and error-injected
situations).
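To make the checksum idea concrete: by linearity of convolution, convolving the input with the element-wise sum of all filters must equal the sum of the per-filter outputs, so a mismatch beyond floating-point tolerance signals a soft error. Below is a minimal NumPy sketch of that filter-checksum identity, not the paper's optimized implementation; all function and variable names are illustrative.

```python
import numpy as np

def conv2d(x, w):
    """Direct 'valid' 2-D convolution (cross-correlation) of a single-channel image x with filter w."""
    H, W = x.shape
    k, _ = w.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def checksum_detect(x, filters, outputs, tol=1e-4):
    """Filter-checksum check: conv(x, sum_k W_k) must equal sum_k conv(x, W_k).

    Any corruption of the output channels larger than floating-point noise
    shows up as a mismatch between the two sides.
    """
    checksum_filter = np.sum(filters, axis=0)         # element-wise sum of all filters
    expected = conv2d(x, checksum_filter)             # one extra convolution
    observed = np.sum(outputs, axis=0)                # checksum of the (possibly faulty) outputs
    return np.max(np.abs(expected - observed)) > tol  # True => soft error detected

# Usage: inject a fault into one output channel and detect it.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
filters = rng.standard_normal((4, 3, 3))             # 4 output channels, 3x3 kernels
outputs = np.stack([conv2d(x, w) for w in filters])

print(checksum_detect(x, filters, outputs))           # False: no fault
outputs[2, 1, 1] += 10.0                              # simulate a soft error (corrupted value)
print(checksum_detect(x, filters, outputs))           # True: fault detected
```

The paper's workflow combines several such schemes to balance detection/correction ability against total runtime overhead.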
Related papers
- Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning [0.4660328753262075]
This paper introduces a model-level hardening approach for CNNs by integrating error correction directly into the neural networks.
The proposed method demonstrates fault resilience nearly equivalent to TMR-based correction but with significantly reduced overhead.
Remarkably, the hardened pruned CNNs perform up to 24% faster than the hardened un-pruned ones.
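For reference, the TMR baseline mentioned above triplicates the computation and votes on the result, which is the overhead the hardening approach avoids. A minimal sketch of element-wise median voting over three replicas (equivalent to majority voting when at most one replica is corrupted); this illustrates only the baseline, not the paper's hardening method, and the toy layer is hypothetical.

```python
import numpy as np

def tmr_vote(layer_fn, x):
    """Triple modular redundancy: run the layer three times and take the
    element-wise median, which equals the majority value when at most one
    replica is corrupted. Costs ~200% extra compute."""
    replicas = np.stack([layer_fn(x) for _ in range(3)])
    return np.median(replicas, axis=0)

# Usage with a toy "layer" and a fault injected into one replica.
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
calls = {"n": 0}

def faulty_layer(x):
    y = W @ x
    calls["n"] += 1
    if calls["n"] == 2:          # corrupt only the second replica
        y[0] += 100.0
    return y

x = rng.standard_normal(4)
print(np.allclose(tmr_vote(faulty_layer, x), W @ x))   # True: the fault is out-voted
```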
arXiv Detail & Related papers (2024-05-17T09:42:44Z)
- Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which implements only a limited set of error models.
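A minimal sketch of the kind of operator-level error model such an engine applies: corrupt the output feature map of one operator according to a simple pattern (here, a single corrupted value) and check whether the prediction changes. The pattern, the stand-in classifier head, and all names are illustrative assumptions, not the framework's actual error models.

```python
import numpy as np

def inject_single_point(fmap, rng, magnitude=50.0):
    """Toy error model: corrupt one randomly chosen element of an
    operator's output feature map (a 'single point' corruption pattern)."""
    corrupted = fmap.copy()
    idx = tuple(rng.integers(0, s) for s in fmap.shape)
    corrupted[idx] += magnitude
    return corrupted

# Usage with a stand-in operator output and a stand-in classifier head.
rng = np.random.default_rng(2)
fmap = rng.standard_normal((8, 6, 6))        # C x H x W output of some operator
head = rng.standard_normal((10, 8 * 6 * 6))  # hypothetical linear classifier

clean_pred = np.argmax(head @ fmap.ravel())
faulty_pred = np.argmax(head @ inject_single_point(fmap, rng).ravel())
print("misclassification:", clean_pred != faulty_pred)
```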
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
- Performance and accuracy assessments of an incompressible fluid solver coupled with a deep Convolutional Neural Network [0.0]
The resolution of the Poisson equation is usually one of the most computationally intensive steps for incompressible fluid solvers.
A CNN has been introduced to solve this equation, leading to a significant reduction in inference time.
A hybrid strategy is developed, which couples a CNN with a traditional iterative solver to ensure a user-defined accuracy level.
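A minimal sketch of that hybrid strategy: take the network's fast prediction as the initial guess and refine it with a traditional iterative method (Jacobi here) until a user-defined residual tolerance is met; `cnn_guess` is a hypothetical stand-in for the network's output.

```python
import numpy as np

def jacobi_refine(u0, f, h, tol=1e-6, max_iters=10_000):
    """Refine an initial guess u0 for the 2-D Poisson problem -lap(u) = f
    (zero Dirichlet boundary) with Jacobi iterations until the max-norm
    update falls below a user-defined tolerance (or max_iters is reached)."""
    u = u0.copy()
    for _ in range(max_iters):
        u_new = u.copy()
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                    u[1:-1, :-2] + u[1:-1, 2:] +
                                    h * h * f[1:-1, 1:-1])
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

# Usage: a (hypothetical) CNN prediction would replace the zero initial guess,
# typically cutting the number of iterations needed to reach the tolerance.
n, h = 33, 1.0 / 32
f = np.ones((n, n))
cnn_guess = np.zeros((n, n))          # stand-in for the network's prediction
u = jacobi_refine(cnn_guess, f, h)
```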
arXiv Detail & Related papers (2021-09-20T08:30:29Z)
- Neural Architecture Dilation for Adversarial Robustness [56.18555072877193]
A shortcoming of convolutional neural networks is that they are vulnerable to adversarial attacks.
This paper aims to improve the adversarial robustness of the backbone CNNs that have a satisfactory accuracy.
With minimal computational overhead, the dilation architecture is expected to preserve the standard performance of the backbone CNN.
arXiv Detail & Related papers (2021-08-16T03:58:00Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
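A minimal sketch of adaptive model selection in that spirit, using a plain epsilon-greedy bandit (no context features, no policy network) as a simplified stand-in for the paper's contextual-bandit formulation; the reward trade-off and model stubs are illustrative assumptions.

```python
import numpy as np

class EpsilonGreedySelector:
    """Pick one of K detection models per round; reward trades off
    detection quality against the cost of heavier models."""
    def __init__(self, n_models, eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.eps = eps
        self.counts = np.zeros(n_models)
        self.values = np.zeros(n_models)   # running mean reward per model

    def select(self):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.values)))
        return int(np.argmax(self.values))

    def update(self, k, reward):
        self.counts[k] += 1
        self.values[k] += (reward - self.values[k]) / self.counts[k]

# Usage: reward = detection accuracy minus a per-model latency penalty.
selector = EpsilonGreedySelector(n_models=3)
latency_penalty = np.array([0.0, 0.1, 0.3])     # heavier models cost more
for _ in range(1000):
    k = selector.select()
    accuracy = [0.6, 0.8, 0.85][k]               # stand-in detection quality
    selector.update(k, accuracy - latency_penalty[k])
print(np.argmax(selector.values))                 # model 1 wins the trade-off here
```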
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a deep neural network (DNN) to learn the solutions of the AC optimal power flow (AC-OPF) problem.
The proposed sensitivity-informed DNN (SIDNN) is compatible with a broad range of OPF schemes.
It can be seamlessly integrated in other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
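To make the white-box setting concrete, here is a minimal FGSM-style sketch on a tiny logistic-regression model whose input gradient is available in closed form; FGSM is a standard attack, but the toy model and its parameters are illustrative, not the compressed networks studied in the paper.

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.1):
    """Fast Gradient Sign Method on binary logistic regression:
    perturb x by eps * sign(d loss / d x) to increase the loss."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
    grad_x = (p - y) * w                      # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

# Usage: a correctly classified point is pushed across the decision boundary.
rng = np.random.default_rng(3)
w, b = rng.standard_normal(5), 0.0
x = w * 0.05                                  # weakly positive example
y = 1.0
print((w @ x + b) > 0)                        # True: classified as class 1
x_adv = fgsm(x, y, w, b, eps=0.2)
print((w @ x_adv + b) > 0)                    # False: flipped by the attack
```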
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Fault Injectors for TensorFlow: Evaluation of the Impact of Random Hardware Faults on Deep CNNs [4.854070123523902]
We introduce two new Fault Injection (FI) frameworks for evaluating how Deep Learning (DL) components operate in the presence of random faults.
In this paper, we present the results of FI experiments conducted on four VGG-based Convolutional NNs using two image sets.
Results help to identify the most critical operations and layers, compare the reliability characteristics of functionally similar NNs, and introduce selective fault tolerance mechanisms.
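A minimal sketch of the kind of random hardware fault such frameworks emulate: flip one random bit of one randomly chosen float32 weight by reinterpreting its bits as an unsigned integer. The frameworks' actual APIs differ; this standalone NumPy version is only illustrative.

```python
import numpy as np

def flip_random_bit(weights, rng):
    """Flip one random bit of one random float32 weight (in place) by
    viewing the underlying storage as unsigned 32-bit integers."""
    flat = weights.reshape(-1)          # view into the same storage
    idx = int(rng.integers(flat.size))
    pos = int(rng.integers(32))         # bit position 0 (LSB) .. 31 (sign)
    as_int = flat[idx:idx + 1].view(np.uint32)
    as_int[0] ^= np.uint32(1 << pos)
    return idx, pos

# Usage: high-order exponent-bit flips usually cause the largest deviations.
rng = np.random.default_rng(4)
w = rng.standard_normal((3, 3)).astype(np.float32)
before = w.copy()
idx, pos = flip_random_bit(w, rng)
print(idx, pos, np.max(np.abs(w - before)))
```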
arXiv Detail & Related papers (2020-12-13T11:16:25Z)
- FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We elaborately design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
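A minimal sketch of threshold-based ternary quantization (a standard scheme, not necessarily the paper's implementation-dependent algorithm): weights are mapped to a scale alpha times {-1, 0, +1}, after which the inner product needs only additions and subtractions.

```python
import numpy as np

def ternarize(w, delta_ratio=0.7):
    """Map weights to alpha * {-1, 0, +1}: zero out small weights
    (|w| <= delta) and represent the rest by their sign, with alpha
    chosen as the mean magnitude of the surviving weights."""
    delta = delta_ratio * np.mean(np.abs(w))
    t = np.sign(w) * (np.abs(w) > delta)
    alpha = np.abs(w[t != 0]).mean() if np.any(t) else 0.0
    return t.astype(np.int8), alpha

# Usage: the ternary inner product then needs only additions/subtractions
# (plus one final multiply by alpha), which is what makes TNN inference fast.
rng = np.random.default_rng(5)
w, x = rng.standard_normal(16), rng.standard_normal(16)
t, alpha = ternarize(w)
approx = alpha * np.sum(x[t == 1]) - alpha * np.sum(x[t == -1])
print(approx, w @ x)   # ternary approximation vs. full-precision product
```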
arXiv Detail & Related papers (2020-08-12T04:26:18Z)
- Making Convolutions Resilient via Algorithm-Based Error Detection Techniques [2.696566807900575]
Convolutional Neural Networks (CNNs) accurately process real-time telemetry.
CNNs must execute correctly in the presence of hardware faults.
Full duplication provides the needed assurance but incurs a 100% overhead.
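For context on the 100% overhead figure: full duplication runs the convolution twice and compares the results, detecting (but not locating or correcting) a soft error. A minimal sketch under that assumption; `conv_fn` is a placeholder for any convolution implementation.

```python
import numpy as np

def duplicated_conv(conv_fn, x, tol=1e-6):
    """Full duplication: execute the convolution twice (100% extra compute)
    and flag a soft error if the two results disagree."""
    y1, y2 = conv_fn(x), conv_fn(x)
    if np.max(np.abs(y1 - y2)) > tol:
        raise RuntimeError("soft error detected; re-execute or vote")
    return y1

# Usage with any convolution routine, e.g. the conv2d sketch above:
# y = duplicated_conv(lambda img: conv2d(img, kernel), image)
```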
arXiv Detail & Related papers (2020-06-08T23:17:57Z)
- HarDNN: Feature Map Vulnerability Evaluation in CNNs [23.24111155295923]
This paper presents HarDNN, a software-directed approach to identify vulnerable computations during a CNN inference.
We show that HarDNN can accurately estimate relative vulnerability of a feature map (fmap) in CNNs using a statistical error injection campaign.
Results show that the improvement in resilience for the added computation is superlinear with HarDNN.
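A minimal sketch of the statistical error-injection idea: repeatedly corrupt a single element of a candidate feature map and estimate how often the top-1 prediction flips, using that rate as a relative vulnerability score. The tiny stand-in classifier head and error model are illustrative assumptions, not HarDNN's actual procedure.

```python
import numpy as np

def vulnerability(fmap, head, n_trials=200, magnitude=20.0, seed=0):
    """Estimate how often corrupting a single element of this feature map
    flips the top-1 prediction of a (stand-in) classifier head."""
    rng = np.random.default_rng(seed)
    clean = np.argmax(head @ fmap.ravel())
    flips = 0
    for _ in range(n_trials):
        corrupted = fmap.copy()
        idx = tuple(rng.integers(0, s) for s in fmap.shape)
        corrupted[idx] += magnitude * rng.standard_normal()
        flips += np.argmax(head @ corrupted.ravel()) != clean
    return flips / n_trials

# Usage: rank feature maps by estimated vulnerability and protect
# (e.g., duplicate) only the most vulnerable ones.
rng = np.random.default_rng(6)
fmaps = [rng.standard_normal((4, 5, 5)) for _ in range(3)]
head = rng.standard_normal((10, 4 * 5 * 5))
print([round(vulnerability(f, head), 2) for f in fmaps])
```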
arXiv Detail & Related papers (2020-02-22T23:05:03Z)