ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
- URL: http://arxiv.org/abs/2310.03841v2
- Date: Mon, 5 Feb 2024 20:57:06 GMT
- Title: ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
- Authors: Haoxuan Liu, Vasu Singh, Michał Filipiuk, Siva Kumar Sastry Hari
- Abstract summary: Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability.
It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors.
We propose an algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis.
- Score: 5.502117675161604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability. It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors. We propose a novel algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis and protection of transformer-based architectures. First, our work develops an efficient process of computing and ranking the resilience of transformer layers. We find that due to the large size of transformer models, applying traditional network redundancy to a subset of the most vulnerable layers provides high error coverage albeit with impractically high overhead. We address this shortcoming by providing a software-directed, checksum-based error detection technique aimed at protecting the most vulnerable general matrix multiply (GEMM) layers in the transformer models that use either floating-point or integer arithmetic. Results show that our approach achieves over 99% coverage for errors that result in a mismatch, with less than 0.2% computation overhead and 0.01% memory overhead. Lastly, we present the applicability of our framework in various modern GPU architectures under different numerical precisions. We introduce an efficient self-correction mechanism for resolving erroneous detections with an average of less than 2% overhead per error.
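To make the checksum idea concrete, the sketch below illustrates the generic algorithm-based fault tolerance (ABFT) pattern that the abstract describes for GEMM layers: two extra matrix-vector products give independently computed row and column checksums of C = A x B, a checksum mismatch flags an error, and a single corrupted output element can be corrected in place from the discrepancies. This is a minimal illustration under stated assumptions, not ALBERTA's implementation; the function name `abft_gemm` and the tolerance `tol` are illustrative, and the paper's handling of integer/quantized arithmetic and GPU kernels is not modeled here.

```python
import numpy as np

def abft_gemm(A, B, tol=1e-3):
    """Checksum-protected GEMM sketch (generic ABFT, not ALBERTA's exact scheme).

    Computes C = A @ B together with row/column checksums that allow a
    single corrupted output element to be detected and corrected.
    `tol` absorbs floating-point rounding and should be scaled to the
    problem size; exact comparison (tol=0) works for integer arithmetic.
    """
    # Expected checksums from the inputs: two extra GEMVs, independent of C.
    col_check = np.ones(A.shape[0]) @ A @ B    # expected column sums of C
    row_check = A @ (B @ np.ones(B.shape[1]))  # expected row sums of C

    C = A @ B  # the (potentially faulty) main computation

    # Compare observed sums of C against the independently computed checksums.
    col_diff = C.sum(axis=0) - col_check
    row_diff = C.sum(axis=1) - row_check
    bad_cols = np.flatnonzero(np.abs(col_diff) > tol)
    bad_rows = np.flatnonzero(np.abs(row_diff) > tol)

    if bad_rows.size == 0 and bad_cols.size == 0:
        return C, "clean"
    if bad_rows.size == 1 and bad_cols.size == 1:
        # Single corrupted element: its location is the intersection of the
        # mismatching row and column, and the discrepancy is the error value.
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] -= row_diff[i]
        return C, "corrected"
    # Multiple mismatches: fall back to recomputing the GEMM.
    return A @ B, "recomputed"
```

The cost structure matches the abstract's low-overhead claim: the checksums add roughly O(mk + kn + mn) operations on top of the O(mkn) GEMM, and correcting a single detected error touches one element, so full recomputation is needed only when multiple mismatches appear.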
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- Efficient Fault Detection Architectures for Modular Exponentiation Targeting Cryptographic Applications Benchmarked on FPGAs [2.156170153103442]
We propose a lightweight fault detection architecture tailored for modular exponentiation.
Our approach achieves an error detection rate close to 100%, all while introducing a modest computational overhead of approximately 7%.
arXiv Detail & Related papers (2024-02-28T04:02:41Z)
- Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM [27.354689865791638]
Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs).
Hardware failures, such as stuck-at faults, can result in significant prediction errors during model inference.
We propose a fault protection mechanism that incurs zero space cost.
arXiv Detail & Related papers (2024-01-22T02:50:38Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Soft Error Reliability Analysis of Vision Transformers [14.132398744731635]
Vision Transformers (ViTs) that leverage self-attention mechanism have shown superior performance on many classical vision tasks.
Existing ViT works mainly optimize performance and accuracy, but ViT reliability issues induced by soft errors have generally been overlooked.
In this work, we study the reliability of ViTs and investigate the vulnerability from different architecture granularities.
arXiv Detail & Related papers (2023-02-21T06:17:40Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- Neural Architecture Search on Efficient Transformers and Beyond [23.118556295894376]
We propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.
We observe that the optimal architecture of the efficient Transformer has reduced computation compared with that of the standard Transformer.
Our searched architecture maintains comparable accuracy to the standard Transformer with notably improved computational efficiency.
arXiv Detail & Related papers (2022-07-28T08:41:41Z)
- Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
The error models driving the simulation are defined based on the corruption patterns that faults induce in the outputs of CNN operators.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
- Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that latent adversarial perturbations that adapt to the classifier throughout its training are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output in the loss computation are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- Making Convolutions Resilient via Algorithm-Based Error Detection Techniques [2.696566807900575]
Convolutional Neural Networks (CNNs) accurately process real-time telemetry.
CNNs must execute correctly in the presence of hardware faults.
Full duplication provides the needed assurance but incurs a 100% overhead.
arXiv Detail & Related papers (2020-06-08T23:17:57Z)