ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
- URL: http://arxiv.org/abs/2310.03841v2
- Date: Mon, 5 Feb 2024 20:57:06 GMT
- Title: ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
- Authors: Haoxuan Liu, Vasu Singh, Michał Filipiuk, Siva Kumar Sastry Hari
- Abstract summary: Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability.
It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors.
We propose an algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis.
- Score: 5.502117675161604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability. It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors. We propose a novel algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis and protection of transformer-based architectures. First, our work develops an efficient process of computing and ranking the resilience of transformer layers. We find that due to the large size of transformer models, applying traditional network redundancy to a subset of the most vulnerable layers provides high error coverage albeit with impractically high overhead. We address this shortcoming by providing a software-directed, checksum-based error detection technique aimed at protecting the most vulnerable general matrix multiply (GEMM) layers in the transformer models that use either floating-point or integer arithmetic. Results show that our approach achieves over 99% coverage for errors that result in a mismatch, with less than 0.2% computation overhead and 0.01% memory overhead. Lastly, we present the applicability of our framework in various modern GPU architectures under different numerical precisions. We introduce an efficient self-correction mechanism for resolving erroneous detections with an average of less than 2% overhead per error.
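To make the checksum idea concrete, the sketch below illustrates the generic algorithm-based fault tolerance (ABFT) pattern that the abstract describes for GEMM layers: two extra matrix-vector products give independently computed row and column checksums of C = A x B, a checksum mismatch flags an error, and a single corrupted output element can be corrected in place from the discrepancies. This is a minimal illustration under stated assumptions, not ALBERTA's implementation; the function name `abft_gemm` and the tolerance `tol` are illustrative, and the paper's handling of integer/quantized arithmetic and GPU kernels is not modeled here.

```python
import numpy as np

def abft_gemm(A, B, tol=1e-3):
    """Checksum-protected GEMM sketch (generic ABFT, not ALBERTA's exact scheme).

    Computes C = A @ B together with row/column checksums that allow a
    single corrupted output element to be detected and corrected.
    `tol` absorbs floating-point rounding and should be scaled to the
    problem size; exact comparison (tol=0) works for integer arithmetic.
    """
    # Expected checksums from the inputs: two extra GEMVs, independent of C.
    col_check = np.ones(A.shape[0]) @ A @ B    # expected column sums of C
    row_check = A @ (B @ np.ones(B.shape[1]))  # expected row sums of C

    C = A @ B  # the (potentially faulty) main computation

    # Compare observed sums of C against the independently computed checksums.
    col_diff = C.sum(axis=0) - col_check
    row_diff = C.sum(axis=1) - row_check
    bad_cols = np.flatnonzero(np.abs(col_diff) > tol)
    bad_rows = np.flatnonzero(np.abs(row_diff) > tol)

    if bad_rows.size == 0 and bad_cols.size == 0:
        return C, "clean"
    if bad_rows.size == 1 and bad_cols.size == 1:
        # Single corrupted element: its location is the intersection of the
        # mismatching row and column, and the discrepancy is the error value.
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] -= row_diff[i]
        return C, "corrected"
    # Multiple mismatches: fall back to recomputing the GEMM.
    return A @ B, "recomputed"
```

The cost structure matches the abstract's low-overhead claim: the checksums add roughly O(mk + kn + mn) operations on top of the O(mkn) GEMM, and correcting a single detected error touches one element, so full recomputation is needed only when multiple mismatches appear.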
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- Efficient Fault Detection Architectures for Modular Exponentiation Targeting Cryptographic Applications Benchmarked on FPGAs [2.156170153103442]
We propose a lightweight fault detection architecture tailored for modular exponentiation.
Our approach achieves an error detection rate close to 100%, all while introducing a modest computational overhead of approximately 7%.
arXiv Detail & Related papers (2024-02-28T04:02:41Z)
- Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM [27.354689865791638]
Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs).
Hardware failures, such as stuck-at faults, can result in significant prediction errors during model inference.
We propose a fault protection mechanism that incurs zero space cost.
arXiv Detail & Related papers (2024-01-22T02:50:38Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Soft Error Reliability Analysis of Vision Transformers [14.132398744731635]
Vision Transformers (ViTs) that leverage self-attention mechanism have shown superior performance on many classical vision tasks.
Existing ViT works mainly optimize performance and accuracy, but ViT reliability issues induced by soft errors have generally been overlooked.
In this work, we study the reliability of ViTs and investigate the vulnerability from different architecture granularities.
arXiv Detail & Related papers (2023-02-21T06:17:40Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- Neural Architecture Search on Efficient Transformers and Beyond [23.118556295894376]
We propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.
We observe that the optimal architecture of the efficient Transformer has reduced computation compared with that of the standard Transformer.
Our searched architecture maintains comparable accuracy to the standard Transformer with notably improved computational efficiency.
arXiv Detail & Related papers (2022-07-28T08:41:41Z)
- Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
The error models driving the simulation are defined based on the corruption patterns that faults induce in the outputs of CNN operators.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
- Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that latent adversarial perturbations that adapt to the classifier throughout its training are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output in the loss computation are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- Making Convolutions Resilient via Algorithm-Based Error Detection Techniques [2.696566807900575]
Convolutional Neural Networks (CNNs) accurately process real-time telemetry.
CNNs must execute correctly in the presence of hardware faults.
Full duplication provides the needed assurance but incurs a 100% overhead.
arXiv Detail & Related papers (2020-06-08T23:17:57Z)