FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention
- URL: http://arxiv.org/abs/2504.02211v1
- Date: Thu, 03 Apr 2025 02:05:08 GMT
- Title: FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention
- Authors: Huangliang Dai, Shixun Wu, Hairui Zhao, Jiajun Huang, Zizhe Jian, Yue Zhu, Haiyang Hu, Zizhong Chen
- Abstract summary: Transformer models leverage self-attention mechanisms to capture dependencies, demonstrating exceptional performance in various applications. Existing fault tolerance methods protect each operation separately using decoupled kernels, incurring substantial computational and memory overhead. We propose a novel error-resilient framework for Transformer models, integrating end-to-end fault tolerant attention.
- Score: 5.044679241062448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer models leverage self-attention mechanisms to capture complex dependencies, demonstrating exceptional performance in various applications. However, the long-duration high-load computations required for model inference impose stringent reliability demands on the computing platform, as soft errors that occur during execution can significantly degrade model performance. Existing fault tolerance methods protect each operation separately using decoupled kernels, incurring substantial computational and memory overhead. In this paper, we propose a novel error-resilient framework for Transformer models, integrating end-to-end fault tolerant attention (EFTA) to improve inference reliability against soft errors. Our approach enables error detection and correction within a fully fused attention kernel, reducing redundant data access and thereby mitigating memory faults. To further enhance error coverage and reduce overhead, we design a hybrid fault tolerance scheme tailored for the EFTA, introducing for the first time: 1) architecture-aware algorithm-based fault tolerance (ABFT) using tensor checksum, which minimizes inter-thread communication overhead on tensor cores during error detection; 2) selective neuron value restriction, which selectively applies adaptive fault tolerance constraints to neuron values, balancing error coverage and overhead; 3) unified verification, reusing checksums to streamline multiple computation steps into a single verification process. Experimental results show that EFTA achieves up to 7.56x speedup over traditional methods with an average fault tolerance overhead of 13.9%.
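The tensor-checksum ABFT the abstract describes builds on the classical checksum trick for matrix multiplication. The sketch below is a minimal illustration of that building block, not the paper's fused-kernel implementation: a column checksum of the inputs is propagated through the multiply and compared against a checksum of the output, so a corrupted result element is flagged without recomputing the product. The function name `column_checksum_ok` is illustrative.

```python
import numpy as np

def column_checksum_ok(A, B, C, tol=1e-6):
    """Verify C = A @ B using a column checksum.

    With e the all-ones vector, (e^T A) B must equal e^T C;
    a single corrupted element of C breaks the equality in its column.
    """
    e = np.ones(A.shape[0])
    return np.abs((e @ A) @ B - e @ C).max() <= tol

rng = np.random.default_rng(0)
A, B = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
C = A @ B
assert column_checksum_ok(A, B, C)      # fault-free result passes

C[3, 5] += 10.0                         # inject a soft error
assert not column_checksum_ok(A, B, C)  # checksum mismatch flags the fault
```

The verification costs two extra matrix-vector products rather than a second full multiply, which is why the paper can keep the detection inside the fused attention kernel at low overhead.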
Related papers
- Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning [0.4660328753262075]
This paper introduces a model-level hardening approach for CNNs by integrating error correction directly into the neural networks.
The proposed method demonstrates fault resilience nearly equivalent to TMR-based correction but with significantly reduced overhead.
Remarkably, the hardened pruned CNNs perform up to 24% faster than the hardened un-pruned ones.
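For context, the TMR baseline this entry compares against can be sketched in a few lines (a toy model, not the paper's method): run the computation three times and take an element-wise majority vote, which corrects any single faulty replica at roughly triple the compute cost.

```python
import numpy as np

def tmr(compute):
    """Triple modular redundancy: majority-vote three redundant runs."""
    r1, r2, r3 = compute(), compute(), compute()
    # With at most one faulty replica, the element-wise median
    # equals the majority value.
    return np.median(np.stack([r1, r2, r3]), axis=0)

x = np.arange(4.0)
runs = iter([x.copy(), x + np.array([0.0, 100.0, 0.0, 0.0]), x.copy()])
voted = tmr(lambda: next(runs))     # one of the three copies is corrupted
assert np.array_equal(voted, x)     # the vote recovers the clean result
```

The 3x redundancy is what makes parameter-level hardening attractive: it targets only vulnerable parameters instead of replicating everything.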
arXiv Detail & Related papers (2024-05-17T09:42:44Z) - Parameter-tuning-free data entry error unlearning with adaptive
selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD) on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
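The dampening idea can be sketched as follows. This is a hedged toy version on plain arrays, not the authors' code; the importance scores would in practice come from something like a Fisher-information estimate, and the name `selective_dampen` is illustrative.

```python
import numpy as np

def selective_dampen(params, imp_forget, imp_retain, alpha=1.0, eps=1e-12):
    """Scale down parameters whose forget-set importance dominates.

    Where imp_forget > imp_retain, multiply the parameter by
    min(1, alpha * imp_retain / imp_forget); leave the rest unchanged.
    """
    ratio = alpha * imp_retain / (imp_forget + eps)
    scale = np.where(imp_forget > imp_retain, np.minimum(1.0, ratio), 1.0)
    return params * scale

params = np.array([1.0, 1.0, 1.0])
imp_forget = np.array([10.0, 1.0, 5.0])
imp_retain = np.array([1.0, 10.0, 5.0])
out = selective_dampen(params, imp_forget, imp_retain)
assert np.isclose(out[0], 0.1)          # forget-dominant weight is dampened
assert out[1] == 1.0 and out[2] == 1.0  # other weights are untouched
```

The "adaptive" extension in the paper removes the need to hand-tune constants such as `alpha`.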
arXiv Detail & Related papers (2024-02-06T14:04:31Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on Federated Learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and nonconvex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
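The aggregation error the analysis characterizes can be illustrated with a toy AirComp model (an assumption-laden sketch, not the paper's system model): client updates are summed "in the channel", so the server sees the sum plus additive channel noise before averaging.

```python
import numpy as np

def air_fedavg_round(client_updates, noise_std, rng):
    """Average client updates with additive channel noise (toy AirComp)."""
    over_the_air_sum = np.sum(client_updates, axis=0)   # summed in the channel
    noisy_sum = over_the_air_sum + rng.normal(0.0, noise_std,
                                              over_the_air_sum.shape)
    return noisy_sum / len(client_updates)

rng = np.random.default_rng(1)
updates = [np.ones(4), 3 * np.ones(4)]
avg = air_fedavg_round(updates, noise_std=0.0, rng=rng)
assert np.allclose(avg, 2.0)   # noiseless channel gives exact FedAvg average
```

With `noise_std > 0` the returned average deviates from the true FedAvg update, which is the aggregation error whose different forms (model, gradient, model difference) the paper analyzes.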
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures [5.502117675161604]
Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability.
It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors.
We propose an algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis.
arXiv Detail & Related papers (2023-10-05T18:55:30Z) - Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z) - ApproxABFT: Approximate Algorithm-Based Fault Tolerance for Neural Network Processing [7.578258600530223]
We propose ApproxABFT, which initiates error recovery only when computational errors are significant. This approach avoids unnecessary recovery procedures, streamlines the error recovery process, and focuses on correcting impactful errors. Experimental results demonstrate that ApproxABFT reduces the computing overhead by 67.83% and improves the tolerable bit error rate by an order of magnitude on average.
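The thresholding idea can be illustrated with standard checksum verification (a minimal sketch under assumed names, not the ApproxABFT implementation): the checksum discrepancy is computed as usual, but recovery is triggered only when it exceeds a significance threshold, so tiny numerical errors are simply tolerated.

```python
import numpy as np

def approx_abft_check(A, B, C, threshold):
    """Return True if C = A @ B should be recomputed (significant error)."""
    e = np.ones(A.shape[0])
    discrepancy = np.abs((e @ A) @ B - e @ C).max()
    return discrepancy > threshold          # small discrepancies are ignored

rng = np.random.default_rng(3)
A, B = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
C = A @ B
assert not approx_abft_check(A, B, C, threshold=1e-3)  # clean: no recovery

C[2, 2] += 1.0                                         # minor injected error
assert not approx_abft_check(A, B, C, threshold=10.0)  # tolerated

C[2, 2] += 100.0                                       # severe injected error
assert approx_abft_check(A, B, C, threshold=10.0)      # triggers recovery
```

Choosing the threshold trades silent-error tolerance against skipped recoveries, which is where the reported overhead savings come from.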
arXiv Detail & Related papers (2023-02-21T06:21:28Z) - DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep
Surrogate Model [12.335763358698564]
We propose DeepFT to proactively avoid system overloads and their adverse effects.
DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system.
It offers a highly scalable solution, as the model size grows by only 3 and 1 percent per unit increase in the number of active tasks and hosts, respectively.
arXiv Detail & Related papers (2022-12-02T16:51:58Z) - Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, which implements only a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases defined according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM).
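For reference, a commonly used definition of the truncated Schatten p-norm that such models minimize (stated here as background, not quoted from the paper) sums the p-th powers of only the smallest singular values, skipping the t largest so that the dominant structure of the data is not penalized:

```latex
\|X\|_{S_p,t}^{p} \;=\; \sum_{i=t+1}^{\min(m,n)} \sigma_i^{p}(X),
\qquad 0 < p \le 1,
```

where \(\sigma_1(X) \ge \dots \ge \sigma_{\min(m,n)}(X)\) are the singular values of \(X \in \mathbb{R}^{m \times n}\). Nonconvexity enters through \(p < 1\) and the truncation, which is why ADMM-style splitting is used for the optimization.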
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - Fault-tolerant parity readout on a shuttling-based trapped-ion quantum
computer [64.47265213752996]
We experimentally demonstrate a fault-tolerant weight-4 parity check measurement scheme.
We achieve a flag-conditioned parity measurement single-shot fidelity of 93.2(2)%.
The scheme is an essential building block in a broad class of stabilizer quantum error correction protocols.
arXiv Detail & Related papers (2021-07-13T20:08:04Z) - FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural
Networks [13.100954947774163]
Convolutional neural networks (CNNs) are becoming more and more important for solving challenging and critical problems in many fields.
CNN inference applications have been deployed in safety-critical systems, which may suffer from soft errors caused by high-energy particles, high temperature, or abnormal voltage.
Traditional fault tolerance methods are not suitable for CNN inference because error-correcting code is unable to protect computational components.
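The kind of ABFT scheme FT-CNN builds on exploits the fact that convolution is linear in the filters. A minimal sketch (not the paper's GPU implementation; `conv2d_valid` is a deliberately naive helper): convolving with the sum of all filters yields a checksum channel that must equal the sum of all output channels.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation for a single channel."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(4)
img = rng.normal(size=(6, 6))
filters = rng.normal(size=(3, 3, 3))                # 3 output channels
outputs = np.stack([conv2d_valid(img, f) for f in filters])

# One extra convolution with the summed filter acts as the checksum.
checksum = conv2d_valid(img, filters.sum(axis=0))
assert np.allclose(outputs.sum(axis=0), checksum)   # holds when fault-free

outputs[1, 0, 0] += 5.0                             # inject a soft error
assert not np.allclose(outputs.sum(axis=0), checksum)  # mismatch flags it
```

One checksum channel protects all output channels, which is how ABFT covers the computational components that error-correcting codes on memory cannot.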
arXiv Detail & Related papers (2020-03-27T02:01:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.