Algorithmic Strategies for Sustainable Reuse of Neural Network Accelerators with Permanent Faults
- URL: http://arxiv.org/abs/2412.16208v1
- Date: Tue, 17 Dec 2024 18:56:09 GMT
- Title: Algorithmic Strategies for Sustainable Reuse of Neural Network Accelerators with Permanent Faults
- Authors: Youssef A. Ait Alama, Sampada Sakpal, Ke Wang, Razvan Bunescu, Avinash Karanth, Ahmed Louri
- Abstract summary: We propose novel approaches that mitigate permanent hardware faults in neural network (NN) accelerators by uniquely integrating the behavior of the faulty component instead of bypassing it. We propose several algorithmic mitigation techniques for a subset of stuck-at faults, such as Invertible Scaling or Shifting of activations and weights, or fine-tuning with the faulty behavior. Notably, the proposed techniques do not require any hardware modification, instead relying on existing components of widely used systolic-array-based accelerators.
- Score: 9.89051364546275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hardware failures are a growing challenge for machine learning accelerators, many of which are based on systolic arrays. When a permanent hardware failure occurs in a systolic array, existing solutions include localizing and isolating the faulty processing element (PE), using a redundant PE for re-execution, or, in some extreme cases, decommissioning the entire accelerator for further investigation. In this paper, we propose novel algorithmic approaches that mitigate permanent hardware faults in neural network (NN) accelerators by uniquely integrating the behavior of the faulty component instead of bypassing it. In doing so, we aim for a more sustainable use of the accelerator where faulty hardware is neither bypassed nor discarded, instead being given a second life. We first introduce a CUDA-accelerated systolic array simulator in PyTorch, which enabled us to quantify the impact of permanent faults appearing on links connecting two PEs or in weight registers, where one bit is stuck at 0 or 1 in the float32, float16, or bfloat16 representation. We then propose several algorithmic mitigation techniques for a subset of stuck-at faults, such as Invertible Scaling or Shifting of activations and weights, or fine-tuning with the faulty behavior. Notably, the proposed techniques do not require any hardware modification, instead relying on existing components of widely used systolic-array-based accelerators, such as normalization, activation, and storage units. Extensive experimental evaluations using fully connected and convolutional NNs trained on MNIST, CIFAR-10, and ImageNet show that the proposed fault-tolerant approach matches or comes very close to the original fault-free accuracy.
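Two ingredients from the abstract, bit-level stuck-at fault injection and invertible scaling around the faulty operation, are easy to prototype in PyTorch. The sketch below is a minimal illustration and not the authors' simulator: the helper names (`stuck_at_bit`, `faulty_linear`, `scaled_faulty_linear`), the choice of a weight-register fault on a single bfloat16 exponent bit (bit 12), and the scale factor are all assumptions made for this example; the paper's simulator is CUDA-accelerated and also models faults on the links between PEs.

```python
import torch

def stuck_at_bit(x: torch.Tensor, bit: int, stuck_value: int) -> torch.Tensor:
    """Force one bit of every bfloat16 element to 0 or 1 (simulated permanent fault)."""
    assert x.dtype == torch.bfloat16 and 0 <= bit < 16 and stuck_value in (0, 1)
    raw = x.contiguous().view(torch.int16)                 # reinterpret the 16-bit pattern
    m = 1 << bit
    mask = torch.tensor(m - 0x10000 if m >= 0x8000 else m, dtype=torch.int16)
    raw = raw | mask if stuck_value == 1 else raw & ~mask  # set or clear the faulty bit
    return raw.view(torch.bfloat16)

def faulty_linear(x, weight, bit=12, stuck_value=1):
    """Linear op whose weights pass through a simulated stuck-at weight register."""
    return x @ stuck_at_bit(weight, bit, stuck_value).t()

def scaled_faulty_linear(x, weight, s=2.0 ** -4, bit=12, stuck_value=1):
    """Invertible scaling sketch: shrink the weights before the faulty op, undo it on the output.
    Mathematically exact for a linear op, since (x @ (s*W).T) / s == x @ W.T;
    the scale s is a hypothetical choice, not a value from the paper."""
    return faulty_linear(x, weight * s, bit, stuck_value) / s

x = torch.randn(8, 64, dtype=torch.bfloat16)
w = torch.randn(32, 64, dtype=torch.bfloat16)
clean = x @ w.t()
print((clean - faulty_linear(x, w)).abs().max())         # error introduced by the raw fault
print((clean - scaled_faulty_linear(x, w)).abs().max())  # error after invertible scaling
```

The rescaling on the output side is the kind of operation that existing normalization and activation units can absorb, which is consistent with the abstract's claim that the mitigations require no hardware modification.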
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by a factor of at least 224 on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators [11.496631244103773]
"Tiny Shared Block (TSB)" integrates a small shared 1x1 convolution block into the Deep Neural Network architecture.
TSB achieves over a 20x improvement in the inference accuracy gap, over a 5x training speedup, and a reduction in weights-to-device mapping cost.
arXiv Detail & Related papers (2024-05-08T20:53:38Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- FPGA Resource-aware Structured Pruning for Real-Time Neural Networks [3.294652922898631]
Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint.
We propose a hardware-centric formulation of pruning, by formulating it as a knapsack problem with resource-aware tensor structures.
The proposed method achieves reductions of between 55% and 92% in DSP utilization and of up to 81% in BRAM utilization.
arXiv Detail & Related papers (2023-08-09T18:14:54Z)
- eFAT: Improving the Effectiveness of Fault-Aware Training for Mitigating Permanent Faults in DNN Hardware Accelerators [15.344503991760275]
Fault-Aware Training (FAT) has emerged as a highly effective technique for addressing permanent faults in DNN accelerators.
FAT must be performed for each faulty chip individually, taking into account its unique fault map.
We propose concepts of resilience-driven retraining amount selection, and resilience-driven grouping and fusion of multiple fault maps.
arXiv Detail & Related papers (2023-04-20T01:35:11Z)
- Towards Dynamic Fault Tolerance for Hardware-Implemented Artificial Neural Networks: A Deep Learning Approach [0.0]
This work investigates a deep learning approach to mitigate dynamic fault impact for artificial neural networks.
As a theoretical use case, image compression by means of a deep autoencoder is considered.
arXiv Detail & Related papers (2022-10-16T18:09:48Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- Hardware-Robust In-RRAM-Computing for Object Detection [0.15113576014047125]
In-RRAM computing suffers from large device variation and numerous non-ideal effects in hardware.
This paper proposes a joint hardware and software optimization strategy to design a hardware-robust IRC macro for object detection.
The proposed approach has been successfully applied to a complex object detection task with only a 3.85% mAP drop.
arXiv Detail & Related papers (2022-05-09T01:46:24Z)
- Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for 5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity.
One of the main challenges comes from the real-time implementation of these algorithms.
This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.